Skip to content

ICE with non-utf8 const generic &str #75763

Closed
@DutchGhost

Description

@DutchGhost
Contributor

Code

#![feature(const_generics)]

struct Bug<const S: &'static str>;
    
fn main() {
    let b: Bug::<{
        unsafe {
            std::mem::transmute::<&[u8], &str>(&[0xC0, 0xC1, 0xF5])
        }
    }>;
}

Meta

rustc --version --verbose:

rustc 1.47.0-nightly (e15510ca3 2020-08-20)

Error output

thread 'rustc' panicked at 'non utf8 str from miri: Utf8Error { valid_up_to: 0, error_len: Some(1) }
Backtrace

stack backtrace:
   0: rust_begin_unwind
             at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/std/src/panicking.rs:475
   1: core::panicking::panic_fmt
             at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/panicking.rs:85
   2: core::option::expect_none_failed
             at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/option.rs:1221
   3: rustc_middle::ty::print::pretty::PrettyPrinter::pretty_print_const
   4: rustc_middle::ty::print::pretty::<impl core::fmt::Display for &rustc_middle::ty::consts::Const>::fmt
   5: core::fmt::write
             at /rustc/e15510ca33ea15c893b78710101c962b11459963/library/core/src/fmt/mod.rs:1117
   6: rustc_middle::ty::print::obsolete::DefPathBasedNames::push_generic_params
   7: rustc_codegen_llvm::type_of::uncached_llvm_type
   8: <rustc_target::abi::TyAndLayout<&rustc_middle::ty::TyS> as rustc_codegen_llvm::type_of::LayoutLlvmExt>::llvm_type
   9: rustc_codegen_ssa::mir::operand::OperandRef<V>::new_zst
  10: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::fold
  11: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter
  12: rustc_codegen_ssa::mir::codegen_mir
  13: rustc_codegen_ssa::base::codegen_instance
  14: <rustc_middle::mir::mono::MonoItem as rustc_codegen_ssa::mono_item::MonoItemExt>::define
  15: rustc_codegen_llvm::base::compile_codegen_unit::module_codegen
  16: rustc_query_system::dep_graph::graph::DepGraph<K>::with_task
  17: rustc_codegen_llvm::base::compile_codegen_unit
  18: rustc_codegen_ssa::base::codegen_crate
  19: <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::CodegenBackend>::codegen_crate
  20: rustc_interface::passes::QueryContext::enter
  21: rustc_interface::queries::Queries::ongoing_codegen
  22: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter
  23: rustc_span::with_source_map
  24: rustc_interface::interface::create_compiler_and_run
  25: scoped_tls::ScopedKey<T>::set
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: internal compiler error: unexpected panic

Activity

added
C-bugCategory: This is a bug.
I-ICEIssue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on Aug 21, 2020
FedericoPonzi

FedericoPonzi commented on Sep 2, 2020

@FedericoPonzi
Contributor
DutchGhost

DutchGhost commented on Sep 2, 2020

@DutchGhost
ContributorAuthor

It looks like it's correctly complied on rustc 1.48.0-nightly:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=3c6f489ca24c8ea66c9759f990a19f5a

I dont think this should compile at all. After all, we cause UB, the value of the const generic does not comply with the rules of &str.

lcnr

lcnr commented on Sep 3, 2020

@lcnr
Contributor

A str being valid utf-8 is not a language invariant, so this should not be UB to my knowledge.

cc @RalfJung

I do expect this to be the correct behavior, so marking as E-needs-test

added
E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.
on Sep 3, 2020
lcnr

lcnr commented on Sep 3, 2020

@lcnr
Contributor

We do have to compare the strs to check for equality though, which might cause library ub so 🤷

RalfJung

RalfJung commented on Sep 3, 2020

@RalfJung
Member

@lcnr I think you are free to decide on your own invariant here -- assuming that two str are equal if and only if the underlying [u8] are equal, it's your call whether you want to allow non-UTF-8 str in types or not.

What is the plan for other library types? Like, imagine a user defined a newtype around u8 that must be even. The compiler cannot even know that such an invariant exists so it cannot be enforced. Equality checking anyway happens by comparing the underlying bytes. So if we follow that analogy, the non-UTF-8 str should be fine, but we have to make sure that we use the right notion of equality.

lcnr

lcnr commented on Sep 3, 2020

@lcnr
Contributor

What is the plan for other library types? Like, imagine a user defined a newtype around u8 that must be even. The compiler cannot even know that such an invariant exists so it cannot be enforced. Equality checking anyway happens by comparing the underlying bytes.

👍 yeah, see #72396 for more details here I guess

One concern is that we probably want to print strs in std::any::type_name and in encodings, so we need to either deal with invalid utf there or forbid it entirely 🤔 Will have to keep that in mind before stabilizing str parameters.

RalfJung

RalfJung commented on Sep 3, 2020

@RalfJung
Member

Yeah, you could either use from_utf8_lossy (so any special treatment of str is confined to the printing code) or make str a very special compiler-understood type even for the type system itself (and reject non-UTF-8 strings early). The former seems more principled to me but that is just a slight preference.

lcnr

lcnr commented on Sep 3, 2020

@lcnr
Contributor

from_utf8_lossy won't work for symbol mangling, as otherwise two distinct types can have the same symbol: https://rust.godbolt.org/z/ox7f8W (current ICE is unrelated, as const generics is not yet fully supported there)

Printing invalid utf8 as bytes might be simpler there 🤔 I guess cc @eddyb about that

RalfJung

RalfJung commented on Sep 3, 2020

@RalfJung
Member

Oh, this is not just pretty-printing but needs to be injective?

Then I guess some \i{...} or so (for "invalid") would work.

self-assigned this
on Oct 26, 2020
added a commit that references this issue on Oct 26, 2020

Rollup merge of rust-lang#78388 - camelid:regression-tests, r=lcnr

98e2a95
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

A-const-genericsArea: const generics (parameters and arguments)C-bugCategory: This is a bug.E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.F-const_generics`#![feature(const_generics)]`I-ICEIssue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    Participants

    @RalfJung@jonas-schievink@FedericoPonzi@DutchGhost@lcnr

    Issue actions

      ICE with non-utf8 const generic &str · Issue #75763 · rust-lang/rust