OsString::from_wide (in Windows OsStringExt) is unsound

The following program causes UB:
```rust
use std::os::windows::ffi::{OsStrExt, OsStringExt};
use std::ffi::{OsStr, OsString};

fn main() {
    let base = "a\té \u{7f}💩\r";
    let mut base: Vec<u16> = OsStr::new(base).encode_wide().collect();
    base.push(0xD800);
    let _res = OsString::from_wide(&base);
}
```
Miri says:
```
error: Undefined Behavior: type validation failed: encountered 0x0000d800, but expected a valid unicode codepoint
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libcore/char/convert.rs:102:5
    |
102 |     transmute(i)
    |     ^^^^^^^^^^^^ type validation failed: encountered 0x0000d800, but expected a valid unicode codepoint
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `std::char::from_u32_unchecked` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libcore/char/convert.rs:102:5
    = note: inside `std::sys_common::wtf8::Wtf8Buf::push_code_point_unchecked` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys_common/wtf8.rs:204:26
    = note: inside `std::sys_common::wtf8::Wtf8Buf::from_wide` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys_common/wtf8.rs:194:21
    = note: inside `<std::ffi::OsString as std::os::windows::ffi::OsStringExt>::from_wide` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys/windows/ext/ffi.rs:101:44
note: inside `main` at wtf8.rs:8:16
   --> wtf8.rs:8:16
    |
8   |     let _res = OsString::from_wide(&base);
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^
```

The problem is this code:

https://github.com/rust-lang/rust/blob/96dd4690c3aa70ec312448c3f2d50e6dc6fb87df/src/libstd/sys_common/wtf8.rs#L293-L305

This calls `push_code_point_unchecked` unless the new code point is in `0xDC00..=0xDFFF`, but what about surrogates in `0xD800..0xDC00`?
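
To make the gap concrete, here is a minimal sketch of the two surrogate ranges. The helper names `is_lead_surrogate` and `is_trail_surrogate` are hypothetical and do not come from `wtf8.rs`; only `char::from_u32` is a real standard-library call.

```rust
/// Hypothetical helper: lead (high) surrogates are 0xD800..=0xDBFF.
fn is_lead_surrogate(code: u32) -> bool {
    matches!(code, 0xD800..=0xDBFF)
}

/// Hypothetical helper: trail (low) surrogates are 0xDC00..=0xDFFF.
/// This is the only range that the check described above looks for.
fn is_trail_surrogate(code: u32) -> bool {
    matches!(code, 0xDC00..=0xDFFF)
}

fn main() {
    let lone_lead: u32 = 0xD800;

    // A lead surrogate is not caught by a trail-only range check...
    assert!(is_lead_surrogate(lone_lead));
    assert!(!is_trail_surrogate(lone_lead));

    // ...yet it is not a valid `char` either: the checked conversion rejects it.
    assert_eq!(char::from_u32(lone_lead), None);
    assert_eq!(char::from_u32(0xDC00), None);

    // The neighbours just outside the surrogate block are fine.
    assert!(char::from_u32(0xD7FF).is_some());
    assert!(char::from_u32(0xE000).is_some());
}
```

Any value in `0xD800..=0xDBFF` therefore passes a trail-only check while still being invalid as a `char`.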

This code is unchanged since its introduction in https://github.com/rust-lang/rust/commit/c5369ebc7f4791c4e291951751b8964052c7a523. I am not sure what the intended safety contract of `push_code_point_unchecked` is. That method is not marked `unsafe` but clearly should be, because it calls `char::from_u32_unchecked`. My guess is that the safety precondition is that the `CodePoint` must not be part of a surrogate pair, but `push` calls it without actually ensuring that condition. The condition `push` *does* ensure is that the code point is not in `0xDC00..=0xDFFF`, and that does not help with lead surrogates.
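
One way to avoid the problem altogether would be to encode the code point to bytes directly from its `u32` value, so that no `char` is ever constructed for a surrogate. The following is only a sketch under that assumption, not the actual `wtf8.rs` implementation; `encode_code_point` is a hypothetical free function.

```rust
/// Hypothetical helper: append the generalized UTF-8 (WTF-8) encoding of
/// `code` to `out`. Surrogates are encoded like any other three-byte code
/// point, and no `char` value is created along the way.
fn encode_code_point(code: u32, out: &mut Vec<u8>) {
    assert!(code <= 0x10FFFF, "not a Unicode code point");
    if code < 0x80 {
        out.push(code as u8);
    } else if code < 0x800 {
        out.push(0xC0 | (code >> 6) as u8);
        out.push(0x80 | (code & 0x3F) as u8);
    } else if code < 0x1_0000 {
        out.push(0xE0 | (code >> 12) as u8);
        out.push(0x80 | ((code >> 6) & 0x3F) as u8);
        out.push(0x80 | (code & 0x3F) as u8);
    } else {
        out.push(0xF0 | (code >> 18) as u8);
        out.push(0x80 | ((code >> 12) & 0x3F) as u8);
        out.push(0x80 | ((code >> 6) & 0x3F) as u8);
        out.push(0x80 | (code & 0x3F) as u8);
    }
}

fn main() {
    // A lone lead surrogate becomes three bytes, with no UB involved.
    let mut buf = Vec::new();
    encode_code_point(0xD800, &mut buf);
    assert_eq!(buf, [0xED, 0xA0, 0x80]);

    // For valid scalar values the result matches ordinary UTF-8.
    let mut buf = Vec::new();
    encode_code_point('é' as u32, &mut buf);
    assert_eq!(buf, "é".as_bytes());
}
```

Because the function only does integer arithmetic, it needs no `unsafe` and has no precondition beyond `code <= 0x10FFFF`.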

	pub fn push(&mut self, code_point: CodePoint) {
	    if let trail @ 0xDC00..=0xDFFF = code_point.to_u32() {
	        if let Some(lead) = (&*self).final_lead_surrogate() {
	            let len_without_lead_surrogate = self.len() - 3;
	            self.bytes.truncate(len_without_lead_surrogate);
	            self.push_char(decode_surrogate_pair(lead, trail as u16));
	            return;
	        }
	    }

	    // No newly paired surrogates at the boundary.
	    self.push_code_point_unchecked(code_point)
	}

	ty::Char => tcx.intern_layout(Layout::scalar(
	    self,
	    Scalar { value: Int(I32, false), valid_range: 0..=0x10FFFF },
	)),

	pub fn to_char(&self) -> Option<char> {
	    match self.value {
	        0xD800..=0xDFFF => None,
	        _ => Some(unsafe { char::from_u32_unchecked(self.value) }),
	    }
	}
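
For comparison, the same surrogate filtering can be written without `unsafe` by using the checked `char::from_u32`. The `CodePoint` struct below is a hypothetical stand-in for the type in `wtf8.rs`, reduced to what the example needs.

```rust
/// Hypothetical stand-in for the `CodePoint` type in `wtf8.rs`:
/// any value in 0..=0x10FFFF, surrogates included.
#[derive(Clone, Copy, Debug, PartialEq)]
struct CodePoint {
    value: u32,
}

impl CodePoint {
    /// Accepts every code point up to 0x10FFFF, including surrogates.
    fn from_u32(value: u32) -> Option<CodePoint> {
        if value <= 0x10FFFF {
            Some(CodePoint { value })
        } else {
            None
        }
    }

    /// Safe variant of `to_char`: `char::from_u32` already returns `None`
    /// for surrogates, so no explicit range match is needed.
    fn to_char(&self) -> Option<char> {
        char::from_u32(self.value)
    }
}

fn main() {
    let lone_lead = CodePoint::from_u32(0xD800).unwrap();
    assert_eq!(lone_lead.to_char(), None);

    let letter = CodePoint::from_u32(0x61).unwrap();
    assert_eq!(letter.to_char(), Some('a'));

    // Values above the Unicode range are not code points at all.
    assert_eq!(CodePoint::from_u32(0x11_0000), None);
}
```

The checked call costs one extra range comparison, in exchange for not having to reason about the validity of the `char` by hand.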

OsString::from_wide (in Windows OsStringExt) is unsound #72760



SimonSapin commented on May 30, 2020

SimonSapin commented on May 30, 2020

SimonSapin commented on May 30, 2020

Safety

RalfJung commented on May 30, 2020

RalfJung commented on May 30, 2020

SimonSapin commented on May 30, 2020

RalfJung commented on May 30, 2020

SimonSapin commented on May 30, 2020

RalfJung commented on May 30, 2020
