Description
Proposal
Problem statement
In multi-language projects that need to perform interop with C or C++, you will often need to manipulate most of your strings using the core::ffi::CStr type rather than the usual str type, because you need the string to be nul-terminated before you can pass it into the C/C++ codebase. This means that working with nul-terminated strings should be somewhat convenient.
Unfortunately, it's currently very difficult to print a nul-terminated string. The type does not implement Display and there's no display() function either. Currently, you have to do something like this to print it:
println!("{}", my_string.to_string_lossy());
Or use an extension trait for CStr.
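As a rough illustration of the extension-trait workaround, the sketch below defines a hypothetical `CStrExt` trait and a hypothetical `DisplayCStr` wrapper. Neither is a standard library API, and the kernel's actual trait may look different. It builds the `CStr` with `CStr::from_bytes_with_nul` rather than a c"" literal only so that it compiles on older editions.

```rust
use std::ffi::CStr;
use std::fmt;

/// Hypothetical wrapper returned by `CStrExt::display` (not a std API).
pub struct DisplayCStr<'a>(&'a CStr);

impl<'a> fmt::Display for DisplayCStr<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Lossy conversion: each invalid UTF-8 sequence becomes U+FFFD.
        // Note that this may allocate when the bytes are not valid UTF-8.
        f.write_str(&self.0.to_string_lossy())
    }
}

/// Hypothetical extension trait that adds `.display()` to `CStr`.
pub trait CStrExt {
    fn display(&self) -> DisplayCStr<'_>;
}

impl CStrExt for CStr {
    fn display(&self) -> DisplayCStr<'_> {
        DisplayCStr(self)
    }
}

fn main() {
    let name = CStr::from_bytes_with_nul(b"my_device\0").unwrap();
    println!("Registered {}.", name.display());
}
```

Every call site still has to import the trait and call `.display()`, which is the inconvenience the proposal wants to remove.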
What about Debug?
It doesn't print the right thing.
println!("{:?}", c"Hello, here is a string: \n\"\'\xff.");
"Hello, here is a string: \n\"\'\xff."
It wraps the string in quotes, and also escapes various characters such as quotes and newlines.
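To make the difference concrete, here is a small sketch comparing the Debug form with a lossy conversion for an ASCII-only string. The helper names `debug_form` and `lossy_form` are made up for this example.

```rust
use std::ffi::CStr;

/// Formats with `{:?}`: the result is quoted and escaped.
fn debug_form(s: &CStr) -> String {
    format!("{:?}", s)
}

/// Converts lossily: the bytes are shown as text, without quotes or escapes.
fn lossy_form(s: &CStr) -> String {
    s.to_string_lossy().into_owned()
}

fn main() {
    let s = CStr::from_bytes_with_nul(b"line one\n\"quoted\"\0").unwrap();
    println!("{}", debug_form(s)); // stays on one line, newline shown as an escape
    println!("{}", lossy_form(s)); // spans two lines, quotes printed as-is
}
```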
Motivating examples or use cases
This is motivated by the Linux Kernel, where we currently have a custom CStr type that implements Display. We would like to move away from the custom CStr so that we can use c"" literals. However, there are many places where we print a nul-terminated string for various reasons.
dev_debug!("Registered {name}.");
This becomes:
use kernel::prelude::*; // for CStrExt
dev_debug!("Registered {}.", name.display());
Ideally we would like to be able to continue printing strings with the original syntax.
We don't really care about UTF-8 here, and the thing we are printing to doesn't either. Printing replacement characters, as String::from_utf8_lossy does, would be fine.
Solution sketch
The solution sketch is to implement Display for CStr and CString. The implementation would be the same as for the ByteStr type.
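A minimal sketch of what such an implementation could look like, written against a hypothetical `LossyCStr` newtype because code outside the standard library cannot implement Display on CStr directly. It assumes the behavior is "write valid UTF-8 as-is, write U+FFFD for each invalid sequence". Whether this matches the ByteStr impl in every detail (for example width and padding handling, which this sketch ignores) is an assumption.

```rust
use std::ffi::CStr;
use std::fmt;

/// Hypothetical newtype standing in for a `Display` impl on `CStr` itself.
struct LossyCStr<'a>(&'a CStr);

impl<'a> fmt::Display for LossyCStr<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // `to_bytes` excludes the trailing nul, so it is never printed.
        for chunk in self.0.to_bytes().utf8_chunks() {
            // The valid part of each chunk is written unchanged.
            f.write_str(chunk.valid())?;
            // A non-empty invalid part is replaced by one U+FFFD.
            if !chunk.invalid().is_empty() {
                f.write_str("\u{FFFD}")?;
            }
        }
        Ok(())
    }
}

fn main() {
    let s = CStr::from_bytes_with_nul(b"ab\xffcd\0").unwrap();
    println!("{}", LossyCStr(s));
}
```

Unlike a conversion through `to_string_lossy`, this approach writes directly to the formatter and does not allocate.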
Alternatives
The primary question to consider here is how to deal with bytes that are not valid utf-8.
Add a .display() function
The Path type has a similar problem, but the standard library has chosen to handle it by adding a .display() method. The Path type does not implement the Display trait directly.
I think that we should not repeat this solution for CStr. The Path type carries with it the intent that you are being careful with your paths and that you want to avoid accidentally breaking a path by round-tripping it through the String type. The CStr type does not carry the same intent with it, and implementing Display is more convenient because it lets you print with the "{name}" syntax instead of having to do "{}", name.display().
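For comparison, this is how the existing Path approach reads at a call site. The `describe` helper is a made-up name for this example.

```rust
use std::path::Path;

/// `Path` has no `Display` impl, so the explicit adapter is required;
/// writing `format!("opening {path}")` here would not compile.
fn describe(path: &Path) -> String {
    format!("opening {}", path.display())
}

fn main() {
    println!("{}", describe(Path::new("/tmp/example.txt")));
}
```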
How to escape the string
This proposal says that the default way to print a CStr should be to use replacement characters rather than some other form of escaping. This is because CStr printing is generally used to display the string to a user, and � is much better at conveying that the data is invalid than \xff to a non-technical user.
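The two options can be compared side by side with existing APIs. In this sketch, `with_replacement` and `with_escapes` are hypothetical helper names. The first uses `to_string_lossy`, the second uses the `escape_ascii` method on byte slices.

```rust
use std::ffi::CStr;

/// Invalid bytes become the replacement character U+FFFD.
fn with_replacement(s: &CStr) -> String {
    s.to_string_lossy().into_owned()
}

/// Non-printable and non-ASCII bytes become backslash escapes.
fn with_escapes(s: &CStr) -> String {
    s.to_bytes().escape_ascii().to_string()
}

fn main() {
    let s = CStr::from_bytes_with_nul(b"caf\xff\0").unwrap();
    println!("{}", with_replacement(s));
    println!("{}", with_escapes(s));
}
```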
Links and related work
Zulip thread: zulip
Adding OsStr::display: tracking issue
LKML thread: lkml
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
Activity
BurntSushi commented on Feb 27, 2025
The new ByteStr type currently implements Display in the way you suggest here: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#trait-implementations-1
I'm unsure about the alternate Display impl here.
Is there a semantic difference between ByteStr and CStr? I think the Display impl for ByteStr is fine because the type is explicitly documented to be conventionally UTF-8. So there's an expectation that the data is "string-like," and more specifically, UTF-8. That is in contrast to &[u8] which carries no such connotation. And indeed, the Display impl on ByteStr (along with its Debug impl) is part of the point of opting into using a type like ByteStr.
Does a CStr have a similar broadly applicable connotation? I think so, although I'm not as steeped in C idioms. And I'm unsure about the UTF-8 expectation.
Darksonn commented on Feb 27, 2025
Oh I had not heard of ByteStr. Yes, I think ByteStr has exactly the same semantics as CStr. In fact, perhaps CStr should Deref to ByteStr?
Darksonn commented on Feb 27, 2025
I updated the solution sketch to say that this shares the impl with ByteStr.
joshtriplett commented on Feb 27, 2025
I kinda love the idea of a Deref impl here.
And yes, I think the Display impls should match, other than omitting the trailing NUL in the case of CStr/CString.
hanna-kruppe commented on Feb 27, 2025
CStr -> ByteStr would be an expensive operation (strlen) if the plan of making &CStr a thin pointer without length metadata is ever carried out. It's not the first CStr API that would go from O(1) to O(n) if length is no longer stored, but since Deref is often invoked implicitly, it would be by far the hardest to consciously avoid.
Personally I think the probability of the "thin pointer CStr" change ever happening is already low and getting lower every year, but a Deref impl feels like a novel nail in the coffin.
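The cost concern can be illustrated with a small sketch. With only a pointer and no stored length, getting a byte slice means scanning for the nul terminator first, as C's strlen does. The `scan_len` helper below is hypothetical and only shows the shape of that scan. It is not how the standard library is implemented.

```rust
use std::ffi::{c_char, CStr};

/// Counts the bytes before the nul terminator by walking the string.
/// This is O(n) in the string length.
///
/// Safety: `ptr` must point to a valid nul-terminated string.
unsafe fn scan_len(ptr: *const c_char) -> usize {
    let mut n = 0;
    unsafe {
        while *ptr.add(n) != 0 {
            n += 1;
        }
    }
    n
}

fn main() {
    let s = CStr::from_bytes_with_nul(b"hello\0").unwrap();
    // With today's fat `&CStr`, the length is available without a scan.
    println!("stored length: {}", s.to_bytes().len());
    // With only a thin pointer, the length has to be recomputed.
    println!("scanned length: {}", unsafe { scan_len(s.as_ptr()) });
}
```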
tgross35 commented on Feb 28, 2025
C's char has an implementation-defined encoding that can be controlled with -fexec-charset and also pragmas. It looks like Clang and GCC default to UTF-8 for literals, but on Windows the default code page is used. char8_t is the newer type intended to signify UTF-8, corresponding to string literals with the u8"..." prefix.
It's likely best to say that our CStr is just a bag of nonzero bytes terminated by a null, without any conventional encoding. But we already provide API to make life easier when the bytes happen to be UTF-8 or UTF-8-like with conversions to and from &str/String, and a Display implementation seems like it would simply be an additional case of this. At least, I don't think that having a Display implementation that assumes something similar to UTF-8 punishes other uses of CStr that would need an external conversion routine anyway.
https://internals.rust-lang.org/t/pre-rfc-deprecate-and-replace-cstr-cstring/5016/38 has some more details about C encodings (that entire thread has some good comments).
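As a sketch of the point about other encodings, the example below takes a CStr holding Latin-1 bytes. The UTF-8 oriented conversions either fail or substitute a replacement character, so such callers need their own conversion routine regardless of whether Display exists. The `decode_latin1` helper is hypothetical.

```rust
use std::ffi::CStr;

/// Decodes Latin-1 (ISO 8859-1), where every byte maps directly to the
/// Unicode code point with the same value.
fn decode_latin1(s: &CStr) -> String {
    s.to_bytes().iter().map(|&b| b as char).collect()
}

fn main() {
    // "café" encoded as Latin-1: 0xE9 alone is not valid UTF-8.
    let s = CStr::from_bytes_with_nul(b"caf\xe9\0").unwrap();
    println!("to_str is ok: {}", s.to_str().is_ok());
    println!("lossy: {}", s.to_string_lossy());
    println!("latin-1: {}", decode_latin1(s));
}
```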
tgross35 commented on Feb 28, 2025
Also related, regarding Display round trips: rust-lang/rust#136687.
Darksonn commented on Apr 10, 2025
Related PR: rust-lang/rust#138498
core wanted features & bugfixes Rust-for-Linux/linux#514
tamird commented on Apr 17, 2025
Given the concerns about &CStr becoming a thin pointer, I think my PR is a no-go. Would the next logical step be to open a PR implementing Display for CStr behind a feature gate?
BurntSushi commented on Apr 17, 2025
Trait impls on stable types are insta-stable themselves, so that would need to go through FCP.
But the next step here is for us to sign off on this ACP. Then PR. Then that has to go through FCP because it's insta stable.
A Display impl for CStr seems reasonable to me. I'll accept this. Once a tracking issue is created, we can close this.
One concern for stabilization is whether this should block on ByteStr (and its Display impl) becoming stable. Particularly if we want those Display impls to be consistent with one another.
tamird commented on Apr 17, 2025
@BurntSushi thanks. Are more sign offs required?
The feature lifecycle doc describes tracking issues as tracking feature stabilization, but since this would be insta-stable, I presume the tracking issue would be used to hold the FCP?
If my understanding is correct, the next steps are to create a PR and a tracking issue, then start the FCP on the tracking issue. Once the FCP is concluded, the PR can be merged. Is that correct?