
impl Display for CStr #550

@Darksonn


Proposal

Problem statement

In multi-language projects that need to interoperate with C or C++, you will often need to handle most of your strings as core::ffi::CStr rather than the usual str type, because a string must be nul-terminated before you can pass it into the C/C++ codebase. This means that working with nul-terminated strings should be reasonably convenient.

Unfortunately, it's currently very difficult to print a nul-terminated string. CStr does not implement Display, and there is no display() method either. Currently, you have to write something like this to print one:

println!("{}", my_string.to_string_lossy());

Or use an extension trait for CStr.
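For reference, here is a minimal sketch of that extension-trait workaround on stable Rust. The names `CStrExt`, `display` and `CStrDisplay` are hypothetical (chosen to resemble the kernel's `CStrExt` helper), and the `c""` literal assumes Rust 1.77 or later. The wrapper delegates to `to_string_lossy`, so invalid UTF-8 is shown as U+FFFD.

```rust
use std::ffi::CStr;
use std::fmt;

/// Hypothetical extension trait (not part of std).
pub trait CStrExt {
    fn display(&self) -> CStrDisplay<'_>;
}

/// Hypothetical adapter that implements `Display` for a borrowed `CStr`.
pub struct CStrDisplay<'a>(&'a CStr);

impl CStrExt for CStr {
    fn display(&self) -> CStrDisplay<'_> {
        CStrDisplay(self)
    }
}

impl fmt::Display for CStrDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `to_string_lossy` borrows when the bytes are valid UTF-8 and only
        // allocates when it must insert U+FFFD replacement characters.
        // Delegating to `Display::fmt` keeps width/fill flags working.
        fmt::Display::fmt(&self.0.to_string_lossy(), f)
    }
}

fn main() {
    let name = c"my-device";
    println!("Registered {}.", name.display());
}
```

The adapter type is what forces the `"{}", name.display()` form: inline captures such as `{name}` only work when `name` itself implements `Display`.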

What about Debug?

It doesn't print the right thing.

println!("{:?}", c"Hello, here is a string: \n\"\'\xff.");
// prints: "Hello, here is a string: \n\"\'\xff."

It wraps the string in quotes, and also escapes various characters such as quotes and newlines.

Motivating examples or use cases

This is motivated by the Linux kernel, where we currently have a custom CStr type that implements Display. We would like to move away from the custom CStr so that we can use c"" literals. However, there are many places where we print a nul-terminated string for various reasons.

dev_debug!("Registered {name}.");

This becomes:

use kernel::prelude::*; // for CStrExt

dev_debug!("Registered {}.", name.display());

Ideally we would like to be able to continue printing strings with the original syntax.

We don't really care about UTF-8 here, and neither does the thing we are printing to. Printing replacement characters the way String::from_utf8_lossy does would be fine.

Solution sketch

The solution sketch is to implement Display for CStr and CString. The implementation would be the same as for the ByteStr type.
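Because the orphan rule keeps a `Display` impl for `CStr` inside `std`, the following sketch shows the intended behaviour on a hypothetical newtype `LossyCStr` instead. It assumes the ByteStr-like rule of writing each run of valid UTF-8 unchanged and one U+FFFD per invalid sequence. It is not the actual std code, it relies only on the long-stable `Utf8Error::valid_up_to`/`error_len` reporting, and it ignores width/fill flags for brevity.

```rust
use std::ffi::{CStr, CString};
use std::fmt;

/// Hypothetical newtype: `impl Display for CStr` can only live in std,
/// so this wraps a borrowed `CStr` instead.
pub struct LossyCStr<'a>(pub &'a CStr);

impl fmt::Display for LossyCStr<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `to_bytes()` excludes the trailing nul, so it is never printed.
        let mut rest = self.0.to_bytes();
        loop {
            match std::str::from_utf8(rest) {
                // Everything left is valid UTF-8: write it and stop.
                Ok(valid) => return f.write_str(valid),
                Err(err) => {
                    let (valid, after) = rest.split_at(err.valid_up_to());
                    // SAFETY: `valid_up_to` marks a prefix that is valid UTF-8.
                    f.write_str(unsafe { std::str::from_utf8_unchecked(valid) })?;
                    // One replacement character per invalid sequence.
                    f.write_str("\u{FFFD}")?;
                    match err.error_len() {
                        // Skip the invalid bytes and keep decoding.
                        Some(len) => rest = &after[len..],
                        // Input ended mid-sequence: nothing left to decode.
                        None => return Ok(()),
                    }
                }
            }
        }
    }
}

fn main() {
    let owned = CString::new(b"caf\xc3\xa9 \xff!".to_vec()).unwrap();
    // `CString` derefs to `CStr`, so one wrapper covers both types.
    println!("{}", LossyCStr(&owned));
}
```

The loop is meant to produce the same text as `String::from_utf8_lossy(s.to_bytes())`, but writes straight into the formatter without allocating.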

Alternatives

The primary question to consider here is how to deal with bytes that are not valid UTF-8.

Add a .display() function

The Path type has a similar problem, but the standard library has chosen to handle it by adding a .display() method. The Path type does not implement the Display trait directly.

I think that we should not repeat this solution for CStr. The Path type carries with it the intent that you are being careful with your paths and that you want to avoid accidentally breaking a path by round-tripping it through the String type. The CStr type does not carry the same intent with it, and implementing Display is more convenient because it lets you print with the "{name}" syntax instead of having to do "{}", name.display().
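To illustrate the ergonomic difference with the existing `Path` API (the path string is only an example), note that inline format captures such as `{name}` accept only identifiers. An adapter method like `Path::display` therefore pushes you back to positional arguments unless you bind its result to a name first.

```rust
use std::path::Path;

fn main() {
    let path = Path::new("/tmp/example.txt");

    // `Path` does not implement `Display`, and `{path.display()}` is not
    // valid inline-capture syntax, so the call goes in a positional argument.
    println!("opening {}", path.display());

    // Binding the adapter first makes inline capture possible again.
    let shown = path.display();
    println!("opening {shown}");
}
```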

How to escape the string

This proposal says that the default way to print a CStr should be to use replacement characters rather than some other form of escaping. This is because CStr printing is generally used to display the string to a user, and to a non-technical user � conveys that the data is invalid much better than \xff does.
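A small comparison of the two behaviours available today (the `c""` literal assumes Rust 1.77 or later): `Debug` quotes the string and escapes the invalid byte, while `to_string_lossy` substitutes U+FFFD.

```rust
use std::ffi::CStr;

fn main() {
    let s: &CStr = c"bad byte: \xff";
    // Debug output: wrapped in quotes, invalid byte escaped as `\xff`.
    println!("{:?}", s);
    // Lossy conversion: the invalid byte becomes U+FFFD (rendered as �).
    println!("{}", s.to_string_lossy());
}
```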

Links and related work

Zulip thread: zulip
Adding OsStr::display: tracking issue
LKML thread: lkml

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Activity

BurntSushi (Member) commented on Feb 27, 2025

The new ByteStr type currently implements Display in the way you suggest here: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#trait-implementations-1

I'm unsure about the alternate Display impl here.

Is there a semantic difference between ByteStr and CStr? I think the Display impl for ByteStr is fine because the type is explicitly documented to be conventionally UTF-8. So there's an expectation that the data is "string-like," and more specifically, UTF-8. That is in contrast to &[u8] which carries no such connotation. And indeed, the Display impl on ByteStr (along with its Debug impl) is part of the point of opting into using a type like ByteStr.

Does a CStr have a similar broadly applicable connotation? I think so, although I'm not as steeped in C idioms. And I'm unsure about the UTF-8 expectation.

Darksonn (Author) commented on Feb 27, 2025

Oh, I had not heard of ByteStr. Yes, I think ByteStr has exactly the same semantics as CStr. In fact, perhaps CStr should Deref to ByteStr?

Darksonn (Author) commented on Feb 27, 2025

I updated the solution sketch to say that this shares the impl with ByteStr.

joshtriplett (Member) commented on Feb 27, 2025

I kinda love the idea of a Deref impl here.

And yes, I think the Display impls should match, other than omitting the trailing NUL in the case of CStr/CString.
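Omitting the trailing nul comes naturally, because the existing byte and string views of a `CStr` already exclude it. A quick check (the `c""` literal assumes Rust 1.77 or later):

```rust
use std::ffi::{CStr, CString};

fn main() {
    let s: &CStr = c"hi";
    // The nul terminator is part of the underlying buffer...
    assert_eq!(s.to_bytes_with_nul(), b"hi\0");
    // ...but `to_bytes` (and `to_str` / `to_string_lossy`, built on it)
    // exclude it, so a Display impl based on them never prints it.
    assert_eq!(s.to_bytes(), b"hi");
    assert_eq!(s.to_string_lossy(), "hi");

    // `CString` derefs to `CStr`, so it gets the same views.
    let owned = CString::new("hi").unwrap();
    assert_eq!(owned.to_bytes(), b"hi");
}
```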

hanna-kruppe commented on Feb 27, 2025

CStr -> ByteStr would be an expensive operation (strlen) if the plan of making &CStr a thin pointer without length metadata is ever carried out. It’s not the first CStr API that would go from O(1) to O(n) if the length is no longer stored, but since Deref is often invoked implicitly, it would be by far the hardest to consciously avoid.

Personally I think the probability of “thin pointer CStr” change ever happening is already low and getting lower every year but a Deref impl feels like a novel nail in the coffin.
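To make the cost concern concrete, here is a sketch of a hypothetical thin-pointer C string. `ThinCStr` is invented for illustration; today's `&CStr` is a fat pointer that already stores the length. With only a pointer, any byte-slice view (which a `Deref` to `ByteStr` would need) has to scan for the terminator first, exactly like `strlen`.

```rust
use std::ffi::{c_char, CStr};
use std::marker::PhantomData;

/// Hypothetical thin-pointer C string: just a pointer, no stored length.
#[derive(Clone, Copy)]
struct ThinCStr<'a> {
    ptr: *const c_char,
    _borrow: PhantomData<&'a CStr>,
}

impl<'a> ThinCStr<'a> {
    fn new(s: &'a CStr) -> Self {
        ThinCStr { ptr: s.as_ptr(), _borrow: PhantomData }
    }

    /// O(n): walks the bytes until the nul terminator, like C's `strlen`.
    fn len(self) -> usize {
        let mut n = 0;
        // SAFETY: `ptr` came from a `&'a CStr`, so it points to a
        // nul-terminated buffer that stays valid for `'a`.
        unsafe {
            while *self.ptr.add(n) != 0 {
                n += 1;
            }
        }
        n
    }

    /// What a byte-slice view would have to do: every call pays for the
    /// `len()` scan before it can build the slice.
    fn to_bytes(self) -> &'a [u8] {
        // SAFETY: the first `len()` bytes are initialized and live for `'a`.
        unsafe { std::slice::from_raw_parts(self.ptr.cast::<u8>(), self.len()) }
    }
}

fn main() {
    let thin = ThinCStr::new(c"hello");
    println!("{} bytes: {:?}", thin.len(), thin.to_bytes());
}
```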

tgross35 commented on Feb 28, 2025

Replying to @BurntSushi: "Does a CStr have a similar broadly applicable connotation? I think so, although I'm not as steeped in C idioms. And I'm unsure about the UTF-8 expectation."

C's char has an implementation-defined encoding that can be controlled with -fexec-charset and also with pragmas. It looks like Clang and GCC default to UTF-8 for literals, but on Windows the default code page is used. char8_t is the newer type intended to signify UTF-8, corresponding to string literals with the u8"..." prefix.

It's likely best to say that our CStr is just a bag of nonzero bytes terminated by a nul, without any conventional encoding. But we already provide APIs that make life easier when the bytes happen to be UTF-8 or UTF-8-like, with conversions to and from &str / String, and a Display implementation seems like it would simply be one more case of this. At least, I don't think that a Display implementation that assumes something similar to UTF-8 punishes other uses of CStr, which would need an external conversion routine anyway.

https://internals.rust-lang.org/t/pre-rfc-deprecate-and-replace-cstr-cstring/5016/38 has some more details about C encodings (that entire thread has some good comments).
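For context, the existing UTF-8-oriented conversions on `CStr` and `CString` look like this. All of these are stable std APIs; the `c""` literals assume Rust 1.77 or later.

```rust
use std::ffi::{CStr, CString};

fn main() {
    let valid: &CStr = c"hello";
    // Strict: succeeds only when the bytes are valid UTF-8.
    assert_eq!(valid.to_str(), Ok("hello"));

    let invalid: &CStr = c"hi\xff";
    assert!(invalid.to_str().is_err());
    // Lossy: invalid sequences become U+FFFD.
    assert_eq!(invalid.to_string_lossy(), "hi\u{FFFD}");

    // From Rust strings: fails if there is an interior nul byte.
    assert!(CString::new("a\0b").is_err());
    let owned = CString::new("hello").unwrap();
    // Back to `String`: fails (handing back the CString) if not UTF-8.
    assert_eq!(owned.into_string().unwrap(), "hello");
}
```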

tgross35 commented on Feb 28, 2025

Also related, regarding Display round trips: rust-lang/rust#136687.

Darksonn (Author) commented on Apr 10, 2025

tamird commented on Apr 17, 2025

Given the concerns about &CStr becoming a thin pointer, I think my PR is a no-go. Would the next logical step be to open a PR implementing Display for CStr behind a feature gate?

BurntSushi (Member) commented on Apr 17, 2025

Trait impls on stable types are insta-stable themselves, so that would need to go through FCP.

But the next step here is for us to sign off on this ACP. Then PR. Then that has to go through FCP because it's insta stable.

A Display impl for CStr seems reasonable to me. I'll accept this. Once a tracking issue is created, we can close this.

One concern for stabilization is whether this should block on ByteStr (and its Display impl) becoming stable. Particularly if we want those Display impls to be consistent with one another.

Added the ACP-accepted label (API Change Proposal is accepted: seconded with no objections) on Apr 17, 2025

tamird commented on Apr 17, 2025

@BurntSushi thanks. Are more sign-offs required?

The feature lifecycle doc describes tracking issues as tracking feature stabilization, but since this would be insta-stable, I presume the tracking issue would be used to hold the FCP?

If my understanding is correct, the next steps are to create a PR and a tracking issue, then start the FCP on the tracking issue. Once the FCP is concluded, the PR can be merged. Is that correct?



Labels: ACP-accepted (API Change Proposal is accepted: seconded with no objections), T-libs-api, api-change-proposal (a proposal to add or alter unstable APIs in the standard libraries)


Participants: @joshtriplett, @Amanieu, @BurntSushi, @Darksonn, @the8472


          `impl Display for CStr` · Issue #550 · rust-lang/libs-team