Description
The case for it working:
I print the address. I then parse the formatted text, and convert the parsed integer into a pointer. I then dereference that constructed pointer. The constructed pointer is "derived from" the original pointer, so it should be valid.
The case for it not working:
Wow that's annoying. Plus, it'd be very nice to be able to implement e.g. tagged pointers in the obvious way, using future provenance-aware APIs, such as
unsafe fn tag<T: ?Sized>(ptr: *mut T, tag: u8) -> *mut T {
    let addr: usize = ptr.addr();
    debug_assert_eq!(addr & (tag as usize), 0);
    ptr.with_addr(addr | (tag as usize))
}
Once we have provenance-aware APIs, it makes sense to allow manipulation of addresses without leaking provenance, as the developer has indicated that they intend to maintain the pointer provenance, not try to reconstruct it from just the address.
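A minimal sketch of how such tagging could look end to end, assuming a toolchain where `addr` and `map_addr` are available on raw pointers (stable since Rust 1.84, to the best of my knowledge). The names `tag_low_bits`, `untag_low_bits`, and `roundtrip` are hypothetical and exist only for this illustration. The point is that the address is modified without any pointer-to-integer cast, so the resulting pointer keeps the provenance of the original.

```rust
use std::mem::align_of;

/// Store `tag` in the low (alignment) bits of `ptr`, keeping its provenance.
/// Hypothetical helper: `tag` must fit in the bits that alignment leaves zero.
fn tag_low_bits<T>(ptr: *mut T, tag: usize) -> *mut T {
    let mask = align_of::<T>() - 1;
    assert_eq!(tag & !mask, 0, "tag does not fit in the alignment bits");
    // `map_addr` changes only the address; no `ptr as usize` cast is involved.
    ptr.map_addr(|addr| {
        debug_assert_eq!(addr & mask, 0, "pointer is not aligned");
        addr | tag
    })
}

/// Split a tagged pointer back into the untagged pointer and the tag.
fn untag_low_bits<T>(ptr: *mut T) -> (*mut T, usize) {
    let mask = align_of::<T>() - 1;
    (ptr.map_addr(|addr| addr & !mask), ptr.addr() & mask)
}

/// Tag a pointer to a local, untag it, and read through the result.
fn roundtrip() -> (u32, usize) {
    let mut value: u32 = 42;
    let ptr: *mut u32 = &mut value;
    let tagged = tag_low_bits(ptr, 0b10);
    let (clean, tag) = untag_low_bits(tagged);
    assert_eq!(clean, ptr);
    // SAFETY: `clean` has the address and provenance of `ptr`,
    // which points to the still-live `value`.
    (unsafe { *clean }, tag)
}
```

Deriving the mask from `align_of::<T>()` rather than hard-coding it keeps the sketch from clearing address bits on targets where the type has a smaller alignment than expected.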
But what about all of the `ptr as usize` in the standard library today? How much of it can avoid the ptrtoint, how much can potentially be `ptr.addr()` in the future, and how much will need to be (the moral equivalent of a) `ptr::leak` to maintain compatibility with today's semantics?
Printing the address of a pointer is the obvious example of a ptrtoint which does not necessarily lead to an inttoptr. I don't actually know if any other obscure ways to effectively get ptrtoint without writing `ptr as usize` exist in the standard library.
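To make the question concrete, here is a minimal sketch of the first half of that round trip: formatting a pointer with `{:p}` and parsing the text back into an integer. `address_via_text` is a hypothetical helper name. The sketch deliberately stops at the integer; whether turning it back into a pointer and dereferencing that pointer is defined is exactly what is being asked, so it does not attempt it.

```rust
/// Format `ptr` with `{:p}` and parse the printed text back into an integer.
/// Hypothetical helper for illustration only.
fn address_via_text<T>(ptr: *const T) -> usize {
    // `{:p}` prints the address in hexadecimal with a `0x` prefix.
    let text = format!("{:p}", ptr);
    let digits = text.trim_start_matches("0x");
    usize::from_str_radix(digits, 16).expect("`{:p}` output should be hexadecimal")
}
```

The parsed integer equals the address of the original pointer, but it is a plain `usize` that carries no provenance of its own.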
Activity
Title changed: Should format!("{p:p}") leak provenance? → Should `format!("{:p}", ptr)` leak provenance?
RalfJung commented on Mar 23, 2022
"leaking provenance" means wildly different things in different context to different people, so I have no idea what you are even asking, or what the "it" is that might be working or not. ;)
CAD97 commented on Mar 23, 2022
I mean in the exact way that `ptrtoint` behaves.
If I do `inttoptr(ptrtoint(_))`, that's not considered a noöp, right, because of provenance. The way we model `ptrtoint` is now it's fair game for anyone to `inttoptr` any equivalent int and get back the ptr.
What I'm interested in is whether `format!("{:p}", ptr)` necessarily uses `ptrtoint`, or if it could use a (at this point IIUC hypothetical) weaker form which does not make `inttoptr` valid.
RalfJung commented on Mar 23, 2022
So far I have been considering two kinds of models:
I guess you are saying we could have a variant of (1) where two kinds of ptrtoint casts exist -- those that "broadcast" and those that do not. Sure, we could. However that solves none of the things that make (1) a mess so it doesn't really help I think.
CAD97 commented on Mar 23, 2022
Although I very much like (2) and agree that it would be a better model, all other things being equal, I worry about the fact that `ptr as usize as ptr` is de facto allowed today, and retroactively saying it produces an unusable pointer even on existing editions might be too breaking.
So what I'm wondering is if we can't have (2) for new code, but (1) for old code, even if old code is necessarily pessimised by such.
The model in Rust land would roughly be (modulo naming)
- `ptr::leak(ptr) -> usize`: broadcast side effect; assume this SB tag always aliased.
- `ptr::unleak(usize) -> ptr`: access to any broadcast provenance; treat as "unknown" provenance.
- `ptr::addr(ptr) -> usize`: get addr without broadcast.
- `ptr::with_addr(ptr, usize) -> ptr`: set addr with given provenance.
- `ptr as usize`, edition old: `ptr::leak`.
- `usize as ptr`, edition old: `ptr::unleak`.
- `ptr as usize`, edition new: error or `ptr::addr`.
- `usize as ptr`, edition new: error.
The remaining question in such a world is whether existing `ptrtoint` in the stdlib need to continue to have the broadcast side effect for compatibility, or if they can remove it (and say relying on it was UB).
There's definitely complications I haven't considered, but I'm just ((un)reasonably?) scared of retroactively saying any lib using `usize as ptr` is UB.
And of course, whatever model is chosen is predicated on what backends (i.e. LLVM) have support for; if LLVM doesn't have a way for a `ptrtoint` equivalent to broadcast, there's effectively no way for Rust to support `ptr::leak` either.
(Now I guess I should go rewrite ptr-union's alignment tagging to use `wrapping_offset` rather than `usize` bitops...)
RalfJung commented on Mar 23, 2022
I agree with these concerns, but as long as (1) exists at all, it has all the bad effects I described.
The only cop-out I can imagine is to say "the formal spec doesn't cover code on old editions that still uses the deprecated casts; such code is supported on a best-effort basis". That would actually help the formal efforts.
But anyway that is wildly off-topic from the original question in this thread.^^
CAD97 commented on Mar 23, 2022
I mean, it's a question of if the OP question is moot. And I think we have reasonable consensus (at least between the two of us, and that's enough for me to close the issue) that
So to answer the OP question:
No, parsing and then dereferencing the address printed by `format!("{:p}", ptr)` is UB.
Diggsey commented on Mar 23, 2022
@CAD97 I'm confused by your conclusion. Surely under (1) parsing and then dereferencing the address is not UB, since the pointer was broadcast, since `ptrtoint` is presumably needed before the int-to-string conversion.
CAD97 commented on Mar 24, 2022
Yes, you're correct, it would be defined under (1); I sort of (accidentally) assumed that (1) wasn't the case in the conclusion.
(The question is still moot under that model, though, as there's no way for the answer to be different.)