During the pointer docs update, a discussion spawned off about effects of virtual memory: What do we still guarantee when different virtual addresses map to the same physical address? Some of libstd, e.g. copy_nonoverlapping, will start misbehaving in that situation. FWIW, C seems to just not care: memmove has the same problem.
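To make the concern concrete, here is a minimal sketch, assuming the non-overlap requirement of copy_nonoverlapping is read purely in terms of pointer addresses. The helpers `virtual_ranges_overlap` and `copy_if_disjoint` are hypothetical names introduced for illustration; they are not part of libstd.

```rust
use std::ptr;

/// Hypothetical helper: reports whether the byte ranges `[a, a + len)` and
/// `[b, b + len)` overlap when compared by (virtual) address only.
fn virtual_ranges_overlap(a: *const u8, b: *const u8, len: usize) -> bool {
    let (a, b) = (a as usize, b as usize);
    // Two equal-length ranges overlap exactly when their start addresses
    // are fewer than `len` bytes apart.
    a.abs_diff(b) < len
}

/// Hypothetical wrapper: copies `len` bytes only if the address-based
/// check says the ranges are disjoint. Returns whether the copy happened.
///
/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of `len` bytes.
unsafe fn copy_if_disjoint(src: *const u8, dst: *mut u8, len: usize) -> bool {
    if virtual_ranges_overlap(src, dst as *const u8, len) {
        return false;
    }
    // The check above only saw virtual addresses. If the two ranges were
    // different mappings of the same physical pages, it would still pass,
    // which is the situation this issue is asking about.
    unsafe { ptr::copy_nonoverlapping(src, dst, len) };
    true
}
```

On ordinary, singly mapped memory the check is accurate. With two mappings of the same pages, it would report the ranges as disjoint even though writes through one are visible through the other.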
glaebhoerl commented on Sep 18, 2018
(Even worse: "on a modern computer, a pointer dereference could very [well] lead to the execution of Python code")
gnzlbg commented on Sep 18, 2018
Is it undefined behavior to have two pointers with different values referring to the same object?
TL;DR:
and
With pretty much the recommendation that anything doing anything like this should use volatile loads and stores.
RalfJung commented on Sep 18, 2018
Oh but you can (but for unrelated reasons.^^)
RalfJung commented on Sep 18, 2018
More on topic, AFAIK LLVM does not use aliasing information to inform ptr equality tests, and vice versa. So with a refined view of aliasing, I see no reason why LLVM's memory model would not work with overlapping virtual memory.
gnzlbg commented on Sep 18, 2018
Could you point me to which part of the blog post allows you to construct two pointers of different addresses that refer to the same object in C or C++?
The only relevant example that I find there is the one of the out-of-bound pointers, but that's the opposite case, where one has two pointers, pointing to the same address, but that refer to two different objects - technically, one refers to an object, the other does not and is therefore illegal to dereference.
RalfJung commented on Sep 18, 2018
You said "different values". The value of a pointer includes its provenance information. But I am just being picky here, that's off-topic -- sorry.
alercah commented on Sep 18, 2018
This should definitely be a topic of discussion, especially because of the possibility that a filesystem process ends up mmaping its own data. Thar be dragons here, but hopefully we can come up with sensible guarantees at least.
gnzlbg commented on Sep 18, 2018
@RalfJung No offense taken, that's an important difference! With mmap one can create two pointers of different values (with different bit-pattern and different provenance) that on purpose refer to the same memory. In the out-of-bound example, obtaining two pointers with the same bit-pattern and different provenance (e.g. obtained from two different malloc calls) feels incidental to me.
EDIT: What I mean here with "on purpose" and "incidental" is that in the mmap case, it is the user's intent to be able to manipulate the same object/memory/... via two pointers with different values (with different bit-pattern and different provenance). In the out-of-bounds case, that's less clear to me. We don't necessarily have to treat both cases equally.
JakobDegen commented on May 29, 2023
It seems unlikely to me that we'll want to adjust the AM memory model specifically to support this. I think that basically gives us very little wiggle room here. Option one for users is to use virtual memory shenanigans in a way that makes sense to the AM. This probably means not double-mapping pages. Option 2 is to use volatile operations, which are the standard tool for "I have some concept of memory that the AM doesn't understand."
Is there anything I'm missing here?
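As a rough illustration of Option 2 above, here is a minimal sketch of a byte-wise copy built on std::ptr::read_volatile and std::ptr::write_volatile. The function name `volatile_copy` is hypothetical. Whether volatile accesses are actually sufficient for double-mapped memory is the suggestion made in this thread, not an established guarantee.

```rust
use std::ptr;

/// Hypothetical helper: copies `len` bytes from `src` to `dst` front to
/// back, using one volatile load and one volatile store per byte.
///
/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of `len` bytes.
unsafe fn volatile_copy(src: *const u8, dst: *mut u8, len: usize) {
    for i in 0..len {
        unsafe {
            // Volatile accesses are not elided or merged by the compiler,
            // so each load observes whatever the memory holds at that
            // moment, including bytes written earlier through another
            // mapping of the same page.
            let byte = ptr::read_volatile(src.add(i));
            ptr::write_volatile(dst.add(i), byte);
        }
    }
}
```

This sketch does not try to give memmove-style results for aliased ranges; it only makes every individual access explicit. It is also likely to be much slower than copy_nonoverlapping, since the compiler cannot turn the loop into a bulk copy.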
comex commented on May 30, 2023
A one-mapping-per-process limit would be bad for composability. It would mean that libraries essentially could not make non-volatile accesses to writable mappings at all – even if they avoided data races using some external synchronization mechanism or using atomics – since they wouldn't know if some other library in the same process was using the same file.
(That said, for the case of atomics, aren't you technically supposed to use volatile atomics for cross-process synchronization anyway? But the standard library has no support for those.)
Is there any actual problem here other than copy_overlapping? The Python example doesn't count; it's not specific to having multiple mappings. (If the Python code fails to preserve the expected behavior of memory reads and writes, then the memory mapping becomes something like MMIO, 'special' memory where reads and writes have arbitrary effects. That case clearly requires volatile accesses. If the Python code does preserve the expected behavior of reads and writes, then it doesn't matter.)