Drop impl of Vec is not idempotent and this is not documented #60822

Closed

#67559

Labels

A-collections, A-docs, C-enhancement, T-libs-api

gnzlbg

Contributor

#60611 shows that the Drop impl of Vec is not idempotent. That is, if Drop fails, the Vec is left in an "un-droppable" state, and trying to re-drop the vector invokes undefined behavior - that is, the vector must be leaked.

It might be possible to make it idempotent without adding significant overhead [0], but I don't know whether we should do this. I think we should be clearer about whether the Drop impl of a type is idempotent or not, since making the wrong assumption can lead to UB, so I believe we should document this somewhere.

We could document this for the Vec type, but it could also be documented once at the top of the libcore/liballoc/libstd crates for all types (e.g. "Drop impls of standard types are not idempotent").

[0] Maybe setting the vector len field to zero before dropping the slice elements and having the Drop impl of RawVec set the capacity to zero before exiting is enough to avoid trying to re-drop the elements or trying to deallocate the memory (which is always freed on the first drop attempt).
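The idea in footnote [0] can be illustrated with a toy container. This is only a sketch, not how Vec or RawVec are implemented: `ToyVec`, `ToyRawVec`, `Counted` and `double_drop_count` are hypothetical names invented here, and the sketch assumes a fixed capacity and a non-zero-sized element type. It zeroes `len` before the elements are dropped and zeroes `cap` after deallocating, so a second `drop_in_place` on the same value finds nothing left to do.

```rust
use std::alloc::{self, Layout};
use std::mem::{self, ManuallyDrop};
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many element destructors have run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Counted {
    _id: u32,
}

impl Drop for Counted {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Hypothetical stand-in for RawVec: owns only the allocation.
struct ToyRawVec<T> {
    ptr: *mut T,
    cap: usize,
}

impl<T> Drop for ToyRawVec<T> {
    fn drop(&mut self) {
        if self.cap != 0 {
            let layout = Layout::array::<T>(self.cap).expect("layout was valid at allocation");
            unsafe { alloc::dealloc(self.ptr.cast::<u8>(), layout) };
            // A repeated drop sees cap == 0 and does not free again.
            self.cap = 0;
        }
    }
}

// Hypothetical stand-in for Vec: tracks the initialized prefix.
struct ToyVec<T> {
    buf: ToyRawVec<T>,
    len: usize,
}

impl<T> ToyVec<T> {
    fn with_capacity(cap: usize) -> Self {
        assert!(cap > 0 && mem::size_of::<T>() > 0, "sketch needs a real allocation");
        let layout = Layout::array::<T>(cap).expect("capacity overflow");
        let ptr = unsafe { alloc::alloc(layout) }.cast::<T>();
        if ptr.is_null() {
            alloc::handle_alloc_error(layout);
        }
        ToyVec { buf: ToyRawVec { ptr, cap }, len: 0 }
    }

    fn push(&mut self, value: T) {
        assert!(self.len < self.buf.cap, "sketch does not grow");
        unsafe { self.buf.ptr.add(self.len).write(value) };
        self.len += 1;
    }
}

impl<T> Drop for ToyVec<T> {
    fn drop(&mut self) {
        // Forget the elements before dropping them, so a repeated drop sees len == 0.
        let len = mem::replace(&mut self.len, 0);
        if len != 0 {
            unsafe { ptr::drop_in_place(ptr::slice_from_raw_parts_mut(self.buf.ptr, len)) };
        }
        // The `buf` field is dropped afterwards by the compiler-generated drop glue.
    }
}

// Drops the same ToyVec twice and reports how many element destructors ran.
fn double_drop_count() -> usize {
    DROPS.store(0, Ordering::SeqCst);
    let mut v = ManuallyDrop::new(ToyVec::with_capacity(4));
    v.push(Counted { _id: 1 });
    v.push(Counted { _id: 2 });
    let raw: *mut ToyVec<Counted> = &mut *v;
    unsafe {
        ptr::drop_in_place(raw);
        ptr::drop_in_place(raw); // second drop is a no-op for this toy type only
    }
    DROPS.load(Ordering::SeqCst)
}

fn main() {
    println!("element destructors run: {}", double_drop_count());
}
```

The second drop is sound here only because this toy type was written to tolerate it; doing the same with a real Vec is exactly the undefined behavior this issue describes.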

gnzlbg

Contributor, Author

cc @rust-lang/libs

hanna-kruppe

Contributor

> It might be possible to make it idempotent without adding significant overhead [0], but I don't know whether we should do this. I think we should be clearer about whether the Drop impl of a type is idempotent or not, since making the wrong assumption can lead to UB, so I believe we should document this somewhere.

"It's not idempotent" should be the safe default assumption, and I know of no case so far where we've documented that a drop glue is idempotent. We might want to mention the general principle in suitable places (e.g., nomicon, UCG) but I don't think we should repeat "as usual, double-dropping this type causes UB" again on all specific types. We should only document the exceptions, if any.

(I am also quite unsure whether going through the effort of guaranteeing that certain types' drop glue is idempotent is a good use of time, as it seems extremely niche and I don't know any use cases for that. But that just means I personally won't invest time in that discussion.)

SimonSapin

Contributor

I think this is not an issue with Vec. Rather, any unsafe code needs to be careful about panic safety, especially if it is also generic.

FWIW I’ve always assumed that double drop could be as much a memory safety issue as use-after-free. (“One and a half” drop even more so.) I don’t think Drop::drop idempotency should be expected for any type, but that sounds like a decision for @rust-lang/wg-unsafe-code-guidelines more than @rust-lang/libs.

RalfJung

Member

> "It's not idempotent" should be the safe default assumption, and I know of no case so far where we've documented that a drop glue is idempotent. We might want to mention the general principle in suitable places (e.g., nomicon, UCG) but I don't think we should repeat "as usual, double-dropping this type causes UB" again on all specific types. We should only document the exceptions, if any.

Fully agreed.

The best you can hope for, IMO, is that a type will make an effort to drop as much as possible even when there's a panic during dropping. Vec actually does that by using the slice drop glue, which, if dropping one element panics, will keep dropping the other elements. This seems more useful than allowing idempotent dropping (it minimizes leakage even without any catch_unwind being involved), and it makes idempotent dropping unnecessary (if you catch a panic, the involved types already did everything they can to drop as much as possible, so there's no point in calling drop again).
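The behavior described above can be observed with a small experiment. This is a sketch under stated assumptions: `Noisy`, `DROPS` and `drops_despite_panic` are hypothetical names, the program uses the default unwinding panic strategy, and exactly one destructor panics (a second panic during the unwind would abort the process).

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts every destructor call, including the one that panics.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy(u32);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        if self.0 == 1 {
            panic!("destructor of element 1 panicked");
        }
    }
}

// Drops a Vec whose second element panics in its destructor and
// reports how many element destructors were run in total.
fn drops_despite_panic() -> usize {
    DROPS.store(0, Ordering::SeqCst);
    let v = vec![Noisy(0), Noisy(1), Noisy(2), Noisy(3)];
    let result = panic::catch_unwind(AssertUnwindSafe(move || drop(v)));
    assert!(result.is_err());
    DROPS.load(Ordering::SeqCst)
}

fn main() {
    println!("destructors run: {}", drops_despite_panic());
}
```

All four destructors run even though the second one panics, so there is nothing left for a second drop of the Vec to clean up.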

I think it makes more sense to improve panic-resilience of our drop impls than to make them idempotent. For example, AFAIK VecDeque drops two Vecs; if dropping the first panics, the second one will be leaked. This could be improved.
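One common way to make a two-part destructor panic-resilient is a drop guard: the second part is handed to a guard object whose own destructor drops it, so it is released both on normal exit and during unwinding. The following is a minimal sketch of that technique, not the actual VecDeque implementation; `drop_both`, `DropOnExit`, `Noisy` and `drops_with_guard` are hypothetical names.

```rust
use std::mem::ManuallyDrop;
use std::panic::{self, AssertUnwindSafe};
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

// The flag says whether this element's destructor should panic.
struct Noisy(bool);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        if self.0 {
            panic!("element destructor panicked");
        }
    }
}

/// Drops `front`, then `back`; `back` is dropped even if dropping `front` unwinds.
///
/// Safety: both slices must be valid for dropping and must not be dropped again.
unsafe fn drop_both<T>(front: *mut [T], back: *mut [T]) {
    struct DropOnExit<U>(*mut [U]);

    impl<U> Drop for DropOnExit<U> {
        fn drop(&mut self) {
            unsafe { ptr::drop_in_place(self.0) }
        }
    }

    // The guard's destructor runs on normal exit and on unwind.
    let _back = DropOnExit(back);
    unsafe { ptr::drop_in_place(front) };
}

// Drops two halves of a buffer where the first half contains a panicking
// destructor, and reports how many element destructors ran.
fn drops_with_guard() -> usize {
    DROPS.store(0, Ordering::SeqCst);
    // ManuallyDrop keeps the Vec itself from dropping the elements a second time.
    let mut items = vec![
        ManuallyDrop::new(Noisy(true)),
        ManuallyDrop::new(Noisy(false)),
        ManuallyDrop::new(Noisy(false)),
        ManuallyDrop::new(Noisy(false)),
    ];
    let result = panic::catch_unwind(AssertUnwindSafe(move || {
        let (front, back) = items.split_at_mut(2);
        // ManuallyDrop<T> is repr(transparent), so the slice pointers can be cast.
        let front = front as *mut [ManuallyDrop<Noisy>] as *mut [Noisy];
        let back = back as *mut [ManuallyDrop<Noisy>] as *mut [Noisy];
        unsafe { drop_both(front, back) };
    }));
    assert!(result.is_err());
    DROPS.load(Ordering::SeqCst)
}

fn main() {
    println!("destructors run: {}", drops_with_guard());
}
```

Without the guard, a panic while dropping `front` would skip `back` entirely and leak its elements.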

One interesting question I see here is the interaction with the pinning drop guarantee. Does Vec deallocate the backing store if dropping one of the elements panicked? If yes, is that a violation of the drop guarantee (assuming we extended Vec with pinning projections -- fn pin_get(Pin<&mut Vec<T>>, idx: usize) -> Option<Pin<&mut T>>)?

gnzlbg

Contributor, Author

Today double-drops are not a form of undefined behavior. They could, however, lead to undefined behavior depending on the types involved and how Drop is implemented for those types.

As part of the UCGs one could try to make double-drops UB per se. That would mean that unsafe code needs to make sure that double-drops don't happen, period. That might be a breaking change. If we don't do that, then we have some types for which double-drops are ok, and some types for which they are not, and the only way to tell is either by reading the documentation, or inspecting the type's source code.

I'd rather not recommend people to rely on what the source code does. It suffices that one crate starts relying on some Drop impl being "accidentally idempotent" in libstd today, for us to not be able to evolve that impl in the future without breaking code.

A "Unless stated otherwise, Drop impls of standard library types are not guaranteed to be idempotent, and if they are, we reserve the right to change that without maintaining backwards compatibility" note somewhere might be enough to prevent that.

> The best you can hope for, IMO, is that a type will make an effort to drop as much as possible even when there's a panic during dropping. Vec actually does that by using the slice drop glue, which, if dropping one element panics, will keep dropping the other elements.

I expect that to happen in an idempotent Drop impl for Vec as well. Right now, the issue is that all elements that can be dropped are dropped, the backing allocation of the Vec is deallocated, but the len of the Vec is not set to zero, so a double drop will try to double drop the elements again (and then the drop impl of RawVec will be invoked, whose capacity has not been set to zero, which will try to double-free memory).

> Does Vec deallocate the backing store if dropping one of the elements panicked?

Yes. When drop panics, the destructors of the fields are invoked. For Vec this means that the contained RawVec is dropped, which deallocates the backing storage of the Vec without setting the capacity of the Vec to zero.
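The rule that fields are still destroyed when a type's own Drop::drop panics can be checked with a small sketch. `Container` and `Buffer` are hypothetical stand-ins for Vec and RawVec, and the default unwinding panic strategy is assumed.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

static BUFFER_RELEASED: AtomicBool = AtomicBool::new(false);

// Hypothetical stand-in for RawVec: records that its destructor ran.
struct Buffer;

impl Drop for Buffer {
    fn drop(&mut self) {
        BUFFER_RELEASED.store(true, Ordering::SeqCst);
    }
}

// Hypothetical stand-in for Vec: its own destructor always panics.
struct Container {
    _buf: Buffer,
}

impl Drop for Container {
    fn drop(&mut self) {
        panic!("Container::drop panicked");
    }
}

// Returns whether the field's destructor ran although Container::drop panicked.
fn field_dropped_after_panic() -> bool {
    BUFFER_RELEASED.store(false, Ordering::SeqCst);
    let c = Container { _buf: Buffer };
    let result = panic::catch_unwind(move || drop(c));
    assert!(result.is_err());
    BUFFER_RELEASED.load(Ordering::SeqCst)
}

fn main() {
    println!("field dropped: {}", field_dropped_after_panic());
}
```

The field destructor runs during unwinding, which is why the backing storage is gone after a failed drop even though the outer value's bookkeeping fields still look intact.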

> If yes, is that a violation of the drop guarantee (assuming we extended Vec with pinning projections -- fn pin_get(Pin<&mut Vec<T>>, idx: usize) -> Option<Pin<&mut T>>)?

AFAICT, no. If you have a Vec<Pin<Box<T>>> the only thing that will be deallocated without running destructors when that Vec is dropped is the storage of the Pin<Box<T>>. Since the destructor of the Pin<Box<T>> does not run, the destructor of the Box<T> does not run, and the backing allocation of the Box<T> gets leaked, which is fine (some other thread could still write to it).

SimonSapin

Contributor

https://rust-lang.github.io/rfcs/0320-nonzeroing-dynamic-drop.html (both the proposal that is now implemented and the description of the status quo before that) is relevant to the efforts that the language makes to not let double-Drop happen in safe Rust. This guarantee has existed for years, since before 1.0, so writers of unsafe code rely on it.

Although calling Drop::drop twice is not inherently UB in the Rust language, many unsafe libraries are written with the assumption that it never happens. Therefore, writers of generic unsafe code should assume that double drop of a value of a type parameter can cause UB.

Calling Vec::drop twice with the Unix system allocator causes a double free, which is UB. But Vec is only an example.
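A typical place where generic unsafe code has to prevent a double drop is a helper that temporarily moves a value out of a `&mut T`. The sketch below is one conservative way to do it, assuming that aborting is an acceptable response to a panic in the callback; `replace_with` and `AbortOnUnwind` are hypothetical names, not standard library APIs.

```rust
use std::mem;
use std::ptr;

// Replaces `*slot` with `f(old value)`. While `f` runs, `*slot` holds a
// logically moved-out value, so unwinding out of `f` would let the caller
// drop that value a second time. The guard aborts instead.
fn replace_with<T>(slot: &mut T, f: impl FnOnce(T) -> T) {
    struct AbortOnUnwind;

    impl Drop for AbortOnUnwind {
        fn drop(&mut self) {
            std::process::abort();
        }
    }

    unsafe {
        let old = ptr::read(slot);
        let guard = AbortOnUnwind;
        let new = f(old);
        // No panic happened: disarm the guard, then overwrite the stale
        // value without running its destructor.
        mem::forget(guard);
        ptr::write(slot, new);
    }
}

fn main() {
    let mut s = String::from("drop");
    replace_with(&mut s, |mut owned| {
        owned.push_str(" once");
        owned
    });
    println!("{}", s);
}
```

Because `T` is a type parameter, the helper cannot know whether a second drop would be harmless, so it has to assume the worst.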

RalfJung

Member

> I'd rather not recommend people to rely on what the source code does. It suffices that one crate starts relying on some Drop impl being "accidentally idempotent" in libstd today, for us to not be able to evolve that impl in the future without breaking code.

Drop is not the only operation that has such issues; pretty much any function you call can be UB or not under certain circumstances depending on implementation details. One example is e.g. relying on Vec::push not to reallocate -- if we did a reserve before, is that guaranteed? What if we did reserve and then pop? And so on.

If we really consider such details to be stable just because they are observable by "this program was not UB but now it is", we'd have to bake an abstract model of all of these data structures into our operational semantics, just to make some more code UB. I think that is a bad approach. UB is for enabling compiler optimizations. It is not for catching clients that exploit unstable details of the current implementation of some library. Conflating these two problems makes the definition of UB much, much more complicated, I think that's a mistake.

Just because some interaction with Vec is not UB currently, doesn't mean we are not allowed to ever change it to be UB. There is no implicit stabilization of the full set of UB-free interactions. I agree catching such "overfit" clients is an important problem, but let's keep that library-level discussion separate from the language-level discussion of what is and is not UB.

> I expect that to happen in an idempotent Drop impl for Vec as well. Right now, the issue is that all elements that can be dropped are dropped, the backing allocation of the Vec is deallocated, but the len of the Vec is not set to zero, so a double drop will try to double drop the elements again (and then the drop impl of RawVec will be invoked, whose capacity has not been set to zero, which will try to double-free memory).

I don't understand why you'd even want to call drop again, if we already agree that the first drop should do the maximal amount of dropping that it can. You are basically saying "make Vec's drop idempotent by making the second drop a NOP". I don't see the point.

Seems like you want to write generic code that drops stuff again if the first drop panicked. But what would be a situation where that is ever a good idea? From all I can see, the only place where this can help is if the second drop drops stuff that the first drop "missed" because of the panic. I am saying, if that is the problem, then fix drop to not "miss" stuff.

> AFAICT, no. If you have a Vec<Pin<Box<T>>> the only thing that will be deallocated without running destructors when that Vec is dropped is the storage of the Pin<Box<T>>. Since the destructor of the Pin<Box<T>> does not run, the destructor of the Box<T> does not run, and the backing allocation of the Box<T> gets leaked, which is fine (some other thread could still write to it).

This misses the point, because Box<T>: Unpin.

The interesting case is a Vec<IntrusiveListElement>. Then we could pin the Vec, and from our Pin<&mut Vec<IntrusiveListElement>> get a Pin<&mut IntrusiveListElement>, and insert that into the list. Now we are in a situation where if the Vec's backing store gets deallocated without dropping the IntrusiveListElement, we have a safety violation. But it seems to me if IntrusiveListElement::drop can guarantee that it itself does not panic, then we are okay: if an earlier element in the list panics while dropping (maybe it's a heterogeneous list through an enum or trait objects), we know we still get dropped properly.

gnzlbg

Contributor, Author

> Now we are in a situation where if the Vec's backing store gets deallocated without dropping the IntrusiveListElement, we have a safety violation.

How can you drop the Vec<IntrusiveListElement> while Pin<&mut IntrusiveListElement>s into the vector are still live?

> One example is e.g. relying on Vec::push not to reallocate -- if we did a reserve before, is that guaranteed? What if we did reserve and then pop? And so on.

> If we really consider such details to be stable just because they are observable by "this program was not UB but now it is", we'd have to bake an abstract model of all of these data structures into our operational semantics, just to make some more code UB.

The documentation of Vec guarantees these details, e.g., see Vec's capacity-and-reallocation section, so AFAICT Rust unsafe code can rely on these details, and breaking that code would be an API breaking change.
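As a concrete instance of a documented detail that code may rely on: the Vec documentation states that push will not reallocate while the reported capacity is sufficient. A minimal sketch checking this (the function name `push_within_capacity_keeps_pointer` is made up for illustration):

```rust
// Fills a Vec up to its reported capacity and checks that neither the
// buffer pointer nor the capacity changed along the way.
fn push_within_capacity_keeps_pointer() -> bool {
    let mut v: Vec<u32> = Vec::with_capacity(8);
    let cap = v.capacity(); // may be larger than requested
    let before = v.as_ptr();
    for i in 0..cap as u32 {
        v.push(i);
    }
    before == v.as_ptr() && v.capacity() == cap
}

fn main() {
    println!("pointer stable: {}", push_within_capacity_keeps_pointer());
}
```

Unsafe code holding raw pointers into the buffer can depend on this because it is written down as a guarantee, not because the current implementation happens to behave that way.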

RalfJung

Member

> How can you drop the Vec<IntrusiveListElement> while Pin<&mut IntrusiveListElement>s into the vector are still live?

I was talking about the implicit drop that happens when the vector goes out of scope.

> The documentation of Vec guarantees these details, e.g., see Vec's capacity-and-reallocation section, so AFAICT Rust unsafe code can rely on these details, and breaking that code would be an API breaking change.

I am aware. When libraries decide to document such details, clients may of course rely on them. VecDeque has no such section even though many similar questions apply; thus, clients may not rely on the same properties for VecDeque even if they happen to be true currently.

This matches the situation for idempotent drop: clients may rely on this if and only if the library decides to document this as a guarantee.

I don't think we should change our language spec to detect clients relying on VecDeque implementation details, and similarly, I don't think we should change our language spec to detect clients that rely on some drop being idempotent even though that is not documented (e.g., double-drop of an empty Vec that had shrink_to_fit called on it).

gnzlbg

Contributor, Author

> https://rust-lang.github.io/rfcs/0320-nonzeroing-dynamic-drop.html (both the proposal that is now implemented and the description of the status quo before that) is relevant to the efforts that the language makes to not let double-Drop happen in safe Rust. This guarantee has existed for years, since before 1.0, so writers of unsafe code rely on it.

@SimonSapin The problem is that safe Rust can be called from unsafe Rust, and it isn't clear to me from that RFC whether writers of unsafe code can rely on other unsafe code not performing a double-drop. It isn't clear either whether unsafe code can assume that double-dropping something is ok.

gnzlbg

Contributor, Author

> I am aware. When libraries decide to document such details, clients may of course rely on them. VecDeque has no such section even though many similar questions apply; thus, clients may not rely on the same properties for VecDeque even if they happen to be true currently.

Are you saying that this is analogous to whether users should be able to rely on double-drops invoking / not invoking UB? These data-structure properties feel quite obvious to me, but I have no idea how users can today learn that, at least when using the libstd types, they should always assume that Drop is not idempotent unless a type guarantees otherwise. AFAICT they can just try it for some type, and if it works, deduce that it is ok. Then they publish their crate, and some time later we break their code. Saying that "we did not guarantee that it worked anywhere" does not change the fact that the code now is broken, and we are not warning them about this either. So even if we might be right in that technically we are allowed to break that code, it might turn out that in practice, now we cannot, and that user has somehow managed to, by accident, specify that double-drops for some type must be ok. We'd have a stronger case if we explicitly call this out in the docs, warn when users do this (or panic or similar), etc.

Development

Document that calling Drop, even after it panics, is UB (rust-lang/rust)

Activity

gnzlbg commented on May 14, 2019

hanna-kruppe commented on May 14, 2019

SimonSapin commented on May 14, 2019

RalfJung commented on May 14, 2019

gnzlbg commented on May 14, 2019

SimonSapin commented on May 14, 2019

RalfJung commented on May 14, 2019

gnzlbg commented on May 14, 2019

RalfJung commented on May 14, 2019

gnzlbg commented on May 14, 2019

gnzlbg commented on May 14, 2019
