Simple Arc implementation (without Weak refs) #253

Merged 6 commits on Jan 18, 2021
7 changes: 7 additions & 0 deletions src/SUMMARY.md
@@ -55,6 +55,13 @@
* [Handling Zero-Sized Types](vec-zsts.md)
* [Final Code](vec-final.md)
* [Implementing Arc and Mutex](arc-and-mutex.md)
* [Arc](arc.md)
* [Layout](arc-layout.md)
* [Base Code](arc-base.md)
* [Cloning](arc-clone.md)
* [Dropping](arc-drop.md)
* [Deref](arc-deref.md)
* [Final Code](arc-final.md)
* [FFI](ffi.md)
* [Beneath `std`](beneath-std.md)
* [#[panic_handler]](panic-handler.md)
2 changes: 1 addition & 1 deletion src/arc-and-mutex.md
@@ -4,4 +4,4 @@
Knowing the theory is all fine and good, but the *best* way to understand
something is to use it. To better understand atomics and interior mutability,
we'll be implementing versions of the standard library's Arc and Mutex types.

TODO: ALL OF THIS OMG
TODO: Mutex
104 changes: 104 additions & 0 deletions src/arc-base.md
@@ -0,0 +1,104 @@
# Base Code

Now that we've decided the layout for our implementation of `Arc`, let's create
some basic code.

## Constructing the Arc

We'll first need a way to construct an `Arc<T>`.

This is pretty simple, as we just need to box the `ArcInner<T>` and get a
`NonNull<ArcInner<T>>` pointer to it.

We start the reference count at 1, as that first reference is the current
pointer. The count is updated as the `Arc` is cloned or dropped. It is okay to
call `unwrap()` on the `Option` returned by `NonNull::new`, as `Box::into_raw`
guarantees that the pointer it returns is not null.

```rust,ignore
impl<T> Arc<T> {
pub fn new(data: T) -> Arc<T> {
// We start the reference count at 1, as that first reference is the
// current pointer.
let boxed = Box::new(ArcInner {
rc: AtomicUsize::new(1),
data,
});
Arc {
// It is okay to call `.unwrap()` here as we get a pointer from
// `Box::into_raw` which is guaranteed to not be null.
ptr: NonNull::new(Box::into_raw(boxed)).unwrap(),
_marker: PhantomData,
}
}
}
```

## Send and Sync

Since we're building a concurrency primitive, we'll need to be able to send it
across threads. Thus, we can implement the `Send` and `Sync` marker traits. For
more information on these, see [the section on `Send` and
`Sync`](send-and-sync.md).

This is okay because:
* You can get a mutable reference to the value inside an `Arc` only if it is
  the only `Arc` referencing that data
* We use atomic counters for reference counting

```rust,ignore
unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}
```

We need the bound `T: Sync + Send` because without it, it would be possible to
share thread-unsafe values across a thread boundary via an `Arc`, which could
cause data races or unsoundness.
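
For a hedged illustration (not part of the chapter's code) of what these bounds
rule out: `Cell<i32>` is `Send` but not `Sync`, so with the impls above
`Arc<Cell<i32>>` is not `Send`, and the compiler will refuse to move it into
another thread:

```rust,ignore
use std::cell::Cell;
use std::thread;

let not_sync = Arc::new(Cell::new(0));

// ERROR: `Cell<i32>` is not `Sync`, so our `unsafe impl`s do not make
// `Arc<Cell<i32>>` `Send`, and it cannot be moved into the new thread.
thread::spawn(move || not_sync.set(1));
```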

## Getting the `ArcInner`

We'll now want to make a private helper function, `inner()`, which just returns
the dereferenced `NonNull` pointer.

To dereference the `NonNull<T>` pointer into a `&T`, we can call
`NonNull::as_ref`. This is unsafe, unlike the typical `as_ref` function, so we
must call it like this:
```rust,ignore
// inside the impl<T> Arc<T> block from before:
fn inner(&self) -> &ArcInner<T> {
unsafe { self.ptr.as_ref() }
}
```

This unsafety is okay because while this `Arc` is alive, we're guaranteed that
the inner pointer is valid.

Here's all the code from this section:
```rust,ignore
impl<T> Arc<T> {
pub fn new(data: T) -> Arc<T> {
// We start the reference count at 1, as that first reference is the
// current pointer.
let boxed = Box::new(ArcInner {
rc: AtomicUsize::new(1),
data,
});
Arc {
// It is okay to call `.unwrap()` here as we get a pointer from
// `Box::into_raw` which is guaranteed to not be null.
ptr: NonNull::new(Box::into_raw(boxed)).unwrap(),
_marker: PhantomData,
}
}

fn inner(&self) -> &ArcInner<T> {
// This unsafety is okay because while this Arc is alive, we're
// guaranteed that the inner pointer is valid.
unsafe { self.ptr.as_ref() }
}
}

unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}
```
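
For reference, the snippet above assumes the `Arc<T>` and `ArcInner<T>`
definitions from the [layout section](arc-layout.md). A minimal recap of those
definitions and the imports they need, so the code can be read on its own:

```rust,ignore
use std::marker::PhantomData;
use std::ptr::NonNull;
use std::sync::atomic::AtomicUsize;

pub struct Arc<T> {
    // A non-null pointer to the heap allocation holding the count and data.
    ptr: NonNull<ArcInner<T>>,
    // Tells the drop checker that we logically own an `ArcInner<T>`.
    _marker: PhantomData<ArcInner<T>>,
}

pub struct ArcInner<T> {
    // The atomic reference count shared by all clones.
    rc: AtomicUsize,
    data: T,
}
```
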
60 changes: 60 additions & 0 deletions src/arc-clone.md
@@ -0,0 +1,60 @@
# Cloning

Now that we've got some basic code set up, we'll need a way to clone the `Arc`.

Basically, we need to:
1. Get the `ArcInner` value of the `Arc`
2. Increment the atomic reference count
3. Construct a new instance of the `Arc` from the inner pointer

For steps 1 and 2, we use the `inner()` helper from before to get at the
`ArcInner` and increment its atomic reference count as follows:
```rust,ignore
self.inner().rc.fetch_add(1, Ordering::Relaxed);
```

As described in [the standard library's implementation of `Arc` cloning][2]:
> Using a relaxed ordering is alright here, as knowledge of the original
> reference prevents other threads from erroneously deleting the object.
>
> As explained in the [Boost documentation][1]:
> > Increasing the reference counter can always be done with
> > memory_order_relaxed: New references to an object can only be formed from an
> > existing reference, and passing an existing reference from one thread to
> > another must already provide any required synchronization.
>
> [1]: https://www.boost.org/doc/libs/1_55_0/doc/html/atomic/usage_examples.html
[2]: https://github.com/rust-lang/rust/blob/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/sync.rs#L1171-L1181

We'll need to add another import to use `Ordering`:
```rust,ignore
use std::sync::atomic::Ordering;
```

It is possible that in some contrived programs (e.g. ones using `mem::forget`)
the reference count could overflow, but no reasonable program will ever come
close to that limit.
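
If we did want to guard against that anyway, here's a hedged sketch, modeled on
what the standard library does and not part of this chapter's code, that aborts
when the previous count is already implausibly large:

```rust,ignore
use std::process::abort;
use std::sync::atomic::Ordering;

// Hypothetical overflow guard: `fetch_add` returns the previous count, so we
// can abort if it has grown beyond any realistic number of references.
let old_rc = self.inner().rc.fetch_add(1, Ordering::Relaxed);
if old_rc >= isize::MAX as usize {
    abort();
}
```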

Then, we need to return a new instance of the `Arc`:
```rust,ignore
Self {
ptr: self.ptr,
_marker: PhantomData
}
```

Now, let's wrap this all up inside the `Clone` implementation:
```rust,ignore
use std::sync::atomic::Ordering;

impl<T> Clone for Arc<T> {
fn clone(&self) -> Arc<T> {
// Using a relaxed ordering is alright here as knowledge of the original
// reference prevents other threads from wrongly deleting the object.
self.inner().rc.fetch_add(1, Ordering::Relaxed);
Self {
ptr: self.ptr,
_marker: PhantomData,
}
}
}
```
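
As a quick sanity check, here's a hedged usage sketch (it assumes the rest of
the chapter's code, including the `Drop` and `Deref` impls added in the next
sections) showing a clone being moved to another thread:

```rust,ignore
use std::thread;

let original = Arc::new(42);
let second = original.clone(); // count: 1 -> 2

let handle = thread::spawn(move || {
    // `second` was moved into this thread; the allocation stays alive here
    // because the count is still non-zero.
    assert_eq!(*second, 42);
});

handle.join().unwrap();
assert_eq!(*original, 42);
```
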
25 changes: 25 additions & 0 deletions src/arc-deref.md
@@ -0,0 +1,25 @@
# Deref

Alright. We now have a way to make, clone, and destroy `Arc`s, but how do we get
to the data inside?

What we need now is an implementation of `Deref`.

We'll need to import the trait:
```rust,ignore
use std::ops::Deref;
```

And here's the implementation:
```rust,ignore
impl<T> Deref for Arc<T> {
type Target = T;

fn deref(&self) -> &T {
&self.inner().data
}
}
```

Pretty simple, eh? This simply dereferences the `NonNull` pointer to the
`ArcInner<T>`, then gets a reference to the data inside.
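
A quick hedged usage sketch (assuming the code built up so far in this
chapter):

```rust,ignore
let s = Arc::new(String::from("deref"));

// Methods of the inner type are reachable through auto-deref...
assert_eq!(s.len(), 5);
// ...and `&*` gives us a plain reference to the shared data.
assert_eq!(&*s, "deref");
```
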
92 changes: 92 additions & 0 deletions src/arc-drop.md
@@ -0,0 +1,92 @@
# Dropping

We now need a way to decrease the reference count and drop the data once the
count reaches zero; otherwise the data will live on the heap forever.

To do this, we can implement `Drop`.

Basically, we need to:
1. Get the `ArcInner` value of the `Arc`
2. Decrement the reference count
3. If we just removed the last reference to the data (i.e. the previous count
   was 1), then:
4. Issue an atomic fence to prevent reordering of the use and deletion of the
   data, then:
5. Drop the inner data

Now, we need to decrement the reference count. We can also handle step 3 here
by returning early if the previous count was not 1 (`fetch_sub` returns the
value the counter held before the subtraction):
```rust,ignore
if self.inner().rc.fetch_sub(1, Ordering::Release) != 1 {
return;
}
```

We then need to create an atomic fence to prevent reordering of the use of the
data and deletion of the data. As described in [the standard library's
implementation of `Arc`][3]:

A Contributor commented:

and change this to something like:


What atomic ordering should we use here? To know that, we need to consider what happens-before relationships we want to ensure (or alternatively, what data races we want to prevent). Drop is special because it's the one place where we mutate the Arc's payload (by dropping and freeing it). This is a potential read-write data race with all the other threads that have been happily reading the payload without any synchronization.

So we need to ensure all those non-atomic accesses have a proper happens-before relationship with us dropping and freeing the payload. To establish happens-before relationships with non-atomic accesses, we need (at least) Acquire-Release semantics.

As a reminder, Acquires ensure non-atomic accesses after them on the same thread stay after them (they happen-before everything that comes after them) and Releases ensure non-atomic accesses before them on the same thread stay before them (everything before them happens-before them).

So we have many threads that look like this:

(A) non-atomic accesses to payload
(B) atomic decrement refcount

And a "final" thread that looks like this:

(C) non-atomic accesses to payload
(D) atomic decrement refcount
(E) non-atomic free/drop contents

And we want to ensure every thread agrees that everything happens-before E.

One thing that jumps out clearly is that the non-final threads all end with an atomic access (B), and we want to keep everything else (A) before it. That's exactly what a Release does! So it seems we'd like our atomic decrement to be a Release.

```rust,ignore
if self.inner().rc.fetch_sub(1, Ordering::Release) != 1 {
    return;
}
```

However this on its own doesn't work -- our final thread would also use a Release, and that means (E) would be allowed to happen-before (D)! To prevent this we need Release's partner, an Acquire! We could make (D) AcquireRelease (AcqRel), but this would penalize all the other threads performing (B). So instead, we will introduce a separate Acquire that only happens if we're the final thread. And since we've already loaded all the values we need, we can use a fence.

```rust,ignore
if self.inner().rc.fetch_sub(1, Ordering::Release) != 1 {
    return;
}
atomic::fence(Ordering::Acquire);
```

If this helps, you can think of this like a sort of implicit RWLock: every Arc is a ReadGuard which allows unlimited read access until they go away and "Release the lock" (trapping all accesses on that thread before that point). The final thread then upgrades itself to a WriteGuard which "Acquires the lock" (creating a new critical section which strictly happens after everything else).

A Member commented:

> However this on its own doesn't work -- our final thread would also use a Release, and that means (E) would be allowed to happen-before (D)!

This sounds like for X, Y we always have that one happens-before the other or vice versa... which is not the way happens-before actually works. Happens-before is a partial order, and some events simply are unordered. So I think it'd be better to phrase this accordingly, saying something like "[...] and that means (D) would not be forced to happen-before (E)". Except that's also wrong, program-order is included in happens-before. The actual issue is between (E) and (A), isn't it? We must make (A) happens-before (E), and that's why there needs to be an "acquire" in the "final" thread.

A Member commented:

> As a reminder, Acquires ensure non-atomic accesses after them on the same thread stay after them (they happen-before everything that comes after them) and Releases ensure non-atomic accesses before them on the same thread stay before them (everything before them happens-before them).

The key thing with an acquire is that the release it reads from (and everything that comes before it) happens-before everything that comes after the acquire. Each acquire is paired with a release, and this pair establishes a happens-before link across threads. Personally, I find this way of thinking about it easier than thinking about the release and the acquire separately. (Also see what I said above: to my knowledge, happens-before includes program-order, so "X happens-before everything that comes after it in the same thread" is true for all X.)

> This fence is needed to prevent reordering of use of the data and deletion of
> the data. Because it is marked `Release`, the decreasing of the reference
> count synchronizes with this `Acquire` fence. This means that use of the data
> happens before decreasing the reference count, which happens before this
> fence, which happens before the deletion of the data.
>
> As explained in the [Boost documentation][1],
>
> > It is important to enforce any possible access to the object in one
> > thread (through an existing reference) to *happen before* deleting
> > the object in a different thread. This is achieved by a "release"
> > operation after dropping a reference (any access to the object
> > through this reference must obviously happened before), and an
> > "acquire" operation before deleting the object.
>
> In particular, while the contents of an Arc are usually immutable, it's
> possible to have interior writes to something like a Mutex<T>. Since a Mutex
> is not acquired when it is deleted, we can't rely on its synchronization logic
> to make writes in thread A visible to a destructor running in thread B.
>
> Also note that the Acquire fence here could probably be replaced with an
> Acquire load, which could improve performance in highly-contended situations.
> See [2].
>
> [1]: https://www.boost.org/doc/libs/1_55_0/doc/html/atomic/usage_examples.html
> [2]: https://github.com/rust-lang/rust/pull/41714
[3]: https://github.com/rust-lang/rust/blob/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/alloc/src/sync.rs#L1440-L1467

To do this, we do the following:
```rust,ignore
atomic::fence(Ordering::Acquire);
```

We'll need to import `std::sync::atomic` itself:
```rust,ignore
use std::sync::atomic;
```

Finally, we can drop the data itself. We use `Box::from_raw` to drop the boxed
`ArcInner<T>` and its data. This takes a `*mut T` and not a `NonNull<T>`, so we
must convert using `NonNull::as_ptr`.

```rust,ignore
unsafe { Box::from_raw(self.ptr.as_ptr()); }
```

This is safe because we know we have the last pointer to the `ArcInner` and
that the pointer is valid.

Now, let's wrap this all up inside the `Drop` implementation:
```rust,ignore
impl<T> Drop for Arc<T> {
fn drop(&mut self) {
if self.inner().rc.fetch_sub(1, Ordering::Release) != 1 {
return;
}
// This fence is needed to prevent reordering of the use and deletion
// of the data.
atomic::fence(Ordering::Acquire);
// This is safe as we know we have the last pointer to the `ArcInner`
// and that its pointer is valid.
unsafe { Box::from_raw(self.ptr.as_ptr()); }
}
}
```
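
To see the counting in action, here's a hedged test sketch (it assumes the
`Arc` built in this chapter, including the `Clone` impl from the previous
section; the `Canary` type and `DROPS` counter are just for illustration)
checking that the payload is dropped exactly once, and only when the last
handle goes away:

```rust,ignore
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Canary;

impl Drop for Canary {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let a = Arc::new(Canary);
    let b = a.clone();
    drop(a);
    // One reference is still alive, so the payload has not been dropped yet.
    assert_eq!(DROPS.load(Ordering::Relaxed), 0);
    drop(b);
    // The last reference is gone: the payload was dropped exactly once.
    assert_eq!(DROPS.load(Ordering::Relaxed), 1);
}
```
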
80 changes: 80 additions & 0 deletions src/arc-final.md
@@ -0,0 +1,80 @@
# Final Code

Here's the final code, with some added comments and re-ordered imports:
```rust
use std::marker::PhantomData;
use std::ops::Deref;
use std::ptr::NonNull;
use std::sync::atomic::{self, AtomicUsize, Ordering};

pub struct Arc<T> {
ptr: NonNull<ArcInner<T>>,
_marker: PhantomData<ArcInner<T>>,
}

pub struct ArcInner<T> {
rc: AtomicUsize,
data: T,
}

impl<T> Arc<T> {
pub fn new(data: T) -> Arc<T> {
// We start the reference count at 1, as that first reference is the
// current pointer.
let boxed = Box::new(ArcInner {
rc: AtomicUsize::new(1),
data,
});
Arc {
// It is okay to call `.unwrap()` here as we get a pointer from
// `Box::into_raw` which is guaranteed to not be null.
ptr: NonNull::new(Box::into_raw(boxed)).unwrap(),
_marker: PhantomData,
}
}

fn inner(&self) -> &ArcInner<T> {
// This unsafety is okay because while this Arc is alive, we're
// guaranteed that the inner pointer is valid. Also, ArcInner<T> is
// Sync if T is Sync.
unsafe { self.ptr.as_ref() }
}
}

unsafe impl<T: Sync + Send> Send for Arc<T> {}
unsafe impl<T: Sync + Send> Sync for Arc<T> {}

impl<T> Clone for Arc<T> {
fn clone(&self) -> Arc<T> {
// Using a relaxed ordering is alright here as knowledge of the original
// reference prevents other threads from wrongly deleting the object.
self.inner().rc.fetch_add(1, Ordering::Relaxed);
Self {
ptr: self.ptr,
_marker: PhantomData,
}
}
}

impl<T> Drop for Arc<T> {
fn drop(&mut self) {
if self.inner().rc.fetch_sub(1, Ordering::Release) != 1 {
return;
}
// This fence is needed to prevent reordering of the use and deletion
// of the data.
atomic::fence(Ordering::Acquire);
// This is safe as we know we have the last pointer to the `ArcInner`
// and that its pointer is valid.
unsafe { Box::from_raw(self.ptr.as_ptr()); }
}
}

impl<T> Deref for Arc<T> {
type Target = T;

fn deref(&self) -> &T {
&self.inner().data
}
}
```
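
And here's a short, hedged usage sketch (the names are just for illustration)
that exercises the whole implementation across several threads:

```rust,ignore
use std::thread;

fn main() {
    let data = Arc::new(vec![1, 2, 3]);

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let data = data.clone();
            thread::spawn(move || {
                // Each thread reads through its own clone; the allocation is
                // freed only after the last clone has been dropped.
                assert_eq!(data.len(), 3);
                assert_eq!(*data, vec![1, 2, 3]);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```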