WIP: Intrusive #1


Merged: 5 commits, May 20, 2025

Conversation

jamesmunns (Collaborator):

No description provided.

T: 'static,
R: ScopedRawMutex + 'static,
{
/// morally: Option<&'static PinList<R>>
Collaborator:

What do you mean by that? Should we just use the Option<>?
&'static instead of the AtomicPtr would also seem more ergonomic if we require it to be 'static anyway.

jamesmunns (Author):

I used an AtomicPtr because this is in the shared "read only" part of the storage. We'd need to use an UnsafeCell<Option<&'static PinList<R>>> (and ensure the mutex is held) to write this in a shared static item.

We might be able to get rid of this entirely tho, see the notes about PinListNodeHandle.
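
For illustration, a minimal sketch of the two layouts being weighed here; the types are stand-ins, not the PR's actual definitions:

use core::cell::UnsafeCell;
use core::marker::PhantomData;
use core::sync::atomic::AtomicPtr;

struct PinList<R> {
    _r: PhantomData<R>, // real list elided
}

// Option A (what the PR does): atomic pointer, null == "not attached yet".
// Can be stored to through &self without holding the list mutex.
struct NodeStorageAtomic<R: 'static> {
    list: AtomicPtr<PinList<R>>,
}

// Option B (the "morally" comment): the Option is the honest type, but
// writing it through a shared &'static requires an UnsafeCell plus the
// discipline of only storing while the mutex is held.
struct NodeStorageCell<R: 'static> {
    list: UnsafeCell<Option<&'static PinList<R>>>,
}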

// <https://github.com/rust-lang/rust/pull/82834>
// TODO: I don't think this heuristic is necessarily true anymore? We
// should check this.
_pin: PhantomPinned,
Collaborator:

Honestly, I don't understand whether or not this is resolved now. Doesn't hurt to keep it in there for now, does it?

jamesmunns (Author):

I think rust-lang/rust#63818 (comment) is what I saw at some point. It definitely does not hurt to keep; I'd leave it and we can investigate more later.

///
/// ## State transition diagram
///
/// ```text
Collaborator:

What tool did you use to draw these? I was thinking about doing one for the entire list, as well.

jamesmunns (Author):

I used Monodraw for macOS.

/// The "key" of our "key:value" store.
///
/// TODO: We should probably enforce that within the linked list, all
/// node `key`s are unique!
Collaborator:

Would it be sufficient to only check this when a new key is added? Once we have the list read from flash, it is somewhat quick to traverse and check for the presence of a key.

jamesmunns (Author):

Yes, I think when we do attach, we should iterate over the full list before adding the new node to the list, checking if there are any duplicate keys, and failing if any node has the same key as the potential new node.
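
A sketch of that attach-time check; the Vec is a stand-in for walking the intrusive list, and the names are hypothetical:

struct NodeHeader {
    key: &'static str,
}

struct PinListInner {
    nodes: Vec<NodeHeader>, // stand-in for the real linked list
}

#[derive(Debug)]
enum AttachError {
    DuplicateKey,
}

// Called with the list mutex held, BEFORE linking the new node.
fn check_unique_key(list: &PinListInner, new_key: &str) -> Result<(), AttachError> {
    if list.nodes.iter().any(|n| n.key == new_key) {
        return Err(AttachError::DuplicateKey);
    }
    Ok(())
}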

/// If we actually only ever need this in `attach`, maybe we just store
/// an Option<Waker> here instead? idk, there's maybe a better way than
/// this to do things.
state_change: Option<&'static WaitCell>,
Collaborator:

Do we have to manage this on a per-node basis or could we have this handled between the flash worker and the node owner?
(This is more of a note to myself, I will need to look into how this is done in other implementations. Maybe the owner can pass a waker to the flash task or so)

jamesmunns (Author):

Yeah, maybe another thing we could do is put a single WaitQueue in the PinList (and remove all the node WaitCells): when a node is waiting for hydration, it waits on that WaitQueue, and when the worker has finished processing reads, it can do a wake_all. WaitQueue also supports wait_for, so that would likely be way easier. This could be a good first follow-up PR to do if you have time today!
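
Roughly what that follow-up could look like, assuming maitake-sync's WaitQueue (the state field and constant here are hypothetical):

use core::sync::atomic::{AtomicU8, Ordering};
use maitake_sync::WaitQueue;

const HYDRATED: u8 = 1;

struct SharedList {
    // One queue for the whole list, replacing the per-node WaitCells.
    reads_processed: WaitQueue,
}

struct NodeState {
    state: AtomicU8,
}

impl SharedList {
    // Node side: wake_all() wakes every waiter, so each node re-checks
    // its own state; wait_for would fold this loop into a single call.
    async fn wait_hydrated(&self, node: &NodeState) {
        while node.state.load(Ordering::Acquire) != HYDRATED {
            let _ = self.reads_processed.wait().await;
        }
    }

    // Worker side: call once a read pass has finished.
    fn reads_done(&self) {
        self.reads_processed.wake_all();
    }
}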

///
/// TODO: this state needs some more design. Especially how we mark the state
/// if the owner of the node writes ANOTHER new value while we are waiting
/// for the flush to "stick".
Collaborator:

Does this need a different state for marking? Or is this more of a state transition back to NeedsWrite?

jamesmunns (Author):

Yeah, we may end up needing to use a bitfield or something, because a node could potentially simultaneously be "NeedsWriting" AND "WriteStarted".

For example, we could have a process like:

  1. The node has no writes pending, it is in ValidNoWriteNeeded
  2. The node is written to. It moves to NeedsWrite
  3. The flash worker starts the write, moves the node to WriteStarted
  4. The flash worker lets go of the mutex and awaits the flash to be written
  5. The node is written to again with NEW data.
    • If we overwrite WriteStarted to NeedsWrite, how do we make sure the flash worker doesn't move it back to ValidNoWriteNeeded before it does ANOTHER write?
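
A sketch of the bitfield idea (bit names hypothetical; the atomics could equally be plain fields if the state is only ever touched under the list mutex):

use core::sync::atomic::{AtomicU8, Ordering};

const NEEDS_WRITE: u8 = 1 << 0;
const WRITE_STARTED: u8 = 1 << 1;

struct WriteState(AtomicU8);

impl WriteState {
    // Owner side: every write re-requests a flush, even mid-flush.
    fn mark_needs_write(&self) {
        self.0.fetch_or(NEEDS_WRITE, Ordering::AcqRel);
    }

    // Worker side: snapshot taken; clear the request, note the flush.
    fn begin_write(&self) {
        let _ = self.0.fetch_update(Ordering::AcqRel, Ordering::Acquire, |s| {
            Some((s & !NEEDS_WRITE) | WRITE_STARTED)
        });
    }

    // Worker side: flush landed. Returns false if the owner wrote AGAIN
    // while the flush was in flight, i.e. another flush is still needed,
    // so we must NOT move the node back to "ValidNoWriteNeeded".
    fn finish_write(&self) -> bool {
        let prev = self.0.fetch_and(!WRITE_STARTED, Ordering::AcqRel);
        prev & NEEDS_WRITE == 0
    }
}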

/// 1. Locks the mutex
/// 2. Iterates through each node currently linked in the list
/// 3. For Each node that says "I need to be hydrated", it sees if a matching
/// key record exists in the hashmap (our stand-in for the actual flash
Collaborator:

How will this be implemented with real flash? I assume we would read everything from the flash and keep track of all keys and their locations in a HashMap, to avoid reading from flash for every key? Might also depend on how slow the reads are and whether that's acceptable.

jamesmunns (Author):

Yeah, "it depends" on how we implement the flash data structure. If we have some way of loading, say, up to 4K at a time, we could do something like:

  • For each <=4K chunk of flash in the "latest" window
    • Read the chunk to RAM w/async+dma read
    • For each node in the list
      • If the node "needs to be hydrated":
      • For each K:V pair in the flash chunk
        • if the key from the chunk item matches the key from the node
        • then attempt to deser the value into the node
        • On success, mark it "Valid"
        • On failure, mark it "NonResident"
  • When all chunks from the "latest" flash window have been processed
    • For each node in the list
      • If the node still needs to be hydrated, mark it as "NonResident"

This assumes we can build some interface like this for the flash, that the storage worker can use:

let mut buf = [0u8; MAX_PAGE_LEN];
// returns an iterator of addr+len pairs we can read from the flash
// this would be just one if the total data is <4K, or more if the
// total data is >4K
let page_iter = flash.get_latest();
for (page_addr, page_len) in page_iter {
    let chunk = flash.read(page_addr, &mut buf[..page_len]).await?;
    let item_iter = deser_kvs(chunk);
    for (key, value) in item_iter {
        // key: &str
        // value: &[u8]
        // do the stuff with iterating the nodes to find if we have this
    }
}

// already, we might want to try again later. If it failed for some
// other reason, or it fails even with an empty page (e.g. serializes
// to more than 1k or 4k or something), there's really not much to be
// done, other than log.
Collaborator:

Since serialization is deterministic, we should be able to have some way of catching this during compilation.

jamesmunns (Author):

maaaaaybe. Remember the max size is different for every type, and we don't have dynamic memory allocation or anything like alloca. We should probably just set some reasonable max size like 1k for each individual item.
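
If the wire format is postcard (an assumption here), its experimental MaxSize machinery can turn this into a compile-time check against whatever cap we pick:

// Requires postcard's "experimental-derive" feature (plus serde's derive).
use postcard::experimental::max_size::MaxSize;
use serde::Serialize;

const MAX_ITEM_SIZE: usize = 1024; // the "reasonable max" discussed above

#[derive(Serialize, MaxSize)]
struct WifiConfig {
    channel: u8,
    tx_power: u16,
}

// Fails to compile if the worst-case encoded size exceeds the cap.
const _: () = assert!(WifiConfig::POSTCARD_MAX_SIZE <= MAX_ITEM_SIZE);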

Comment on lines +338 to +343
// Make a node pointer from a header pointer. This *consumes* the `Pin<&mut NodeHeader>`, meaning
// we are free to later re-invent other mutable ptrs/refs, AS LONG AS we still treat the data
// as pinned.
let mut hdrptr: NonNull<NodeHeader> =
NonNull::from(unsafe { Pin::into_inner_unchecked(node) });
let nodeptr: NonNull<Node<()>> = hdrptr.cast();
Collaborator:

This makes sense to me from a (C) pointer perspective, but I don't yet understand the peculiarities of the Pin (and how we ensure we "treat the data as pinned") and the cast.

My understanding of what it does is the Rusty way of Node* nodeptr = reinterpret_cast<Node*>(header_ptr) .

Does this get "easier" when the nodes are 'static? Or do we still need the Pin then?

jamesmunns (Author):

The Pin is an artifact of cordyceps::List's interface. Pin guarantees we DO NOT move out of the pinned reference, e.g. we don't swap out an old Node for a new Node, because Pin says things are allowed to hold pointers to items, and doing that would invalidate them.

We don't do any of that, so mostly it should be fine as-is. If you need to tag me to check the safety of something, let me know and I can take a quick peek.
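
For readers following the cast: it is the standard intrusive-collection layout trick, and it is only sound because the header sits at offset zero of the node. Illustrative types, not the PR's exact ones:

use core::ptr::NonNull;

struct NodeHeader {
    key: &'static str,
}

#[repr(C)]
struct Node<T> {
    header: NodeHeader, // MUST remain the first field
    t: T,
}

// Sound because #[repr(C)] + header-first means a pointer to the node
// is also a valid pointer to its header (and back, if we know T).
fn header_ptr<T>(node: NonNull<Node<T>>) -> NonNull<NodeHeader> {
    node.cast()
}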

if let Some(val) = in_flash.get(key) {
// It seems so! Try to deserialize it, using the type-erased vtable
//
// Note: We MUST have destroyed `node` at this point, so we don't have aliasing
Collaborator:

Is this what happens with the into_inner_unchecked?

jamesmunns (Author):

Yes, more specifically NonNull::from() consumes the &mut ptr we get out of into_inner_unchecked.


// SAFETY: We can re-magic a reference to the header, because the NonNull<Node<()>>
// does not have a live reference anymore
let hdrmut = unsafe { hdrptr.as_mut() };
Collaborator:

Maybe add this to the docs later:

This works because we passed nodeptr to deserialize and hdrptr is actually Copy so it was not moved with the above call to cast.

jamesmunns (Author):

yes, pointers can be copied, the important thing for safety is that we never have aliasing mutable REFERENCES live at the same time. We can have "duped" pointers, but we can't turn those into references (or ACCESS through the pointers) if there are other references live.

References have stricter rules than pointers: References assert "no aliased mutable access" for THE WHOLE TIME they exist, while pointers only assert that while you are accessing through them.

I say "assert": it's not something that will panic, it is just something that is undefined behavior. You might catch cases of this with miri testing.


if let Ok(used) = res {
// "Store" in our "flash"
let res = flash.insert(key.to_string(), (buf[..used]).to_vec());
Collaborator:

How will this be handled with actual flash? We won't be writing immediately but collect multiple nodes into a single page to avoid erasing entire pages, right?
Do we end up with a two-step process here, where we first insert into a "buffer to be written" and then "write buffer to flash"? Or is this entirely handled by the "flash driver" (that implements a Flash trait)?

jamesmunns (Author):

See some previous comments, this is still to be solved.

list: AtomicPtr::new(ptr::null_mut()),
state_change: WaitCell::new(),
inner: UnsafeCell::new(Node {
header: NodeHeader {
Collaborator:

Consider providing a Default impl for NodeHeader
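
Something along these lines, sketched with hypothetical fields since the full struct isn't visible in this excerpt; a Default only makes sense if every field has a sensible "unattached" value:

enum State {
    Initial,
    // other variants elided
}

struct NodeHeader {
    key: &'static str,
    state: State,
}

impl Default for NodeHeader {
    fn default() -> Self {
        Self {
            // "" as a placeholder key is questionable; if there is no
            // sensible default, a const fn new(key) may fit better.
            key: "",
            state: State::Initial,
        }
    }
}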


// Attaches, waits for hydration. If value not in flash, a default value is used
pub async fn attach(&'static self, list: &'static PinList<R>) -> Result<(), ()> {
// TODO: We need some kind of "taken" flag to prevent multiple
Collaborator:

https://github.com/embassy-rs/embassy/blob/64a2b9b2a36adbc26117b8e76ae2b178a5e3eb9f/embassy-hal-internal/src/macros.rs#L58-L70 is the approach with a taken flag.
Returning a handle or having some sort of newtype/builder pattern might also be an option.

However, properly documenting that it will panic if called twice and implementing the flag approach is easier and works quite well in embassy.
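
The flag approach from that embassy macro boils down to roughly this (a sketch; an AtomicBool here instead of embassy's critical-section bool):

use core::sync::atomic::{AtomicBool, Ordering};

struct StorageNode {
    taken: AtomicBool,
    // rest of the node elided
}

impl StorageNode {
    /// Panics if the node is attached more than once.
    fn claim(&self) {
        if self.taken.swap(true, Ordering::AcqRel) {
            panic!("storage node attached twice");
        }
    }
}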

Collaborator:

On second thought: couldn't we check whether self.list != null and return (since it has already been attached)?

jamesmunns (Author):

Yes, if we do a compare-and-swap at the top instead of just an is_null check. Right now I don't "set" the pointer until the end of the function, which means two attaches could happen concurrently.

I think switching to a handle is a better idea because it solves a lot of other API problems I've commented on.
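
For reference, the compare-and-swap version of that check would claim the node at the top of attach, so a racing second attach fails instead of proceeding (a sketch; the PinList type is elided):

use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

struct PinList; // elided

// Ok(()) if we claimed the node, Err(()) if it was already attached.
fn try_claim(slot: &AtomicPtr<PinList>, list: &'static PinList) -> Result<(), ()> {
    let listptr = list as *const PinList as *mut PinList;
    slot.compare_exchange(ptr::null_mut(), listptr, Ordering::AcqRel, Ordering::Acquire)
        .map(drop)
        .map_err(drop)
}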


// now spin until we have a value, or we know it is non-resident
// This is like a nicer version of `poll_fn`.
self.state_change
Collaborator:

How do we ensure that a process_read is triggered so we don't wait forever?

jamesmunns (Author):

"it is up to the storage writer to ensure that reads are processed regularly".

I don't think we magically can, but imo it's reasonable to say "you must await the "reads needed" future, or just make sure you check regularly enough.

State::Initial => false,
State::NonResident => {
// We are nonresident, we need to initialize
noderef.t = MaybeUninit::new(T::default());
Collaborator:

Consider making this a separate function on t (init_with_default or so), such that the State is not managed by different parties

jamesmunns (Author):

I'm not sure what you mean; State definitely has to be managed by two parties: the flash worker (sets state on initial load and when writes are processed), and the node itself (when a write occurs and it marks "needs write").

Collaborator:

My note was more geared towards development: how can we logically group (or confine?) the places where the node's state is manipulated so refactoring is less error prone (or how can I make it harder to forget setting the right state, especially if we change the state enum).
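
One way to get that confinement, sketched with hypothetical method names: route every transition through the header, with owner-side and worker-side transitions kept in clearly separate impl blocks:

enum State {
    Initial,
    NonResident,
    ValidNoWriteNeeded,
    NeedsWrite,
    WriteStarted,
}

struct NodeHeader {
    state: State,
}

// Owner-side transitions (the task that holds the node).
impl NodeHeader {
    fn mark_written(&mut self) {
        self.state = State::NeedsWrite;
    }
}

// Worker-side transitions (the flash task). Adding a State variant
// forces a decision here instead of at scattered call sites.
impl NodeHeader {
    fn begin_flush(&mut self) {
        debug_assert!(matches!(self.state, State::NeedsWrite));
        self.state = State::WriteStarted;
    }

    fn finish_flush(&mut self) {
        // If the owner wrote again mid-flush, stay dirty.
        if matches!(self.state, State::WriteStarted) {
            self.state = State::ValidNoWriteNeeded;
        }
    }
}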
