WIP: Intrusive #1


Merged: 5 commits, May 20, 2025

Conversation

jamesmunns (Collaborator):

No description provided.

T: 'static,
R: ScopedRawMutex + 'static,
{
/// morally: Option<&'static PinList<R>>
Collaborator:

What do you mean by that? Should we just use the Option<>?
&'static instead of the AtomicPtr would also seem more ergonomic if we require it to be 'static anyway.

jamesmunns (Author):

I used an AtomicPtr because this is in the shared "read only" part of the storage. We'd need to use an UnsafeCell<Option<&'static PinList<R>>> (and ensure the mutex is held) to write this in a shared static item.

We might be able to get rid of this entirely tho, see the notes about PinListNodeHandle.
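
For illustration, a minimal sketch of the two layouts being weighed here; the types are stand-ins, not the PR's actual definitions:

use core::cell::UnsafeCell;
use core::marker::PhantomData;
use core::sync::atomic::AtomicPtr;

struct PinList<R> {
    _r: PhantomData<R>, // real list elided
}

// Option A (what the PR does): atomic pointer, null == "not attached yet".
// Can be stored to through &self without holding the list mutex.
struct NodeStorageAtomic<R: 'static> {
    list: AtomicPtr<PinList<R>>,
}

// Option B (the "morally" comment): the Option is the honest type, but
// writing it through a shared &'static requires an UnsafeCell plus the
// discipline of only storing while the mutex is held.
struct NodeStorageCell<R: 'static> {
    list: UnsafeCell<Option<&'static PinList<R>>>,
}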

// <https://github.com/rust-lang/rust/pull/82834>
// TODO: I don't think this heuristic is necessarily true anymore? We
// should check this.
_pin: PhantomPinned,
Collaborator:

Honestly, I don't understand whether or not this is resolved now. Doesn't hurt to keep it in there for now, does it?

jamesmunns (Author):

I think rust-lang/rust#63818 (comment) is what I saw at some point. It definitely does not hurt to keep; I'd leave it and we can investigate more later.

///
/// ## State transition diagram
///
/// ```text
Collaborator:

What tool did you use to draw these? I was thinking about doing one for the entire list, as well.

jamesmunns (Author):

I used Monodraw for macOS.

/// The "key" of our "key:value" store.
///
/// TODO: We should probably enforce that within the linked list, all
/// node `key`s are unique!
Collaborator:

Would it be sufficient to only check this when a new key is added? Once we have the list read from flash, it is somewhat quick to traverse and check for the presence of a key.

jamesmunns (Author):

Yes, I think when we do attach, we should iterate over the full list before adding the new node to the list, checking if there are any duplicate keys, and failing if any node has the same key as the potential new node.
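
A sketch of that attach-time check; the Vec is a stand-in for walking the intrusive list, and the names are hypothetical:

struct NodeHeader {
    key: &'static str,
}

struct PinListInner {
    nodes: Vec<NodeHeader>, // stand-in for the real linked list
}

#[derive(Debug)]
enum AttachError {
    DuplicateKey,
}

// Called with the list mutex held, BEFORE linking the new node.
fn check_unique_key(list: &PinListInner, new_key: &str) -> Result<(), AttachError> {
    if list.nodes.iter().any(|n| n.key == new_key) {
        return Err(AttachError::DuplicateKey);
    }
    Ok(())
}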

/// If we actually only ever need this in `attach`, maybe we just store
/// an Option<Waker> here instead? idk, there's maybe a better way than
/// this to do things.
state_change: Option<&'static WaitCell>,
Collaborator:

Do we have to manage this on a per-node basis or could we have this handled between the flash worker and the node owner?
(This is more of a note to myself, I will need to look into how this is done in other implementations. Maybe the owner can pass a waker to the flash task or so)

jamesmunns (Author):

Yeah, maybe another thing we could do is put a single WaitQueue in the PinList (and remove all the node WaitCells): when a node is waiting for hydration, it waits on that WaitQueue, and when the worker has finished processing reads, it can do a wake_all. WaitQueue also supports wait_for, so that would likely be way easier. This could be a good first follow-up PR to do if you have time today!
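
Roughly what that follow-up could look like, assuming maitake-sync's WaitQueue (the state field and constant here are hypothetical):

use core::sync::atomic::{AtomicU8, Ordering};
use maitake_sync::WaitQueue;

const HYDRATED: u8 = 1;

struct SharedList {
    // One queue for the whole list, replacing the per-node WaitCells.
    reads_processed: WaitQueue,
}

struct NodeState {
    state: AtomicU8,
}

impl SharedList {
    // Node side: wake_all() wakes every waiter, so each node re-checks
    // its own state; wait_for would fold this loop into a single call.
    async fn wait_hydrated(&self, node: &NodeState) {
        while node.state.load(Ordering::Acquire) != HYDRATED {
            let _ = self.reads_processed.wait().await;
        }
    }

    // Worker side: call once a read pass has finished.
    fn reads_done(&self) {
        self.reads_processed.wake_all();
    }
}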

///
/// TODO: this state needs some more design. Especially how we mark the state
/// if the owner of the node writes ANOTHER new value while we are waiting
/// for the flush to "stick".
Collaborator:

Does this need a different state for marking? Or is this more of a state transition back to NeedsWrite?

jamesmunns (Author):

Yeah, we may end up needing to use a bitfield or something, because a node could potentially simultaneously be "NeedsWriting" AND "WriteStarted".

For example, we could have a process like:

  1. The node has no writes pending, it is in ValidNoWriteNeeded
  2. The node is written to. It moves to NeedsWrite
  3. The flash worker starts the write, moves the node to WriteStarted
  4. The flash worker lets go of the mutex and awaits the flash to be written
  5. The node is written to again with NEW data.
    • If we overwrite WriteStarted to NeedsWrite, how do we make sure the flash worker doesn't move it back to ValidNoWriteNeeded before it does ANOTHER write?
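
A sketch of the bitfield idea (bit names hypothetical; the atomics could equally be plain fields if the state is only ever touched under the list mutex):

use core::sync::atomic::{AtomicU8, Ordering};

const NEEDS_WRITE: u8 = 1 << 0;
const WRITE_STARTED: u8 = 1 << 1;

struct WriteState(AtomicU8);

impl WriteState {
    // Owner side: every write re-requests a flush, even mid-flush.
    fn mark_needs_write(&self) {
        self.0.fetch_or(NEEDS_WRITE, Ordering::AcqRel);
    }

    // Worker side: snapshot taken; clear the request, note the flush.
    fn begin_write(&self) {
        let _ = self.0.fetch_update(Ordering::AcqRel, Ordering::Acquire, |s| {
            Some((s & !NEEDS_WRITE) | WRITE_STARTED)
        });
    }

    // Worker side: flush landed. Returns false if the owner wrote AGAIN
    // while the flush was in flight, i.e. another flush is still needed,
    // so we must NOT move the node back to "ValidNoWriteNeeded".
    fn finish_write(&self) -> bool {
        let prev = self.0.fetch_and(!WRITE_STARTED, Ordering::AcqRel);
        prev & NEEDS_WRITE == 0
    }
}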

/// 1. Locks the mutex
/// 2. Iterates through each node currently linked in the list
/// 3. For Each node that says "I need to be hydrated", it sees if a matching
/// key record exists in the hashmap (our stand-in for the actual flash
Collaborator:

How will this be implemented with real flash? I assume we would read everything from the flash and keep track of all keys and their locations in a HashMap, to avoid reading from flash for every key? Might also depend on how slow the reads are and whether that's acceptable.

jamesmunns (Author):

Yeah, "it depends" on how we implement the flash data structure. If we have some way of loading, say, up to 4K at a time, we could do something like:

  • For each <=4K chunk of flash in the "latest" window
    • Read the chunk to RAM w/async+dma read
    • For each node in the list
      • If the node "needs to be hydrated":
      • For each K:V pair in the flash chunk
        • if the key from the chunk item matches the key from the node
        • then attempt to deser the value into the node
        • On success, mark it "Valid"
        • On failure, mark it "NonResident"
  • When all chunks from the "latest" flash window have been processed
    • For each node in the list
      • If the node still needs to be hydrated, mark it as "NonResident"

This assumes we can build some interface like this for the flash, that the storage worker can use:

let mut buf = [0u8; MAX_PAGE_LEN];
// returns an iterator of addr+len pairs we can read from the flash
// this would be just one if the total data is <4K, or more if the
// total data is >4K
let page_iter = flash.get_latest();
for (page_addr, page_len) in page_iter {
    let chunk = flash.read(page_addr, &mut buf[..page_len]).await?;
    let item_iter = deser_kvs(chunk);
    for (key, value) in item_iter {
        // key: &str
        // value: &[u8]
        // do the stuff with iterating the nodes to find if we have this
    }
}

// already, we might want to try again later. If it failed for some
// other reason, or it fails even with an empty page (e.g. serializes
// to more than 1k or 4k or something), there's really not much to be
// done, other than log.
Collaborator:

Since serialization is deterministic, we should be able to have some way of catching this during compilation.

jamesmunns (Author):

maaaaaybe. Remember the max size is different for every type, and we don't have dynamic memory allocation or anything like alloca. We should probably just set some reasonable max size like 1k for each individual item.
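
If the wire format is postcard (an assumption here), its experimental MaxSize machinery can turn this into a compile-time check against whatever cap we pick:

// Requires postcard's "experimental-derive" feature (plus serde's derive).
use postcard::experimental::max_size::MaxSize;
use serde::Serialize;

const MAX_ITEM_SIZE: usize = 1024; // the "reasonable max" discussed above

#[derive(Serialize, MaxSize)]
struct WifiConfig {
    channel: u8,
    tx_power: u16,
}

// Fails to compile if the worst-case encoded size exceeds the cap.
const _: () = assert!(WifiConfig::POSTCARD_MAX_SIZE <= MAX_ITEM_SIZE);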

Comment on lines +338 to +343
// Make a node pointer from a header pointer. This *consumes* the `Pin<&mut NodeHeader>`, meaning
// we are free to later re-invent other mutable ptrs/refs, AS LONG AS we still treat the data
// as pinned.
let mut hdrptr: NonNull<NodeHeader> =
NonNull::from(unsafe { Pin::into_inner_unchecked(node) });
let nodeptr: NonNull<Node<()>> = hdrptr.cast();
Collaborator:

This makes sense to me from a (C) pointer perspective, but I don't yet understand the peculiarities of the Pin (and how we ensure we "treat the data as pinned") and the cast.

My understanding of what it does is the Rusty way of Node* nodeptr = reinterpret_cast<Node*>(header_ptr) .

Does this get "easier" when the nodes are 'static? Or do we still need the Pin then?

jamesmunns (Author):

The Pin is an artifact of cordyceps::List's interface. Pin guarantees we DO NOT move out of the pinned reference, e.g. we don't swap out an old Node for a new Node, because Pin says things are allowed to hold pointers to items, and doing that would invalidate them.

We don't do any of that, so mostly it should be fine as-is. If you need to tag me to check the safety of something, let me know and I can take a quick peek.
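
For readers following the cast: it is the standard intrusive-collection layout trick, and it is only sound because the header sits at offset zero of the node. Illustrative types, not the PR's exact ones:

use core::ptr::NonNull;

struct NodeHeader {
    key: &'static str,
}

#[repr(C)]
struct Node<T> {
    header: NodeHeader, // MUST remain the first field
    t: T,
}

// Sound because #[repr(C)] + header-first means a pointer to the node
// is also a valid pointer to its header (and back, if we know T).
fn header_ptr<T>(node: NonNull<Node<T>>) -> NonNull<NodeHeader> {
    node.cast()
}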

if let Some(val) = in_flash.get(key) {
// It seems so! Try to deserialize it, using the type-erased vtable
//
// Note: We MUST have destroyed `node` at this point, so we don't have aliasing
Collaborator:

Is this what happens with the into_inner_unchecked?

jamesmunns (Author):

Yes, more specifically NonNull::from() consumes the &mut ptr we get out of into_inner_unchecked.


// SAFETY: We can re-magic a reference to the header, because the NonNull<Node<()>>
// does not have a live reference anymore
let hdrmut = unsafe { hdrptr.as_mut() };
Collaborator:

Maybe add this to the docs later:

This works because we passed nodeptr to deserialize and hdrptr is actually Copy so it was not moved with the above call to cast.

jamesmunns (Author):

yes, pointers can be copied, the important thing for safety is that we never have aliasing mutable REFERENCES live at the same time. We can have "duped" pointers, but we can't turn those into references (or ACCESS through the pointers) if there are other references live.

References have stricter rules than pointers: References assert "no aliased mutable access" for THE WHOLE TIME they exist, while pointers only assert that while you are accessing through them.

I say "assert": it's not something that will panic, it is just something that is undefined behavior. You might catch cases of this with miri testing.


if let Ok(used) = res {
// "Store" in our "flash"
let res = flash.insert(key.to_string(), (buf[..used]).to_vec());
Collaborator:

How will this be handled with actual flash? We won't be writing immediately but collect multiple nodes into a single page to avoid erasing entire pages, right?
Do we end up with a two-step process here, where we first insert into a "buffer to be written" and then "write buffer to flash"? Or is this entirely handled by the "flash driver" (that implements a Flash trait)?

jamesmunns (Author):

See some previous comments, this is still to be solved.

list: AtomicPtr::new(ptr::null_mut()),
state_change: WaitCell::new(),
inner: UnsafeCell::new(Node {
header: NodeHeader {
Collaborator:

Consider providing a Default impl for NodeHeader
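
Something along these lines, sketched with hypothetical fields since the full struct isn't visible in this excerpt; a Default only makes sense if every field has a sensible "unattached" value:

enum State {
    Initial,
    // other variants elided
}

struct NodeHeader {
    key: &'static str,
    state: State,
}

impl Default for NodeHeader {
    fn default() -> Self {
        Self {
            // "" as a placeholder key is questionable; if there is no
            // sensible default, a const fn new(key) may fit better.
            key: "",
            state: State::Initial,
        }
    }
}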


// Attaches, waits for hydration. If value not in flash, a default value is used
pub async fn attach(&'static self, list: &'static PinList<R>) -> Result<(), ()> {
// TODO: We need some kind of "taken" flag to prevent multiple
Collaborator:

https://github.com/embassy-rs/embassy/blob/64a2b9b2a36adbc26117b8e76ae2b178a5e3eb9f/embassy-hal-internal/src/macros.rs#L58-L70 is the approach with a taken flag.
Returning a handle or having some sort of newtype/builder pattern might also be an option.

However, properly documenting that it will panic if called twice and implementing the flag approach is easier and works quite well in embassy.
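
The flag approach from that embassy macro boils down to roughly this (a sketch; an AtomicBool here instead of embassy's critical-section bool):

use core::sync::atomic::{AtomicBool, Ordering};

struct StorageNode {
    taken: AtomicBool,
    // rest of the node elided
}

impl StorageNode {
    /// Panics if the node is attached more than once.
    fn claim(&self) {
        if self.taken.swap(true, Ordering::AcqRel) {
            panic!("storage node attached twice");
        }
    }
}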

Collaborator:

On second thought: couldn't we check whether self.list != null and return (since it has already been attached)?

jamesmunns (Author):

Yes, if we do a compare-and-swap at the top instead of just an is_null check. Right now I don't "set" the pointer until the end of the function, which means two attaches could happen concurrently.

I think switching to a handle is a better idea because it solves a lot of other API problems I've commented on.
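
For reference, the compare-and-swap version of that check would claim the node at the top of attach, so a racing second attach fails instead of proceeding (a sketch; the PinList type is elided):

use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

struct PinList; // elided

// Ok(()) if we claimed the node, Err(()) if it was already attached.
fn try_claim(slot: &AtomicPtr<PinList>, list: &'static PinList) -> Result<(), ()> {
    let listptr = list as *const PinList as *mut PinList;
    slot.compare_exchange(ptr::null_mut(), listptr, Ordering::AcqRel, Ordering::Acquire)
        .map(drop)
        .map_err(drop)
}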


// now spin until we have a value, or we know it is non-resident
// This is like a nicer version of `poll_fn`.
self.state_change
Collaborator:

How do we ensure that a process_read is triggered so we don't wait forever?

jamesmunns (Author):

"it is up to the storage writer to ensure that reads are processed regularly".

I don't think we magically can, but imo it's reasonable to say "you must await the "reads needed" future, or just make sure you check regularly enough.

State::Initial => false,
State::NonResident => {
// We are nonresident, we need to initialize
noderef.t = MaybeUninit::new(T::default());
Collaborator:

Consider making this a separate function on t (init_with_default or so), such that the State is not managed by different parties

jamesmunns (Author):

I'm not sure what you mean; State definitely has to be managed by two parties: the flash worker (sets state on initial load and when writes are processed), and the node itself (when a write occurs and it marks "needs write").

Collaborator:

My note was more geared towards development: how can we logically group (or confine?) the places where the node's state is manipulated so refactoring is less error prone (or how can I make it harder to forget setting the right state, especially if we change the state enum).
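
One way to get that confinement, sketched with hypothetical method names: route every transition through the header, with owner-side and worker-side transitions kept in clearly separate impl blocks:

enum State {
    Initial,
    NonResident,
    ValidNoWriteNeeded,
    NeedsWrite,
    WriteStarted,
}

struct NodeHeader {
    state: State,
}

// Owner-side transitions (the task that holds the node).
impl NodeHeader {
    fn mark_written(&mut self) {
        self.state = State::NeedsWrite;
    }
}

// Worker-side transitions (the flash task). Adding a State variant
// forces a decision here instead of at scattered call sites.
impl NodeHeader {
    fn begin_flush(&mut self) {
        debug_assert!(matches!(self.state, State::NeedsWrite));
        self.state = State::WriteStarted;
    }

    fn finish_flush(&mut self) {
        // If the owner wrote again mid-flush, stay dirty.
        if matches!(self.state, State::WriteStarted) {
            self.state = State::ValidNoWriteNeeded;
        }
    }
}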
