WIP: Intrusive #1

Conversation
    T: 'static,
    R: ScopedRawMutex + 'static,
{
    /// morally: Option<&'static PinList<R>>
What do you mean by that? Should we just use the `Option<>`? `&'static` instead of the `AtomicPtr` would also seem more ergonomic if we require it to be `'static` anyway.
I used an `AtomicPtr` because this is in the shared "read only" part of the storage. We'd need to use an `UnsafeCell<Option<&'static PinList<R>>>` (and ensure we have the lock locked) to write this in a shared static item. We might be able to get rid of this entirely tho, see the notes about `PinListNodeHandle`.
    // <https://github.com/rust-lang/rust/pull/82834>
    // TODO: I don't think this heuristic is necessarily true anymore? We
    // should check this.
    _pin: PhantomPinned,
Honestly, I don't understand whether or not this is resolved now. Doesn't hurt to keep it in there for now, does it?
I think rust-lang/rust#63818 (comment) is what I saw at some point. It definitely does not hurt to keep; I'd leave it and we can investigate more later.
    ///
    /// ## State transition diagram
    ///
    /// ```text
What tool did you use to draw these? I was thinking about doing one for the entire list, as well.
I used Monodraw for macOS.
/// The "key" of our "key:value" store. | ||
/// | ||
/// TODO: We should probably enforce that within the linked list, all | ||
/// node `key`s are unique! |
Would it be sufficient to only check this when a new key is added? Once we have the list read from flash, it is somewhat quick to traverse and check for the presence of `key`.
Yes, I think when we do `attach`, we should iterate over the full list before adding the new node to the list, checking if there are any duplicate keys, and failing if any node has the same key as the potential new node.
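A minimal sketch of that check, assuming we can iterate the existing node keys while holding the mutex (`check_unique` and the iterator shape are illustrative, not this PR's API):

```rust
/// Returns Ok(()) if `new_key` collides with no existing node key.
/// `existing_keys` stands in for walking the intrusive list's node
/// headers while the mutex is held.
fn check_unique<'a>(
    mut existing_keys: impl Iterator<Item = &'a str>,
    new_key: &str,
) -> Result<(), ()> {
    if existing_keys.any(|k| k == new_key) {
        Err(()) // duplicate: fail the attach
    } else {
        Ok(())
    }
}

// e.g. check_unique(["volume", "brightness"].into_iter(), "volume") is Err(())
```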
    /// If we actually only ever need this in `attach`, maybe we just store
    /// an Option<Waker> here instead? idk, there's maybe a better way than
    /// this to do things.
    state_change: Option<&'static WaitCell>,
Do we have to manage this on a per-node basis, or could we have this handled between the flash worker and the node owner? (This is more of a note to myself; I will need to look into how this is done in other implementations. Maybe the owner can pass a waker to the flash task or so.)
Yeah, maybe another thing we could do is put a single `WaitQueue` in the `PinList` (and remove all the node `WaitCell`s): when a node is waiting for hydration, it waits on that `WaitQueue`, and when the worker has finished processing reads, it can do a `wake_all`. `WaitQueue` also supports `wait_for`, so that would likely be way easier. This could be a good first follow-up PR to do if you have time today!
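A rough sketch of that shape, assuming maitake-sync's `WaitQueue` (the field and function names are illustrative):

```rust
use maitake_sync::WaitQueue;

struct PinList {
    /// Single queue shared by all nodes waiting for hydration.
    hydration: WaitQueue,
    // ... the intrusive list itself ...
}

/// Node side: park until our own state says we've been hydrated.
async fn wait_hydrated(list: &PinList, is_hydrated: impl FnMut() -> bool) {
    // `wait_for` re-checks the predicate each time the queue is woken.
    let _ = list.hydration.wait_for(is_hydrated).await;
}

/// Worker side: after a batch of reads is processed, wake all waiters.
fn reads_done(list: &PinList) {
    list.hydration.wake_all();
}
```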
    ///
    /// TODO: this state needs some more design. Especially how we mark the state
    /// if the owner of the node writes ANOTHER new value while we are waiting
    /// for the flush to "stick".
Does this need a different state as a marker? Or is this more of a state transition back to `NeedsWrite`?
Yeah, we may end up needing to use a bitfield or something, because a node could potentially simultaneously be "NeedsWriting" AND "WriteStarted" (see the sketch after this list).

For example, we could have a process like:

- The node has no writes pending, it is in `ValidNoWriteNeeded`
- The node is written to. It moves to `NeedsWrite`
- The flash worker starts the write, moves the node to `WriteStarted`
- The flash worker lets go of the mutex and `await`s the flash to be written
- The node is written to again with NEW data.
- If we overwrite `WriteStarted` to `NeedsWrite`, how do we make sure the flash worker doesn't move it back to `ValidNoWriteNeeded` before it does ANOTHER write?
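A sketch of the bitfield idea (illustrative, not this PR's actual state representation): keep "needs write" and "write started" as independent bits, so a racing local write is never lost when the worker's write completes.

```rust
const NEEDS_WRITE: u8 = 1 << 0;
const WRITE_STARTED: u8 = 1 << 1;

struct NodeState(u8);

impl NodeState {
    /// Node owner: a new local write always (re)sets NEEDS_WRITE.
    fn mark_written(&mut self) {
        self.0 |= NEEDS_WRITE;
    }

    /// Flash worker: claim the pending write.
    fn start_write(&mut self) {
        self.0 &= !NEEDS_WRITE;
        self.0 |= WRITE_STARTED;
    }

    /// Flash worker: the write stuck. Only clear WRITE_STARTED; if
    /// NEEDS_WRITE was set again while we were awaiting the flash, it
    /// stays set and the worker will do ANOTHER write next pass.
    fn finish_write(&mut self) {
        self.0 &= !WRITE_STARTED;
    }

    fn is_clean(&self) -> bool {
        self.0 == 0
    }
}
```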
    /// 1. Locks the mutex
    /// 2. Iterates through each node currently linked in the list
    /// 3. For each node that says "I need to be hydrated", it sees if a matching
    ///    key record exists in the hashmap (our stand-in for the actual flash
How will this be implemented with real flash? I assume we would read everything from the flash and keep track of all keys and their locations in a HashMap, to avoid reading from flash for every key? Might also depend on how slow the reads are and whether that's acceptable.
Yeah, "it depends" on how we implement the flash data structure. If we have some way of loading, say, up to 4K at a time, we could do something like:
- For each <=4K chunk of flash in the "latest" window
- Read the chunk to RAM w/async+dma read
- For each node in the list
- If the node "needs to be hydrated":
- For each K:V pair in the flash chunk
- if the key from the chunk item matches the key from the node
- then attempt to deser the value into the node
- On success, mark it "Valid"
- On failure, mark it "NonResident"
- When all chunks from the "latest" flash window have been processed
- For each node in the list
- If the node still needs to be hydrated, mark it as "NonResident"
- For each node in the list
This assumes we can build some interface like this for the flash, that the storage worker can use:
    let mut buf = [0u8; MAX_PAGE_LEN];

    // returns an iterator of addr+len pairs we can read from the flash
    // this would be just one if the total data is <4K, or more if the
    // total data is >4K
    let page_iter = flash.get_latest();
    for (page_addr, page_len) in page_iter {
        let chunk = flash.read(page_addr, &mut buf[..page_len]).await?;
        let item_iter = deser_kvs(chunk);
        for (key, value) in item_iter {
            // key: &str
            // value: &[u8]
            // do the stuff with iterating the nodes to find if we have this
        }
    }
    // already, we might want to try again later. If it failed for some
    // other reason, or it fails even with an empty page (e.g. serializes
    // to more than 1k or 4k or something), there's really not much to be
    // done, other than log.
Since serialization is deterministic, we should be able to have some way of catching this during compilation.
Maaaaaybe. Remember the max size is different for every type, and we don't have dynamic memory allocation or anything like `alloca`. We should probably just set some reasonable max size like 1k for each individual item.
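If the serializer can bound sizes at compile time, a const assertion could enforce a budget like that. A sketch assuming postcard's experimental `MaxSize` trait (requires the `experimental-derive` feature; the type and budget here are made up):

```rust
use postcard::experimental::max_size::MaxSize;
use serde::Serialize;

/// Hypothetical per-item budget.
const MAX_ITEM_SIZE: usize = 1024;

#[derive(Serialize, MaxSize)]
struct MyConfig {
    volume: u8,
    brightness: u8,
}

// Fails to build if the worst-case serialized size exceeds the budget.
const _: () = assert!(MyConfig::POSTCARD_MAX_SIZE <= MAX_ITEM_SIZE);
```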
    // Make a node pointer from a header pointer. This *consumes* the `Pin<&mut NodeHeader>`, meaning
    // we are free to later re-invent other mutable ptrs/refs, AS LONG AS we still treat the data
    // as pinned.
    let mut hdrptr: NonNull<NodeHeader> =
        NonNull::from(unsafe { Pin::into_inner_unchecked(node) });
    let nodeptr: NonNull<Node<()>> = hdrptr.cast();
This makes sense to me from a (C) pointer perspective, but I don't yet understand the peculiarities of the `Pin` (and how we ensure to "treat the data as pinned") and the cast. My understanding of what it does is the Rusty way of `Node* nodeptr = reinterpret_cast<Node*>(header_ptr)`. Does this get "easier" when the nodes are `'static`? Or do we still need the `Pin` then?
The `Pin` is an artifact of `cordyceps::List`'s interface. `Pin` guarantees we DO NOT move out of the pinned reference, e.g. we don't swap out an old `Node` for a new `Node`, because `Pin` says things are allowed to hold pointers to items, and doing that would invalidate them. We don't do any of that, so mostly it should be fine as-is. If you need to tag me to check the safety of something, let me know and I can take a quick peek.
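A toy illustration of what `Pin` rules out here (not from the PR; this `Node` is a stand-in):

```rust
use core::marker::PhantomPinned;
use core::pin::Pin;

struct Node {
    value: u32,
    /// Opts out of `Unpin`, like the real nodes.
    _pin: PhantomPinned,
}

fn mutate(mut a: Pin<&mut Node>, b: Node) {
    // Moving the old node out would invalidate pointers others hold into
    // it, so Pin forbids it for !Unpin types; this does not compile:
    // core::mem::replace(&mut *a, b);

    // Mutating in place without moving is fine, with an unsafe promise
    // that we won't move out of the returned &mut:
    unsafe { a.as_mut().get_unchecked_mut().value += 1 };
    let _ = b;
}
```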
    if let Some(val) = in_flash.get(key) {
        // It seems so! Try to deserialize it, using the type-erased vtable
        //
        // Note: We MUST have destroyed `node` at this point, so we don't have aliasing
Is this what happens with the `into_inner_unchecked`?
Yes, more specifically `NonNull::from()` consumes the `&mut` ptr we get out of `into_inner_unchecked`.
    // SAFETY: We can re-magic a reference to the header, because the NonNull<Node<()>>
    // does not have a live reference anymore
    let hdrmut = unsafe { hdrptr.as_mut() };
Maybe add this to the docs later: this works because we passed `nodeptr` to deserialize, and `hdrptr` is actually `Copy`, so it was not moved with the above call to `cast`.
Yes, pointers can be copied; the important thing for safety is that we never have aliasing mutable REFERENCES live at the same time. We can have "duped" pointers, but we can't turn those into references (or ACCESS through the pointers) if there are other references live.

References have stricter rules than pointers: references assert "no aliased mutable access" for THE WHOLE TIME they exist, while pointers only assert that while you are accessing through them.

I say "assert": it's not something that will panic, it is just something that is undefined behavior. You might catch cases of this with `miri` testing.
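A toy example of that rule (not from the PR): duplicated raw pointers are fine, as long as no two mutable references derived from them are live at the same time.

```rust
fn main() {
    let mut x = 42u32;
    let p1: *mut u32 = &mut x;
    let p2: *mut u32 = p1; // "duping" the pointer is fine

    unsafe {
        let r1 = &mut *p1; // live mutable reference through p1
        *r1 += 1;
        // `r1` is never used again, so making a new reference through
        // `p2` is OK here:
        let r2 = &mut *p2;
        *r2 += 1;
        // Using `r1` again at this point would be UB -- the kind of
        // thing `cargo miri` can catch.
    }
    assert_eq!(x, 44);
}
```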
    if let Ok(used) = res {
        // "Store" in our "flash"
        let res = flash.insert(key.to_string(), (buf[..used]).to_vec());
How will this be handled with actual flash? We won't be writing immediately, but will collect multiple nodes into a single page to avoid erasing entire pages, right? Do we end up with a two-step process here, where we first insert into a "buffer to be written" and then "write buffer to flash"? Or is this entirely handled by the "flash driver" (that implements a `Flash` trait)?
See some previous comments, this is still to be solved.
    list: AtomicPtr::new(ptr::null_mut()),
    state_change: WaitCell::new(),
    inner: UnsafeCell::new(Node {
        header: NodeHeader {
Consider providing a `Default` impl for `NodeHeader`.
    // Attaches, waits for hydration. If value not in flash, a default value is used
    pub async fn attach(&'static self, list: &'static PinList<R>) -> Result<(), ()> {
        // TODO: We need some kind of "taken" flag to prevent multiple
https://github.com/embassy-rs/embassy/blob/64a2b9b2a36adbc26117b8e76ae2b178a5e3eb9f/embassy-hal-internal/src/macros.rs#L58-L70 is the approach with a taken flag.
Returning a handle or having some sort of newtype/builder pattern might also be an option.
However, properly documenting that it will panic if called twice and implementing the flag approach is easier and works quite well in embassy.
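A minimal sketch of that flag approach, patterned after the linked embassy macro (names are illustrative):

```rust
use core::sync::atomic::{AtomicBool, Ordering};

static TAKEN: AtomicBool = AtomicBool::new(false);

/// Panics if called more than once, like embassy's peripheral take.
fn take() {
    if TAKEN.swap(true, Ordering::AcqRel) {
        panic!("attach() called more than once");
    }
    // ... proceed with the one-and-only attach ...
}
```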
On second thought: couldn't we check whether `self.list != null` and return (since it has already been attached)?
Yes, if we do a compare-and-swap at the top instead of just an `is_null` check. Right now I don't "set" the pointer until the end of the function, which means two `attach`es could happen concurrently.

I think switching to a handle is a better idea because it solves a lot of other API problems I've commented on.
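A sketch of that compare-and-swap guard (the field and types are stand-ins for the node's `AtomicPtr<PinList<R>>`):

```rust
use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

struct StorageNode {
    list: AtomicPtr<u8>, // stand-in for AtomicPtr<PinList<R>>
}

impl StorageNode {
    /// Claim the node by setting the list pointer up front; a second
    /// concurrent `attach` sees non-null and fails instead of racing.
    fn try_claim(&self, list: *mut u8) -> Result<(), ()> {
        self.list
            .compare_exchange(ptr::null_mut(), list, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| ())
            .map_err(|_| ())
    }
}
```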
    // now spin until we have a value, or we know it is non-resident
    // This is like a nicer version of `poll_fn`.
    self.state_change
How do we ensure that a `process_read` is triggered so we don't wait forever?
"it is up to the storage writer to ensure that reads are processed regularly".
I don't think we magically can, but imo it's reasonable to say "you must await the "reads needed" future, or just make sure you check regularly enough.
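In other words, the contract might look like this on the worker side (all names hypothetical):

```rust
/// The storage worker's main loop: as long as this task runs regularly,
/// no attach() waits forever.
async fn storage_worker(list: &'static PinList) {
    loop {
        // Park until some node signals it needs hydration...
        list.reads_needed().await;
        // ...then process all pending reads, waking the waiters.
        list.process_reads().await;
    }
}
```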
    State::Initial => false,
    State::NonResident => {
        // We are nonresident, we need to initialize
        noderef.t = MaybeUninit::new(T::default());
Consider making this a separate function on `t` (`init_with_default` or so), such that the `State` is not managed by different parties.
I'm not sure what you mean; `State` definitely has to be managed by two parties: the flash worker (sets state on initial load, and when writes are processed) and the node itself (when a write occurs and it marks "needs write").
My note was more geared towards development: how can we logically group (or confine?) the places where the node's state is manipulated, so refactoring is less error-prone (or: how can I make it harder to forget setting the right state, especially if we change the state enum)?
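One way to do that confinement (illustrative, not this PR's code): route every transition through methods on the state type, so each party calls a named transition instead of assigning variants directly.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Initial,
    NonResident,
    ValidNoWriteNeeded,
    NeedsWrite,
    WriteStarted,
}

impl State {
    /// Node owner: a new local write always marks the node dirty.
    fn on_local_write(self) -> State {
        State::NeedsWrite
    }

    /// Flash worker: picks up a pending write.
    fn on_write_started(self) -> State {
        debug_assert_eq!(self, State::NeedsWrite);
        State::WriteStarted
    }

    /// Flash worker: the write stuck. If another local write raced in
    /// meanwhile, stay dirty instead of going clean.
    fn on_write_complete(self) -> State {
        match self {
            State::WriteStarted => State::ValidNoWriteNeeded,
            other => other, // e.g. raced back to NeedsWrite
        }
    }
}
```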