-
Notifications
You must be signed in to change notification settings - Fork 517
Add custom comparators #1182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add custom comparators #1182
Conversation
|
cc @al8n |
|
Restored the |
4e63330 to
b19ede6
Compare
Renamed OrdComparator to BasicComparator and made compatible with the Equivalent/Comparable traits
b19ede6 to
fbfe7fd
Compare
|
It's been a while since I circled back to this, but is there a sentiment as to whether to merge? |
agavra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, though I admit I'm a little sketched out by the concept of breaking the hash/eq dependency definition with Equivalator trait...
Would be good to get some committers reviewing this as well! Excited for this feature.
crossbeam-skiplist/src/comparator.rs
Outdated
| /// Because you can use `Equivalator` to implement very different comparison | ||
| /// logic from the `Eq` trait for a given type, there is no expectation that | ||
| /// two keys should have the same hash for them to compare equal. For example, | ||
| /// this is a valid implementation that treats two inputs of any type as equal: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to spell out the motivations for this. It is always possible for the user to implement some funky transform of their keys before they insert into a set/map and when retrieving data as a workaround for not having this. (e.g. in your test, I think a much better implementation if the user wanted that is to just set.insert(key.length()) instead of doing something convoluted with the eq function). That being said, you could still have a comparator that sorts based on key.length and considers two keys equivalent for a sort order if they have the same length.
On the other hand, if we introduce this, it would be impossible to leverage traditional hash structures as optimizations in the future (e.g. add a bloom filter on top of a skiplist) because two elements could be equivalent under this equivilator but have different hashes.
It generally seems dangerous to me to break the standard hash/eq dependency that is implicitly used by so many data structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to spell out the motivations for this.
Agreed. My example was pretty contrived; I can provide more realistic motivation.
Though I will remark that set.insert(key.length()) would produce a totally different program as you'd lose access to the first key inserted. Even skip_map.insert(key.length(), key) is not equivalent since that would give you the last key inserted instead of the first.
It generally seems dangerous to me to break the standard hash/eq dependency that is implicitly used by so many data structures.
Well, this doesn't "break" the Hash/Eq dependency since nothing changes if you stick to the standard library traits. But I think the documentation here may be confusing because hashing was out of scope for this PR, so there is no indication of how you would handle hashing when you need it. In reality, hashing is very easy to accommodate; it just requires another "Hash"-ator trait to go alongside Comparator and Equivalator. This is another concept that has been in the C++ standard library for decades.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! If there's a good reference implementation in the standard C++ library that gives me a lot more confidence in the approach. Thanks for the explanation.
| } | ||
|
|
||
| /// Returns an iterator over all entries in the map, | ||
| /// sorted by key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// sorted by key. | |
| /// sorted by the comparator's ordering of the key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also update SkipSet and change all method docs to use the term "equivalent" (instead of equal).
| handle.join().unwrap() | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably makes sense to add tests for SkipMap as well
Addressing feedback on the PR. - Clarify the lack of an Equivalator-compatible hashing trait. - More prominently mention comparators in the SkipList/SkipMap docs. - Updated tests and example to be more realistic and less constrived.
b1258bc to
ec87e01
Compare
|
Sorry it took me an eternity to circle back round to this. I got busy and I'm bad at multitasking. I pushed a commit that addresses the last round of feedback with respect to comments and tests/examples. Are there any maintainers that want to provide feedback and give a disposition as to whether there is a desire to merge this? Or whether we should perhaps take a different approach, such as spinning out the traits into a new crate and adding a hashing trait for DB folks who want to support this with their Bloom filters? |
Closes #1180.
The diff is large due to having to add a type parameter
Cand update constraints in numerous places, but the logic changes are not large.This trait adds the
ComparatorandEquivalatortraits, which generalize theComparableandEquivalenttraits by taking aselfparameter in addition to the two keys to compare, making it possible to change the comparison function used without changing the key type.Summary
trait Comparator<L, R>andtrait Equivalator<L, R>.struct BasicComparator, which is a zero-sized type that implementsComparatorusing theComparabletrait, which itself falls back onBorrowandOrd.CtoSkipList,SkipMap, andSkipSet.Cis defaulted toOrdComparator. This preserves the existing behavior of functions likeget(),insert(), etc.SkipList::with_comparator(),SkipSet::with_comparator(), andSkipMap::with_comparator()constructors.SkipList::new(),SkipMap::new(), andSkipSet::new()always useC = OrdComparator. It would be nice if it worked for anyCthat implementsDefault, but that would break type inference. This is exactly the reasonstd::collections::HashMap::new()always usesS = RandomStatefor its defaulted parameter even ifSimplementsDefault.Box<dyn Fn(&[u8], &[u8]) -> Ordering>as the comparator.Hashing
Because the goal of these traits is to support totally different comparison operations from the default for the underlying key type,
Equivalatorexplicitly does not require two equivalent keys to have the same hash. If someone, someday wants to supportEquivalatoron their hash map, they will have to generalize theHashtrait as well.Safety
My one minor question is whether
unsafe impl<Q, R, K, V, C> Send for RefRange<'_, Q, R, K, V, C>should requireC: Sync. I don't think so because it doesn't requireKorVto be sync, but it was enough to make me second-guess myself.