Skip to content

Closes out all key_view related work.#845

Open
thompsonbry wants to merge 128 commits intounodb-dev:masterfrom
thompsonbry:fix/key-view-full-inode-path-v2
Open

Closes out all key_view related work.#845
thompsonbry wants to merge 128 commits intounodb-dev:masterfrom
thompsonbry:fix/key-view-full-inode-path-v2

Conversation

@thompsonbry
Copy link
Copy Markdown
Contributor

@thompsonbry thompsonbry commented Mar 19, 2026

This PR closes out the planned work for key_view trees:

  • Full key in inode path: key_view trees encode the complete key in the inode prefix+dispatch chain, enabling keyless leaves and packed values.
  • Value-in-slot (VIS): db<key_view, uint64_t> stores small values directly in inode child slots, eliminating leaf heap allocations. Supported across db, mutex_db, and olc_db.
  • key_view benchmarks: insert/get/scan/remove micro-benchmarks for both <key_view, value_view> and <key_view, uint64_t>.
  • get_internal optimization: high-bit check on child pointer replaces is_value_in_slot switch dispatch for packed value detection.

Closes #841 (benchmarks), #717 (secondary index use case with u64 values), #707 (eliminates leaves entirely for key_view keys and small values packing them into the inode slots), #644 (scan kernel optimization for key_view i48), #612 (full key must be in the inode path for key_view, enabling leaf elimination for the common secondary index use case)

Summary by CodeRabbit

  • New Features

    • Support key_view keys with non-value_view value types and a new transient_key_view type.
    • Optional packed “value-in-slot” storage to avoid leaf allocation when enabled.
  • Bug Fixes & Improvements

    • Iterator and lookup return types and key comparisons now adapt to configured key/value policies.
    • Insertion, traversal and removal behaviors improved for packed-value and full-key-in-inode modes.
  • Tests

    • Large new full-chain key-view test suite and additional concurrency regression tests.
  • Benchmarks

    • New microbenchmarks and benchmark target for key_view workloads.

Add micro_benchmark_key_view with benchmarks for key_view ART tree
operations: insert, get, remove, and scan.  Five key generators
exercise different chain structures:

- G1 compound (9B, 1-level chain)
- G2 deep (18B, 2-level chain)
- G4 multi_tag (9B, 8 independent subtrees)
- G5 dense (8B, no chains — baseline vs u64)
- G6 chain_depth (8-256B, variable chain depth)

Includes kv_vs_u64 comparison benchmarks with 100-byte values matching
the existing u64 dense_insert benchmark structure.  Timing discipline
(PauseTiming/destroy_tree) is identical to the u64 benchmarks.

Adds key_view_set utility class and kv_db/kv_mutex_db/kv_olc_db type
aliases to micro_benchmark_utils.hpp.
…ction

Add static constexpr bool value_in_slot to basic_art_policy, true when
sizeof(Value) <= sizeof(uint64_t). Add static_assert that uintptr_t
fits in uint64_t (slot size).

Remove the three static_asserts that enforced Value == value_view in
db, olc_db, and mutex_db. These were marked as temporary during
development. No behavioral change — all existing code uses value_view.

Part of unodb-dev#707.
Change get_result from std::optional<value_view> to
std::optional<value_type> in db and olc_db. Change mutex_db to
reference db<Key, Value> instead of db<Key, value_view>.

No behavioral change — Value is still value_view in all existing code.
Enables future use with Value=uint64_t for value-in-slot.

Part of unodb-dev#707.
- Generalize tree_verifier: value_type instead of hardcoded value_view
- Add get_test_value<Db>() helper for type-appropriate test values
- Add key_view_u64val_db/mutex_db/olc_db type aliases
- Add test_art_key_view_full_chain.cpp (Mode 3 test suite)
- Wire into CMakeLists.txt

Does not compile yet — db/leaf do not support uint64_t values.
- Leaf: add get_value<Value>() template for type-safe value retrieval
- make_db_leaf_ptr: accept Value type, serialize to bytes internally
- art.hpp/olc_art.hpp: insert and get paths use Value type throughout
- add_or_choose_subtree: Value instead of value_view in signatures
- Iterator get_val(): returns value_type (or qsbr_value_view for spans)
- tree_verifier: generalized for non-value_view value types
- New type aliases: key_view_u64val_db/mutex_db/olc_db
- New test suite: test_art_key_view_full_chain (key_view + uint64_t)
- All 11 test suites pass
Verify that a zero-length key_view works correctly for insert, get,
and remove across all three db types (db, mutex_db, olc_db).
Includes stats assertions for node counts.
Add sync point sync_before_remove_write_guard in OLC
remove_or_choose_subtree, firing after leaf match confirmed and
min_size read, before write guard acquisition.

Three tests (RT1-RT3) verify OLC version checks detect concurrent
modifications at this point:
- RT1: concurrent insert into same inode (above min_size)
- RT2: concurrent remove from min-size I4 (both children removed)
- RT3: concurrent leaf replacement (remove + re-insert same key)

These validate the precondition for making matches() unconditional
for keyless key_view leaves in PR-D.
The db iterator's invalidate() resets keybuf_, but the OLC version
did not. This is harmless today since get_key() reads from the leaf,
but will matter when get_key() switches to keybuf_.get_key_view().
Replace single keybuf_ with keybuf_[2] + keybuf_ix_ in the OLC
iterator.  All existing keybuf_ references updated to use the
active buffer via keybuf_[keybuf_ix_].

This prepares for keyless key_view leaves where next()/prior()
must snapshot the current key before try_next()/try_prior() may
corrupt the buffer.  The index flip (keybuf_ix_ ^= 1) provides
O(1) switching with no key byte copy, supporting arbitrary key
lengths.

get_key() still reads from the leaf — switching to keybuf_ is
deferred until D1 (no root leaf) ensures keybuf_ is always
populated at every leaf.
Add 6 tests covering inode chain construction for key_view first
insert across key lengths 1-18 bytes:
- T1: single key insert/get/remove for each boundary length
- T2: two keys diverging at last byte
- T3: two keys diverging at first byte (intermediate chain depth)
- T4: forward scan over chain keys
- T5: chain + non-chain sibling with different first byte
- T6: prefix overflow on I4 collapse after chain removal

No stats assertions — correctness only.  All tests pass today
(root leaf path).  They will validate build_chain once implemented.
For key_view keys, the first insert into an empty tree creates an
inode chain encoding the full key in prefix+dispatch bytes, with the
leaf at the bottom.  Built bottom-up so the last I4 (with prefix from
key[0..]) becomes the root.  u64 keys unchanged via if constexpr.

New explicit-prefix-length I4 constructor for the tail chain level
where remaining key bytes <= key_prefix_capacity.

Guard prefix overflow in remove_or_choose_subtree: when collapsing a
min-size I4 would overflow the prefix (parent + dispatch + child >
key_prefix_capacity), remove the child entry instead of collapsing.
New can_collapse() method on inode_4 checks this precondition.
Add constexpr chain overhead (C) to stats assertions in test_art.cpp
and test_art_key_view.cpp.  C=1 for db/mutex_db key_view (build_chain
creates extra I4), C=0 for olc_db (build_chain not yet implemented)
and u64 types (no chain on first insert).

Fix account_shrinking_inode: move stats accounting inside the
collapse/no-collapse branches so the prefix-overflow path (which
removes a child without collapsing) does not incorrectly count a
shrink.

Add account_growing_inode calls to build_chain for each I4 created.
Same bottom-up chain construction as db::build_chain, under the
root_pointer_lock write guard.  Remove is_olc_db guards from stats
assertions — all key_view db types now create chains on first insert.
Add full_key_in_inode_path and can_eliminate_leaf constexprs to
basic_art_policy. Gate build_chain on can_eliminate_leaf instead of
is_same_v<Key, key_view>. Currently can_eliminate_leaf is false for
all existing db types (requires key_view + value_in_slot), so
build_chain is effectively disabled until PR-C provides a working
db<key_view, uint64_t>.

Revert test stats expectations to mainline values since build_chain
no longer runs for db<key_view, value_view>.
…ing (art.hpp only)

Option D implementation: after add_to_nonfull or growth, find the child
slot and replace the bare leaf with a chain encoding the remaining key
suffix.  Only in art.hpp (db/mutex_db) — OLC path not yet done.

build_chain now accepts start_depth parameter (default 0).

26 Mode 3 tests still failing — needs debugging + OLC path.
…gressions

S3 and S4 chain wrapping added but depth calculations appear wrong.
30 test failures (up from 24). New regressions in InsertDiverging,
RemoveFromChainLeavesInode, ScanChainMixedLengths.

Need to carefully trace depth semantics at each call site.
…nd olc_art.hpp)

Chain wrapping code is correct — all 10 non-Mode3 test suites pass.
33 Mode 3 test failures are all stats expectation mismatches (11 unique
tests × 3 db types). Test expectations need updating to match the
actual tree structure with full inode chain encoding.
Add start_depth parameter to build_chain (art.hpp + olc_art.hpp) so
chains can encode a key suffix starting at an arbitrary depth.

Wrap new leaves in chains at all insertion sites (S2-S4) when
full_key_in_inode_path is true:
- S2: add_or_choose_subtree (new child to existing inode)
- S3: leaf collision (two keys diverge at a leaf)
- S4: prefix mismatch (key diverges at inode prefix)

Option D approach: insert bare leaf first, then find the child slot
and replace with chain top. Under OLC write guard for thread safety.

Update Mode 3 test expectations: short keys (1 byte) have no suffix
to chain, so I4 count is lower than originally expected. Keys with
remaining suffix bytes get chain wrappers, increasing I4 count.

All 11 test suites pass (180 Mode 3 tests + 10 other suites).
…ose_subtree

Add UNODB_DETAIL_RELOAD macro to portability_builtins.hpp — forces
re-read through volatile pointer to prevent optimizer from caching
pre-mutation state across union type-pun (byte_array vs integer).

Use in add_or_choose_subtree: after add_to_nonfull writes to
byte_array, find_child reads via integer. Under clang -O3, the
optimizer assumes these don't alias and returns stale data.

FIXME(@laurynas-biveinis): remove UNODB_DETAIL_RELOAD when unodb-dev#700 is
fixed (replace all union type-punning with memcpy).
- push(iter_result): add missing return after push_leaf, preventing
  fallthrough to inode push (latent bug on mainline)
- pop(): fix off-by-one — pop prefix_len+1 for inodes (prefix + dispatch
  byte), 0 for leaves. Previously popped only prefix_len.
- Add test_only_stack() to both db::iterator and olc_db::iterator for
  stack structure validation in tests.
Verify that the iterator stack encodes the full key in the inode path
for can_eliminate_leaf trees. At each leaf position, the concatenation
of (prefix + dispatch_byte) across inode entries must equal the full
encoded key.

Tests: TwoChainKeys, WideNode, SecondInsertChain, FullScan (fwd+rev).
All run against key_view_u64val_{db,mutex_db,olc_db} (Mode 3 types).
Add scoped_key_view type in art_common.hpp — a non-owning key view
with scoped lifetime semantics, valid only until the next iterator
movement. Implicitly converts to key_view for read access.

Iterator get_key() return type is now conditional:
- full_key_in_inode_path trees: returns scoped_key_view (from keybuf_)
- other trees: returns key_view (from leaf, unchanged)

This makes the lifetime contract explicit in the type system. No
existing user code is affected since Mode 3 trees (key_view + small
value) are new and have no external callers yet.

Fix check_present_values in test verifier to copy the key when
comparing consecutive scan positions (required for scoped_key_view
since the underlying buffer is reused on iterator advance).
For Mode 3 trees, cmp() compares the to_key against keybuf_ instead
of reading from the leaf. Same pattern as get_key() (D5).
Mode 2 and u64 trees unchanged.
D2+D3 work in progress. Infrastructure compiles for non-keyless paths.
9 remaining errors in collision/split paths that call leaf->get_key_view()
on keyless leaves — these need key reconstruction from the inode path.

- art_common.hpp: no_key_tag, can_eliminate_key_in_leaf_v, leaf_key_type
- art_internal_impl.hpp: basic_leaf<no_key_tag, Header> specialization
- Deleter, unique_ptr, make_db_leaf_ptr updated for keyless leaves
- Scan/iterator cmp() guarded with if constexpr (category 2 done)
- OLC next()/prior() use keybuf_ for keyless leaves

NOT YET DONE: collision path key reconstruction (category 1)
- Remove useless static_cast<std::byte> on key_byte (already std::byte)
- Add cppcheck-suppress for memcpy on dead code branch (Value=span)
- Remove dead 'return false' after CANNOT_HAPPEN() in insert_internal_fixed
  (GCC 14 suggest-attribute=noreturn)
Add per-child-slot bitmask to each inode type for future value-in-slot
support. The bitmask indicates whether a child slot holds a value
(sizeof(Value) <= 8) or a child pointer.

- I4: 1 byte (uint8_t), fits in existing padding. No size change.
- I16: 2 bytes (uint16_t), fits in existing padding. No size change.
- I48: 6 bytes (array<uint8_t,6>), fits in existing padding. No size change.
- I256: 32 bytes (array<uint8_t,32>), children shift down. +32 bytes.

All bitmasks initialized to zero. Not yet used by any code path.

Part of unodb-dev#707.
- Replace test_values[N] with get_test_value<TypeParam>(N) in
  test_art_key_view.cpp (59 occurrences).
- Add key_view_u64val_db to type list (commented out, pending
  value-in-slot implementation).

Rebased onto feat/leaf-key-removal which already provides:
- Generalized tree_verifier (value_type, get_test_value)
- key_view_u64val_db/mutex/olc type aliases and extern templates
- Explicit db<key_view, uint64_t> instantiation
- build_chain infrastructure
- static_assert guards on make_db_leaf_ptr, reclaim_leaf_on_scope_exit
- if constexpr guards at all leaf creation/reclamation call sites
- pack_value/unpack_value helpers in art_policy
- value_bitmask accessors (is_value_in_slot, set/clear_value_bit) on all inodes
- add_two_to_empty(node_ptr) and add_to_nonfull(node_ptr) overloads on all inodes
- inode_4 constructor overload for value-in-slot child
- I16/I48/I256 grow constructor overloads (init_grow refactor)
- inode remove() methods guarded for can_eliminate_leaf

Compiles clean. Insert path works (values stored in slots).
Get/scan paths not yet updated — still dereference packed values as
pointers (causes segfault/assertion on traversal tests).

TODO: E3 (get path), E9 (optimize SIMD for value-in-slot)
- olc_art.hpp: add noexcept to two lambdas (C26440).
- micro_benchmark_key_view.cpp: const auto& for benchmark loop
  variable (C26496), all 11 instances.
- test_art_key_view_full_chain.cpp: suppress C26440 false positive
  on EmptyKeyRejected (test body throws by design).
- olc_art.hpp: add noexcept to two lambdas (C26440).
- art.hpp: suppress C26440 false positive on insert_internal_fixed
  (function allocates, can throw).
- art_internal_impl.hpp: suppress C26814 false positive on
  remaining_is_value in inode_4 remove.
- micro_benchmark_key_view.cpp: file-level suppress C4189/C26496
  for Google Benchmark loop variable pattern (auto _ : state).
- test_art_key_view_full_chain.cpp: add noexcept to kh::kv()
  (C26440); suppress C26440 on EmptyKeyRejected test body.
- olc_art.hpp: noexcept on db_leaf_qsbr_deleter::operator() (C26440).
- micro_benchmark_key_view.cpp: noexcept on scan lambdas (C26440).
- db_test_utils.hpp: suppress C26440 on assert_node_counts (uses
  gtest ASSERT macros that throw).
- test_art_key_view_full_chain.cpp: constexpr val, const k1-k4
  (C26814/C26496).
- olc_art.hpp: noexcept on db_inode_qsbr_deleter::operator() (C26440);
  fix create_leaf_if_needed — std::ignore instead of (void) casts
  (C26457), suppress C26440/C26411/C26460 (false positives from
  if constexpr branch where params are unused).
- art_internal_impl.hpp: suppress C26474 on explicit void* cast in
  pack_value (needed for portability across compilers).
- test_art_key_view_full_chain.cpp: constexpr val in
  StackStructureSecondInsertChain (C26814).
- art_internal_impl.hpp: suppress C26495 on flexible array member
  data[1] in keyless basic_leaf (intentionally uninitialized, filled
  by memcpy in constructor).
- olc_art.hpp: suppress C26415 on create_leaf_if_needed smart pointer
  param (false positive from if constexpr unused branch).
- test_art_key_view_full_chain.cpp: constexpr val in
  StackStructureWideNode (C26814).
- art_internal_impl.hpp: suppress C26495 on keyed leaf data[1]
  (flexible array member); const raw in unpack_value (C26496).
- test_art_key_view_full_chain.cpp: bulk const→constexpr for all
  65 get_test_value calls (C26814). Preemptive — avoids future
  rounds for the same pattern.
- art_internal_impl.hpp: move C26495 suppress to class level for
  keyless basic_leaf (member-level pragma didn't work on MSVC).
- test_art_key_view_full_chain.cpp: const k1/k2 (C26496).
- art_internal_impl.hpp: std::ignore instead of (void) cast (C26457);
  assert slot < 48 before uint8_t cast (C28020).
- olc_art.hpp: suppress C26460 on remove_or_choose_subtree
  parent_critical_section (false positive — param is mutated via
  try_read_unlock and std::move).
- test_art_key_view_full_chain.cpp: const tkv (C26496); suppress
  C26440 on scan lambda (gtest macros prevent noexcept).
- art_internal_impl.hpp: UNODB_DETAIL_ASSUME for insert_pos_index
  bounds in inode_16 and slot bounds in inode_48 (C28020 — MSVC SA
  needs __assume, not just assert).
- test_art_key_view_full_chain.cpp: const iterator ref in
  verify_stack (C26460); const prefix (C26496).
- art_internal_impl.hpp: suppress C26460 on inode_256::remove
  db_instance param (false positive from if constexpr unused branch).
- art_internal_impl.hpp: preemptive C26460 suppress on all inode
  remove/init/direct_remove_child_pointer functions that take
  db_instance& — false positive from if constexpr unused branch.
  Applied to inode_4, inode_16, inode_48, and inode_256 variants.
- test/test_art.cpp: restore vertical whitespace between include
  groups and before namespace (R5); add comments that chain_i4_per_key
  and olc_chain_collapse are counter adjustment values, not bools (R6).
- test/test_art_key_view_full_chain.cpp: add doc comment explaining
  LC() — leaf count adjustment, 0 for VIS, n otherwise (R7).
- art.hpp: improve insert_internal_fixed guard comment — explain it's
  unreachable because caller dispatches key_view separately; use
  std::ignore instead of (void) casts (R9).
Add early-return guards for empty key_view in get_internal and
remove_internal for both db and olc_db. Empty keys would cause UB
when indexing remaining_key[0] in inode dispatch. Insert already
rejects empty keys with std::length_error; get/remove now return
not-found/false instead of UB.
When inserting into inode_4 or inode_16 (which maintain sorted key
arrays), children at positions >= insert_pos are shifted right. The
value bitmask must be shifted to match, otherwise existing packed-value
bits become misaligned with their children.

Add value_bitmask_field::insert_at() — the inverse of remove_at() —
and call it from both leaf and packed-value add_to_nonfull overloads
in inode_4 and inode_16. inode_48 and inode_256 use direct key-byte
indexing and are unaffected.
…it (C3/C8)

C3: In insert_internal (root-insert path), the leaf unique_ptr was
released before calling build_chain.  If build_chain threw on the
first inode_4::create, the leaf was leaked.  Fix: pass leaf.get()
to build_chain and only call release() after it succeeds.

C8: chain_depth() generates at most 1024 unique keys (4 tags × 256
variants).  Add assert to catch callers that exceed this limit.
Two tests in ARTKeyViewFullChainTest that exercise the bitmask shift
bug: insert packed values in descending key-byte order so each insert
shifts prior entries right in the sorted I4 array. Without the
insert_at fix, the test crashes with SIGSEGV (misaligned bitmask
causes a packed value to be dereferenced as a pointer).

- BitmaskShiftOnInsertBeforePackedValue: 2 keys, minimal case
- BitmaskShiftMultiplePackedValues: 4 keys filling an I4, plus removal
Inserts 200 random 4-byte encoded float keys (seed 42) into all three
db types, verifies scan integrity, then removes every other key and
re-verifies.  This exercises the I4/I16 bitmask shift logic with
realistic key distributions that create deep trees with mixed
packed-value and chain children.

Reproduces the P8 partial-index crash (handoff-olc-crash.md).
- test_art_key_view_full_chain.cpp: add missing RESTORE_MSVC_WARNINGS (C5032)
- olc_art.hpp: suppress C26447 on db_leaf_qsbr_deleter::operator()
- art.hpp: mark leaf_ptr const (C26496)
- micro_benchmark_key_view.cpp: use value not ref in benchmark loop (clang-cl)
- msvc-build.yml: restore msvc-check branch trigger
- Suppress C26447 on db_inode_qsbr_deleter::operator() (noexcept + on_next_epoch_deallocate)
- Mark leaf_ptr const in try_insert (C26496)
- Suppress C26815 false positive on cached_leaf.get() in try_insert
- test_art_key_view_full_chain.cpp: uppercase float suffixes (F), std::array
  instead of C-style array, size_t loop index, NOLINT for NRVO
- art_internal_impl.hpp: add gnu::pure to is_value_in_slot dispatch
…texpr

- art_internal_impl.hpp: bufferAccessOutOfBounds, memsetClass on pack_value
- art.hpp: missingReturn on insert_internal_fixed (if constexpr + CANNOT_HAPPEN)
- olc_art.hpp: missingReturn on remove_or_choose_subtree (same pattern)
- art_internal_impl.hpp: cast (i+1) to unsigned in bitmask shift (sign-conversion)
- art.hpp: suppress GCC suggest-attribute=noreturn on insert_internal_fixed
- test_art_key_view_full_chain.cpp: uppercase U suffix, NOLINT constant seed,
  remove useless size_t→uint64_t casts, use .empty() not .size()>=1
- benchmark/micro_benchmark_utils.hpp: remove useless size_t→uint64_t casts
- art.hpp: move DISABLE_GCC_WARNING before template (fixes _Pragma scope)
- test/test_art.cpp: [[maybe_unused]] on template variables (clang 14)
- test/test_art_key_view_full_chain.cpp: DISABLE_CLANG_21_WARNING for -Wnrvo
- art_internal_impl.hpp: cast (i % 8) to unsigned in array-based bitmask
- Restore static_cast<uint64_t> on encode(i) — needed on macOS where
  size_t != uint64_t; suppress GCC useless-cast at file level
- benchmark/micro_benchmark_utils.hpp, micro_benchmark_key_view.cpp:
  same useless-cast suppression
…ssions

- art_internal_impl.hpp: cast bits[] to unsigned in test() to prevent
  int promotion before shift; add memsetClass suppress on unpack_value
- art.hpp, olc_art.hpp: add inline cppcheck-suppress missingReturn at
  CANNOT_HAPPEN sites (macOS cppcheck needs it at the error line)
Reorder allocation: build chain I4 nodes BEFORE mutating the tree.
If build_chain throws std::bad_alloc, the tree is unchanged.
If the subsequent create/grow throws, the pre-built chain is cleaned
up via delete_subtree in a catch block.

Fixed call sites (art.hpp):
- VIS nonfull: build chain, add_to_nonfull(packed), overwrite slot
- VIS grow: build chain, try { create } catch { delete_subtree }
- Leaf nonfull: leaf.release(), build chain, add_to_nonfull(chain_top)
- Leaf grow: leaf.release(), build chain, try { create } catch
- Prefix split (VIS + leaf): build chain, try { create } catch
- Leaf-leaf split: dead code (static_assert documents why)

Fixed call sites (olc_art.hpp):
- Same patterns but with OLC write guard cleanup on restart
- Leaf-leaf split: dead code (static_assert documents why)

build_chain fix:
- Set owns_current=true in tail section
- Catch block: free leaf on first-alloc failure (!child_is_value)

Tests: 8 new key_view OOM tests (nonfull, grow, prefix split,
multi-node chain × VIS/leaf). 53 total OOM tests pass.
- C4702 (unreachable code): I256 set/clear_value_bit used early return
  after if constexpr; restructure to positive if constexpr guard.
- C26815 (dangling pointer): false positive on olc_node_ptr construction
  from cached_leaf.release(); suppress at 3 new call sites.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmark coverage for indices with key_view keys.

2 participants