-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
WIP: Faster io.unalloc for sequential reads ##io #25180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -2,6 +2,83 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #include <r_io.h> | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Range-based validation cache implementation | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #define VALID_CACHE_CAPACITY 64 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| R_API void r_io_valid_cache_init(RIO *io) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| R_RETURN_IF_FAIL (io); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| io->valid_cache.ranges = calloc (VALID_CACHE_CAPACITY, sizeof (RIOVALIDRangeCache)); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if (io->valid_cache.ranges) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| io->valid_cache.capacity = VALID_CACHE_CAPACITY; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| io->valid_cache.count = 0; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| io->valid_cache.last_query_addr = UT64_MAX; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| R_API void r_io_valid_cache_fini(RIO *io) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| R_RETURN_IF_FAIL (io); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if (io->valid_cache.ranges) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| free (io->valid_cache.ranges); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| memset (&io->valid_cache, 0, sizeof (io->valid_cache)); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| R_API void r_io_valid_cache_invalidate(RIO *io) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| R_RETURN_IF_FAIL (io); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| R_RETURN_IF_FAIL (io); | |
| R_RETURN_IF_FAIL (io); | |
| if (!io->valid_cache.ranges) { | |
| return; | |
| } |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing API documentation. The new public API functions (r_io_valid_cache_init, r_io_valid_cache_fini, r_io_valid_cache_invalidate) lack documentation comments explaining their purpose, parameters, and when they should be called. Add docstring comments following the project's documentation standards to explain these functions.
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache lookup uses a linear search through all cached ranges, which is O(n) where n is the number of cached entries (up to 64). For better performance with sequential reads, consider organizing ranges in a way that makes lookups more efficient, such as keeping them sorted and using binary search, or using the most recently used entry as a hint.
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache lookup doesn't check if the cache is properly initialized (ranges is NULL). If initialization failed or cache was freed, this will result in dereferencing a NULL pointer when accessing cache->ranges. Add a check for cache->ranges != NULL before attempting to access the cache.
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FIFO eviction strategy is inefficient for the stated use case of sequential reads. The memmove operation shifts all entries on every eviction, which is O(n) complexity. For sequential access patterns, an LRU or circular buffer would be more appropriate. Consider using a circular buffer with a head pointer to avoid the memmove overhead.
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cached range assumes a 1KB (1024-byte) range is uniformly valid/invalid starting from the queried address. However, this assumption may be incorrect when:
- The queried address is near a map boundary
- Multiple maps with different validity states exist within this range
- The address is near the end of a descriptor
This can lead to incorrect cache hits returning stale results. The cache should either validate range boundaries against actual map extents, or use a much smaller range (e.g., single byte) to avoid crossing boundaries.
| // Add new entry - we cache a small range around the address since | |
| // most accesses are consecutive | |
| RIOVALIDRangeCache *range = &cache->ranges[cache->count]; | |
| range->start_addr = addr; | |
| range->end_addr = addr + 1023; // Cache 1K range | |
| // Add new entry - we cache the exact address | |
| RIOVALIDRangeCache *range = &cache->ranges[cache->count]; | |
| range->start_addr = addr; | |
| range->end_addr = addr; // Cache single-byte range to avoid crossing map boundaries |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment suggests that map_id will be used for cache invalidation tracking, but the field is never set in this implementation. This leaves the field uninitialized (potentially containing garbage values from memory). Either implement the map_id tracking or remove the field and comment if it's not needed yet.
| // TODO: Set proper map_id when we can track it | |
| // map_id tracking is not implemented yet; use 0 as the default/sentinel. | |
| range->map_id = 0; |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache is not invalidated when permissions change (e.g., via r_io_map_remap or permission updates). The cache stores validity results that depend on the hasperm parameter, but different permission checks for the same address may have different validity results. Without tracking the hasperm parameter used for each cached entry, this can return incorrect results when the same address is checked with different permission requirements.
| RIOVALIDCache *cache = &io->valid_cache; | |
| // Fast path: same as last query (common for consecutive bytes) | |
| if (cache->last_query_addr != UT64_MAX && addr == cache->last_query_addr) { | |
| *result = cache->last_query_result; | |
| return true; | |
| } | |
| // Search through cached ranges | |
| for (int i = 0; i < cache->count; i++) { | |
| RIOVALIDRangeCache *range = &cache->ranges[i]; | |
| if (addr >= range->start_addr && addr <= range->end_addr) { | |
| *result = range->is_valid; | |
| cache->last_query_addr = addr; | |
| cache->last_query_result = range->is_valid; | |
| return true; | |
| } | |
| } | |
| return false; | |
| } | |
| // Add a validation result to cache | |
| static void r_io_valid_cache_add(RIO *io, ut64 addr, bool result) { | |
| RIOVALIDCache *cache = &io->valid_cache; | |
| // If cache is full, replace oldest entry (simple FIFO) | |
| if (cache->count >= cache->capacity) { | |
| // Shift all entries to make space at the end | |
| if (cache->count > 0) { | |
| memmove (&cache->ranges[0], &cache->ranges[1], | |
| sizeof (RIOVALIDRangeCache) * (cache->count - 1)); | |
| } | |
| cache->count--; | |
| } | |
| // Add new entry - we cache a small range around the address since | |
| // most accesses are consecutive | |
| RIOVALIDRangeCache *range = &cache->ranges[cache->count]; | |
| range->start_addr = addr; | |
| range->end_addr = addr + 1023; // Cache 1K range | |
| range->is_valid = result; | |
| // TODO: Set proper map_id when we can track it | |
| cache->count++; | |
| cache->last_query_addr = addr; | |
| cache->last_query_result = result; | |
| R_UNUSED (io); | |
| R_UNUSED (addr); | |
| R_UNUSED (result); | |
| // Disabled cache: always indicate no cached result so callers recompute. | |
| return false; | |
| } | |
| // Add a validation result to cache | |
| static void r_io_valid_cache_add(RIO *io, ut64 addr, bool result) { | |
| R_UNUSED (io); | |
| R_UNUSED (addr); | |
| R_UNUSED (result); | |
| // Disabled cache: do not store any validation results. |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache add function doesn't check if the cache is properly initialized (ranges is NULL). If initialization failed, this will attempt to write to NULL pointer. Add a check for cache->ranges != NULL before attempting to add entries.
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache implementation is not thread-safe. Multiple threads can simultaneously read and modify the cache state (count, last_query_addr, last_query_result, and ranges array), leading to race conditions. If r_io_is_valid_offset can be called from multiple threads, add synchronization (e.g., mutex) to protect cache access.
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Control flow issue: when the cache lookup path is taken (lines 104-114), the variable 'result' is set but may be overwritten by the memcmp on line 114 regardless of the read success. If both cache reads fail (line 110-112), result is set to false, but then line 114 unconditionally overwrites it with the memcmp result, which checks uninitialized or 0xff-filled memory. The 'result' assignment on line 111 should be followed by else clause wrapping line 114, or line 114 should only execute if reads succeed.
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache is not invalidated on write operations. If the cache mode uses R_PERM_X for validation (line 104), writes to memory could change the validation status (e.g., writing 0xffffffff to make an address invalid, or writing other values to make it valid). The cache should be invalidated after write operations to ensure correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory allocation failure is not properly handled. If calloc fails, the cache fields are not initialized, leaving them in an undefined state. The function should either set capacity to 0 on allocation failure, or set ranges to NULL explicitly to ensure consistent state.