
Desktop: Importing from OneNote: Fix large .one files with many attachments fail to import#15117

Merged
laurent22 merged 24 commits into laurent22:dev from personalizedrefrigerator:pr/desktop/onenote-import/fix-large-file-handling
Apr 16, 2026

Conversation

@personalizedrefrigerator
Collaborator

@personalizedrefrigerator personalizedrefrigerator commented Apr 16, 2026

Problem

Previously, Joplin would attempt to load entire .one files into memory before import, which could cause imports of large .one files to fail. This was a problem, for example, for .one files that include many large attachments, since attachments are bundled within the file.

In particular, .one files larger than 2 GiB would fail to import, with a "RangeError: File size (3180223840) is greater than 2 GiB".

Notes:

  • This pull request is related to .one files. .onepkg and .zip files are archives that can include one or more .one files. As such, .onepkg and .zip files larger than 2 GiB could previously be imported, provided that none of the .one files they contain exceeds 2 GiB.
  • See also: a possibly-related forum post about an attempt to import a 1.95 GB notebook.

Solution

Refactor to avoid loading full .one files into memory during import.

Remaining issues:

  • Although attachments are no longer all loaded at once, each attachment is still read fully into memory before being written to disk. This could be a problem for very large (e.g. 2 GB) attachments.
  • The importer still runs in the main process and doesn't yield to the JavaScript event loop. If an import takes longer than 30 seconds, Joplin shows a "window unresponsive" error dialog. A possible fix for a future pull request would be to move the importer to a web worker.
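The core idea of the refactor, reading in bounded chunks so that only one chunk is resident in memory at a time, can be sketched in Rust. This is an illustration only; `copy_in_chunks` is an invented helper, not code from this PR.

```rust
use std::io::{self, Read, Write};

/// Copy `reader` to `writer` in fixed-size chunks so that at most
/// `chunk_size` bytes are held in memory at once, returning the total
/// number of bytes copied.
fn copy_in_chunks<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    chunk_size: usize,
) -> io::Result<u64> {
    let mut buffer = vec![0u8; chunk_size];
    let mut total: u64 = 0;
    loop {
        let read = reader.read(&mut buffer)?;
        if read == 0 {
            break; // EOF
        }
        writer.write_all(&buffer[..read])?;
        total += read as u64;
    }
    Ok(total)
}
```

The same pattern would also close the remaining per-attachment gap noted above, since no single attachment would ever be fully buffered.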

Testing

Windows:

  1. Create a roughly 3 GB .one file.
  2. Verify that the file fails to import in a previous version of Joplin.
  3. Import the file with this change applied and verify that the import completes successfully.
Screencast.from.2026-04-15.21-24-47.webm

@coderabbitai coderabbitai Bot added bug It's a bug desktop All desktop platforms import Related to importing files such as ENEX, JEX, etc. performance Performance issues labels Apr 16, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b4d2d9f0-bee7-4f9c-af37-825b98cb5338

📥 Commits

Reviewing files that changed from the base of the PR and between a1c23f4 and b5511be.

📒 Files selected for processing (1)
  • packages/onenote-converter/parser-utils/node_functions.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/onenote-converter/parser-utils/node_functions.js

📝 Walkthrough

Walkthrough

Replaces eager in-memory file reads with streamed, seekable I/O using u64/BigInt sizes across Node/WASM/native layers; Reader and FileBlob adopt file-backed, lazy-loading models; FileHandle exposes byte_length(); new error variant for integer conversion added.

Changes

Cohort / File(s) Summary
File API
packages/onenote-converter/parser-utils/src/file_api/api.rs, packages/onenote-converter/parser-utils/src/file_api/native_driver.rs, packages/onenote-converter/parser-utils/src/file_api/wasm_driver.rs
FileHandle now requires fn byte_length(&self) -> u64. Native driver returns BufReader<fs::File> implementing byte_length(). WASM externs changed from usize to u64 for read/size; internal offset/seek/read logic updated to u64.
Reader & Backing Model
packages/onenote-converter/parser-utils/src/reader.rs
Major rewrite: unified ReaderData for slice or file-backed data; reads return owned Vec<u8>; offsets/lengths use u64; added Seek impl, save_position/restore_position, peek_u8, as_data_ref/ReaderDataRef and position APIs.
JS Node Bindings
packages/onenote-converter/parser-utils/node_functions.js
fileReader(path) callbacks updated: read receives BigInt params and converts to Number for fs.readSync; size() now returns BigInt.
FileBlob / Lazy Loading
packages/onenote-converter/parser/src/shared/file_data_ref.rs
FileBlob becomes loader-backed (Rc<dyn FileDataLoader>) with explicit size; adds FileDataLoader trait and impls; as_ref() removed in favour of len()/load(); equality based on size + loader identity.
Parser API & Call Sites
packages/onenote-converter/parser/src/onenote/mod.rs, packages/onenote-converter/parser/src/onestore/mod.rs, packages/onenote-converter/parser/src/bin/inspect.rs
Parser APIs now accept paths/Readers (borrowed &str) instead of preloaded byte slices; binaries no longer pre-read files; parser rewinds readers via seek for retry parsing instead of cloning.
Chunk/Reader Resolution
packages/onenote-converter/parser/src/local_onestore/common/file_chunk_reference.rs, packages/onenote-converter/parser/src/local_onestore/file_structure/..., packages/onenote-converter/parser/src/local_onestore/file_node/...
FileChunkReference API changed from resolve_to_reader(...) -> Reader to seek_reader_to(&mut Reader) -> Result<()>; callers now save/restore positions and perform in-place seeking with u64 offsets.
Data/Blob Reads
packages/onenote-converter/parser/src/fsshttpb/data/..., packages/onenote-converter/parser/src/shared/property.rs
Removed redundant .to_vec() copies: reader now returns owned Vec<u8> used directly; debug formatting adjusted to use stored sizes or direct lengths.
Compact Integers & Utilities
packages/onenote-converter/parser/src/shared/compact_u64.rs
Parsing switched to use Reader APIs (get_u8, read) and u64-aware logic; tests updated to use Reader::from.
Embedded Resources & Images
packages/onenote-converter/parser/src/onenote/embedded_file.rs, packages/onenote-converter/parser/src/onenote/image.rs, packages/onenote-converter/renderer/src/page/..., packages/onenote-converter/renderer/src/lib.rs
Accessors changed from borrowed slices/Option<Rc<_>> to fallible Result<Vec<u8>>/Result<Option<Vec<u8>>>; renderers updated to propagate errors (?); removed PartialOrd derives where applicable.
Errors
packages/onenote-converter/parser-utils/src/errors.rs
From<std::io::Error> now maps ErrorKind::UnexpectedEof explicitly using the I/O error message; added impl From<TryFromIntError> for Error and new ErrorKind::TypeConversionFailed { err: TryFromIntError }.
Miscellaneous
various local_onestore files
Looping/fragment parsing and padding/advance logic adjusted to use seek_reader_to() and consistent u64/usize casts for remaining/advance calculations.
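The save_position/restore_position pattern mentioned for reader.rs can be sketched over std::io::Seek. `with_position` is a hypothetical helper for illustration, not Joplin's Reader API:

```rust
use std::io::{self, Seek, SeekFrom};

/// Run `body` with the seekable source temporarily moved to `offset`,
/// then restore the original position regardless of whether `body`
/// succeeded. Illustrates save_position/restore_position.
fn with_position<S, T>(
    seekable: &mut S,
    offset: u64,
    body: impl FnOnce(&mut S) -> io::Result<T>,
) -> io::Result<T>
where
    S: Seek,
{
    let saved = seekable.stream_position()?; // save_position
    seekable.seek(SeekFrom::Start(offset))?;
    let result = body(seekable);
    seekable.seek(SeekFrom::Start(saved))?; // restore_position
    result
}
```

With a shared file handle behind the Reader, this style keeps out-of-band reads (such as resolving a chunk reference) from disturbing the main parse position.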

Sequence Diagram(s)

sequenceDiagram
  participant JS as Node (node_functions.js)
  participant WASM as WASM bindings
  participant Rust as parser-utils (Reader / FileHandle)
  participant FS as Filesystem

  JS->>WASM: fileReader(path) -> create handle
  JS-->>WASM: size() returns BigInt
  JS->>WASM: read(bigPosition, bigLength) (BigInt)
  WASM->>Rust: read(offset:u64, size:u64)
  Rust->>Rust: Reader.seek/read -> Vec<u8>
  Rust->>FS: fs.read via native driver (BufReader)
  Rust-->>WASM: Uint8Array
  WASM-->>JS: bytes returned
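The walkthrough mentions a new From<TryFromIntError> conversion and an ErrorKind::TypeConversionFailed variant for the u64-based sizes above. A minimal sketch of that pattern, with a simplified stand-in error type rather than the converter's actual one:

```rust
use std::num::TryFromIntError;

/// Simplified stand-in for the converter's error type; the real crate's
/// ErrorKind::TypeConversionFailed carries similar information.
#[derive(Debug)]
enum Error {
    TypeConversionFailed(TryFromIntError),
}

impl From<TryFromIntError> for Error {
    fn from(err: TryFromIntError) -> Self {
        Error::TypeConversionFailed(err)
    }
}

/// Convert a 64-bit length (as passed across the WASM boundary) to usize,
/// surfacing failure instead of silently truncating with `as`.
fn length_to_usize(len: u64) -> Result<usize, Error> {
    Ok(usize::try_from(len)?)
}
```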

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 46.05%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Title check: ✅ Passed. The title clearly and concisely describes the main change: fixing large .one file imports by addressing a 2 GB size limitation.
Description check: ✅ Passed. The description is comprehensive and directly related to the changeset, explaining the problem (large .one files failing to import), the solution (refactoring to avoid in-memory loading), and known limitations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (5)
packages/onenote-converter/parser/src/shared/compact_u64.rs (1)

23-82: Please add regression cases for every width branch.

This parser now has separate decoding and EOF paths for 7/14/21/28/35/42/49/64-bit values, but the exercised tests in this patch still only cover the zero case. A small table-driven suite here would make the refactor much safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser/src/shared/compact_u64.rs` around lines 23
- 82, Add table-driven regression tests that cover every width branch in
CompactU64::parse: test the zero-case, the 7-bit branch (first_byte & 1), 14-bit
(first_byte & 2), 21-bit (first_byte & 4), 28-bit (8), 35-bit (16), 42-bit (32),
49-bit (64) and the 64-bit full path (first_byte & 128). For each case construct
a Reader input that exercises both normal decoding and EOF/error behavior (e.g.,
truncated byte slices) and assert the returned CompactU64 value or the error;
reference the parse function and Reader.get_u8/get_u64/read methods to locate
where to craft inputs and expected outputs.
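The table-driven suite the review asks for can be sketched as follows. The decoder here is a toy base-128 varint, not the MS-ONESTORE CompactU64 encoding; it only gives the test pattern something concrete to exercise.

```rust
/// Toy little-endian base-128 varint decoder (NOT CompactU64), returning
/// the decoded value and the number of bytes consumed.
fn decode_varint(bytes: &[u8]) -> Result<(u64, usize), &'static str> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        if i >= 10 {
            return Err("varint too long");
        }
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    Err("unexpected EOF")
}

/// One row per branch: input bytes and the expected outcome, including
/// truncated inputs that must fail rather than panic.
fn run_decode_table() {
    let cases: &[(&[u8], Result<(u64, usize), &'static str>)] = &[
        (&[0x00], Ok((0, 1))),                 // zero case
        (&[0x7f], Ok((127, 1))),               // largest 1-byte value
        (&[0x80, 0x01], Ok((128, 2))),         // 2-byte boundary
        (&[0xff, 0xff, 0x03], Ok((65535, 3))), // wider value
        (&[0x80], Err("unexpected EOF")),      // truncated input
    ];
    for (input, expected) in cases {
        assert_eq!(&decode_varint(input), expected, "input: {input:?}");
    }
}
```

Each width branch of the real parser would get one or two rows of this shape, with truncated inputs asserting the EOF error path.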
packages/onenote-converter/renderer/src/page/image.rs (1)

11-15: Stream image payloads straight to disk.

Line 11 still materialises the whole attachment before Line 15 writes it, so one very large image can still hit the same memory ceiling this PR is trying to remove. A chunked write path here would close most of the remaining gap for single huge attachments.

Based on learnings: "The fix uses streamFileDataUri to read large files in 3 MB chunks ... so the full file content is never in memory at once."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/renderer/src/page/image.rs` around lines 11 - 15,
The current code calls image.data()? which materializes the entire payload
before writing; change to a streaming/chunked write so large images are never
fully in memory: use determine_image_filename(image)? to get filename and
fs_driver().join(&self.output, &filename) for path, then open/create a writable
file handle via the fs_driver API (or add a write_file_stream method) and
iterate over the image's streaming API (e.g., streamFileDataUri or whatever
yields chunks from image) writing each chunk to the file handle until finished;
remove the image.data()? call and replace fs_driver().write_file(&path,
&data[..])? with the chunked write loop so only a small buffer is allocated at a
time.
packages/onenote-converter/parser-utils/src/file_api/native_driver.rs (1)

91-95: Consider propagating metadata errors instead of defaulting to 0.

If metadata() fails on an already-opened file, returning 0 will cause subsequent seeks to fail with misleading "out-of-bounds" errors. Whilst this scenario is rare, propagating the actual error would aid debugging.

Note: This would require changing the FileHandle trait's byte_length() signature to return Result<u64>, which may be a larger refactor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser-utils/src/file_api/native_driver.rs` around
lines 91 - 95, The current FileHandle implementation for BufReader<fs::File>
swallows metadata() errors by returning 0 in byte_length(), causing misleading
out-of-bounds seeks; update the FileHandle::byte_length signature to return
Result<u64, std::io::Error> (or the crate’s Error type) and change the impl for
BufReader<fs::File> to propagate get_ref().metadata()?.len() as Ok(len) (or
map_ok) so metadata errors are returned instead of defaulting to 0; update
callers of FileHandle::byte_length (and any trait impls) to handle the Result
accordingly.
packages/onenote-converter/parser/src/shared/file_data_ref.rs (2)

15-18: PartialEq now means “same loader instance”, not “same blob”.

Two FileBlobs with identical bytes but different loaders now compare unequal, including independently-created empty blobs. If callers still treat this as a value type, that behavioural shift is going to be surprising. I’d either drop PartialEq or make the identity check explicit via a separate helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser/src/shared/file_data_ref.rs` around lines
15 - 18, The current PartialEq impl on FileBlob compares loader identity
(Rc::ptr_eq) rather than blob bytes, which makes value-equal blobs compare
unequal; remove the impl PartialEq for FileBlob to avoid surprising behavior and
instead add an explicit identity helper like FileBlob::is_same_loader(&self,
other: &FileBlob) -> bool that returns Rc::ptr_eq(&self.loader, &other.loader);
if true value-equality semantics are required elsewhere, add a separate method
FileBlob::equals_bytes(&self, other: &FileBlob) -> bool that compares size and
actual bytes (reading via the loader) rather than reintroducing PartialEq.
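Why identity-based equality surprises callers can be shown with a minimal stand-in type. `Blob` and `is_same_loader` are invented for the example:

```rust
use std::rc::Rc;

/// Minimal stand-in for a loader-backed blob. The Vec<u8> loader stands
/// in for Rc<dyn FileDataLoader>.
struct Blob {
    size: usize,
    loader: Rc<Vec<u8>>,
}

impl Blob {
    /// Explicit identity check, as the review suggests, instead of a
    /// PartialEq impl that silently means "same loader instance".
    fn is_same_loader(&self, other: &Blob) -> bool {
        self.size == other.size && Rc::ptr_eq(&self.loader, &other.loader)
    }
}
```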

21-23: Avoid letting size drift from the loader output.

FileBlob now has two sources of truth for length: the stored size and whatever load() returns. If they ever disagree, len() lies and the mismatch is only discovered much later. Consider deriving the size from the loader or validating data.len() == self.size in load().

Also applies to: 53-65

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser/src/shared/file_data_ref.rs` around lines
21 - 23, FileBlob currently trusts a stored size field while load() from
FileDataLoader can return a different byte vector, leading to len() being wrong;
update FileBlob::load() to read the data via the FileDataLoader::load()
implementation, then validate and reconcile the size: after obtaining data from
load(), either set self.size = data.len() (deriving the canonical size from the
loader output) or return an error if data.len() != self.size (choose consistent
behavior for your domain), and ensure FileBlob::len() uses the reconciled
self.size; refer to the FileDataLoader trait, FileBlob::load(), and
FileBlob::len() when making the change.
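The size-validation option can be sketched like this. The trait and types are simplified stand-ins, not the crate's FileDataLoader/FileBlob:

```rust
use std::io;
use std::rc::Rc;

/// Stand-in loader trait; the real FileDataLoader reads from disk.
trait DataLoader {
    fn load(&self) -> io::Result<Vec<u8>>;
}

impl DataLoader for Vec<u8> {
    fn load(&self) -> io::Result<Vec<u8>> {
        Ok(self.clone())
    }
}

struct Blob {
    size: u64,
    loader: Rc<dyn DataLoader>,
}

impl Blob {
    /// Load the bytes, erroring if the loader's output disagrees with the
    /// stored size so a cached length can never silently lie.
    fn load(&self) -> io::Result<Vec<u8>> {
        let data = self.loader.load()?;
        if data.len() as u64 != self.size {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("loader returned {} bytes, expected {}", data.len(), self.size),
            ));
        }
        Ok(data)
    }
}
```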
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/onenote-converter/parser-utils/node_functions.js`:
- Around line 35-49: The read() and size() helpers currently coerce
bigPosition/bigLength to Number which can silently lose precision; preserve
BigInt end-to-end by passing bigPosition (and bigLength where supported) as
BigInt to fs.readSync (or validate and explicitly range-check before coercing to
Number), and have size() return the BigInt obtained from fs.fstatSync(fd, {
bigint: true }) rather than constructing BigInt(size) from a lost-precision
Number; update references in read, size, and any fstat usage to use the bigint
path and only convert to Number when you have validated the value fits safely.

In `@packages/onenote-converter/parser-utils/src/reader.rs`:
- Around line 152-159: The buffer-backed branch of as_data_ref (when matching
ReaderData::BufferRef in reader.rs) slices buffer[start..start + size] without
bounds checking and can panic; add a check comparing size to self.remaining()
(using self.data_offset) and if size > remaining() return Err(UnexpectedEof)
(consistent with read_exact behavior) instead of slicing, otherwise proceed to
create ReaderDataRef::Vec from the slice; ensure the same validation logic
applies conceptually to the BufferRef path to prevent malformed .one files from
causing a panic.
- Around line 230-236: The From<Box<dyn FileHandle>> impl for Reader<'a> assumes
the handle is at offset 0 but sets data_offset = 0 without synchronising the
actual FileHandle; update this constructor to either (A) seek/rewind the
underlying handle to position 0 and read its current position to set data_offset
consistently (use the FileHandle seek/rewind API before wrapping in
ReaderData::File), or (B) replace the From impl with a fallible constructor
(e.g., Reader::from_file(handle) -> Result<Reader, Error>) that queries and/or
seeks the handle to enforce the invariant and returns an error on failure;
ensure you update data_len via value.byte_length() and set data_offset from the
actual file cursor after the seek/query so Reader bookkeeping and the real file
cursor stay in sync.
- Around line 263-275: The FilePointer branch in ReaderDataRef uses
file.borrow_mut() and returns early on errors from file.seek or file.read_exact,
leaving the shared file cursor moved; fix by performing the seek/read inside a
sub-scope/closure or using a local Result (e.g. let read_result = (|| { ...
})();) so the `?` inside that closure returns an Err into read_result instead of
exiting the function, then always restore the cursor with
file.seek(SeekFrom::Start(original_offset))?; update the
ReaderDataRef::FilePointer code around original_offset, file.seek, and
read_result so cursor restoration runs regardless of read success or failure.

In `@packages/onenote-converter/parser/src/fsshttpb/data/binary_item.rs`:
- Around line 15-16: The code casts CompactU64::parse(...).value() (named size)
directly to usize before calling reader.read, which can truncate on platforms
where usize is smaller than u64; change the unchecked cast to a checked
conversion (e.g. use usize::try_from(size) or size.try_into()) and return an
error if the conversion fails, then pass the safely converted usize into
reader.read; update the CompactU64::parse usage and reader.read(size) call sites
(variables size and reader.read) to use the checked conversion and an
appropriate parse/error return on overflow.

---

Nitpick comments:
In `@packages/onenote-converter/parser-utils/src/file_api/native_driver.rs`:
- Around line 91-95: The current FileHandle implementation for
BufReader<fs::File> swallows metadata() errors by returning 0 in byte_length(),
causing misleading out-of-bounds seeks; update the FileHandle::byte_length
signature to return Result<u64, std::io::Error> (or the crate’s Error type) and
change the impl for BufReader<fs::File> to propagate get_ref().metadata()?.len()
as Ok(len) (or map_ok) so metadata errors are returned instead of defaulting to
0; update callers of FileHandle::byte_length (and any trait impls) to handle the
Result accordingly.

In `@packages/onenote-converter/parser/src/shared/compact_u64.rs`:
- Around line 23-82: Add table-driven regression tests that cover every width
branch in CompactU64::parse: test the zero-case, the 7-bit branch (first_byte &
1), 14-bit (first_byte & 2), 21-bit (first_byte & 4), 28-bit (8), 35-bit (16),
42-bit (32), 49-bit (64) and the 64-bit full path (first_byte & 128). For each
case construct a Reader input that exercises both normal decoding and EOF/error
behavior (e.g., truncated byte slices) and assert the returned CompactU64 value
or the error; reference the parse function and Reader.get_u8/get_u64/read
methods to locate where to craft inputs and expected outputs.

In `@packages/onenote-converter/parser/src/shared/file_data_ref.rs`:
- Around line 15-18: The current PartialEq impl on FileBlob compares loader
identity (Rc::ptr_eq) rather than blob bytes, which makes value-equal blobs
compare unequal; remove the impl PartialEq for FileBlob to avoid surprising
behavior and instead add an explicit identity helper like
FileBlob::is_same_loader(&self, other: &FileBlob) -> bool that returns
Rc::ptr_eq(&self.loader, &other.loader); if true value-equality semantics are
required elsewhere, add a separate method FileBlob::equals_bytes(&self, other:
&FileBlob) -> bool that compares size and actual bytes (reading via the loader)
rather than reintroducing PartialEq.
- Around line 21-23: FileBlob currently trusts a stored size field while load()
from FileDataLoader can return a different byte vector, leading to len() being
wrong; update FileBlob::load() to read the data via the FileDataLoader::load()
implementation, then validate and reconcile the size: after obtaining data from
load(), either set self.size = data.len() (deriving the canonical size from the
loader output) or return an error if data.len() != self.size (choose consistent
behavior for your domain), and ensure FileBlob::len() uses the reconciled
self.size; refer to the FileDataLoader trait, FileBlob::load(), and
FileBlob::len() when making the change.

In `@packages/onenote-converter/renderer/src/page/image.rs`:
- Around line 11-15: The current code calls image.data()? which materializes the
entire payload before writing; change to a streaming/chunked write so large
images are never fully in memory: use determine_image_filename(image)? to get
filename and fs_driver().join(&self.output, &filename) for path, then
open/create a writable file handle via the fs_driver API (or add a
write_file_stream method) and iterate over the image's streaming API (e.g.,
streamFileDataUri or whatever yields chunks from image) writing each chunk to
the file handle until finished; remove the image.data()? call and replace
fs_driver().write_file(&path, &data[..])? with the chunked write loop so only a
small buffer is allocated at a time.
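The cursor-restoration fix suggested for the reader's FilePointer branch can be sketched with an immediately-invoked closure, so that `?` failures land in a local Result and the shared cursor is always restored. `read_at_restoring` is an invented helper, not the PR's code:

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Read `len` bytes at `offset` from a shared seekable source, restoring
/// the original cursor position whether or not the read succeeds.
fn read_at_restoring<S: Read + Seek>(
    source: &mut S,
    offset: u64,
    len: usize,
) -> io::Result<Vec<u8>> {
    let original = source.stream_position()?;
    // Run the fallible work in a closure so `?` lands in `result`
    // instead of returning before the cursor is restored.
    let result = (|| -> io::Result<Vec<u8>> {
        source.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        source.read_exact(&mut buf)?;
        Ok(buf)
    })();
    // Restore the cursor regardless of the read's outcome.
    source.seek(SeekFrom::Start(original))?;
    result
}
```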

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1fb11b20-240c-46de-bbc5-f07d09094c32

📥 Commits

Reviewing files that changed from the base of the PR and between 3279ea2 and 885f524.

📒 Files selected for processing (26)
  • packages/onenote-converter/parser-utils/node_functions.js
  • packages/onenote-converter/parser-utils/src/errors.rs
  • packages/onenote-converter/parser-utils/src/file_api/api.rs
  • packages/onenote-converter/parser-utils/src/file_api/native_driver.rs
  • packages/onenote-converter/parser-utils/src/file_api/wasm_driver.rs
  • packages/onenote-converter/parser-utils/src/reader.rs
  • packages/onenote-converter/parser/src/bin/inspect.rs
  • packages/onenote-converter/parser/src/fsshttpb/data/binary_item.rs
  • packages/onenote-converter/parser/src/fsshttpb/data/stream_object.rs
  • packages/onenote-converter/parser/src/fsshttpb/data_element/data_element_fragment.rs
  • packages/onenote-converter/parser/src/fsshttpb/data_element/object_data_blob.rs
  • packages/onenote-converter/parser/src/local_onestore/common/file_chunk_reference.rs
  • packages/onenote-converter/parser/src/local_onestore/file_node/file_node.rs
  • packages/onenote-converter/parser/src/local_onestore/file_structure/file_node_list.rs
  • packages/onenote-converter/parser/src/local_onestore/file_structure/file_node_list_fragment.rs
  • packages/onenote-converter/parser/src/local_onestore/one_store_file.rs
  • packages/onenote-converter/parser/src/onenote/embedded_file.rs
  • packages/onenote-converter/parser/src/onenote/image.rs
  • packages/onenote-converter/parser/src/onenote/mod.rs
  • packages/onenote-converter/parser/src/onestore/mod.rs
  • packages/onenote-converter/parser/src/shared/compact_u64.rs
  • packages/onenote-converter/parser/src/shared/file_data_ref.rs
  • packages/onenote-converter/parser/src/shared/property.rs
  • packages/onenote-converter/renderer/src/lib.rs
  • packages/onenote-converter/renderer/src/page/embedded_file.rs
  • packages/onenote-converter/renderer/src/page/image.rs

@vinschger

Has this now been fixed in the current beta? Where can I download it for macOS (Apple Silicon)? Then I will test it again with my 1.95 GB .onepkg import into Joplin.

@personalizedrefrigerator
Collaborator Author

personalizedrefrigerator commented Apr 16, 2026

Has this now been fixed in the current beta?

This has not yet been fixed in the current beta. (This change will need to be merged before it is available in a prerelease.)

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
packages/onenote-converter/parser-utils/src/reader.rs (2)

152-164: ⚠️ Potential issue | 🟠 Major

Reject oversized as_data_ref requests before slicing.

read_exact() and advance() already normalise short input to UnexpectedEof, but Line 158 still panics on buffer-backed readers when size > remaining(), and Line 160 can still hand back a deferred invalid file range. Please validate size up front here as well.

Proposed fix
 pub fn as_data_ref(&mut self, size: usize) -> Result<ReaderDataRef> {
+    if self.remaining() < size as u64 {
+        return Err(
+            ErrorKind::UnexpectedEof("Unexpected EOF (Reader.as_data_ref)".into()).into(),
+        );
+    }
+
     match &mut self.data {
         ReaderData::BufferRef { buffer } => {
             let start = self.data_offset as usize;
             // Cloning needs to be done early with BufferRef, since we don't own the original
             // data. Large data should generally use `ReaderData::File`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser-utils/src/reader.rs` around lines 152 -
164, In Reader::as_data_ref, reject requests where size > self.remaining() up
front to avoid panics and invalid file ranges: before matching on self.data (and
before slicing buffer[start..start+size] or returning a FilePointer), check if
size <= self.remaining() and return the same UnexpectedEof error variant used by
read_exact()/advance() when the request is too large; apply this validation for
both ReaderData::BufferRef (so slicing can't panic) and ReaderData::File (so the
returned FilePointer does not expose an invalid offset/size).

230-236: ⚠️ Potential issue | 🟠 Major

Synchronise the file handle before wrapping it.

This constructor sets data_offset to 0, but never rewinds or checks the underlying FileHandle. If the caller passes a pre-seeked handle, the invariant documented on Lines 52-53 is broken immediately and later reads/peeks start from the wrong place.

Proposed fix
-impl<'a> From<Box<dyn FileHandle>> for Reader<'a> {
-    fn from(value: Box<dyn FileHandle>) -> Self {
-        Self {
+impl<'a> Reader<'a> {
+    pub fn from_file_handle(mut value: Box<dyn FileHandle>) -> Result<Self> {
+        value.seek(SeekFrom::Start(0))?;
+        Ok(Self {
             data_len: value.byte_length(),
             data_offset: 0,
             data: ReaderData::File(Rc::new(RefCell::new(value))),
-        }
+        })
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser-utils/src/reader.rs` around lines 230 -
236, The From<Box<dyn FileHandle>> for Reader<'a> impl must ensure the incoming
FileHandle is positioned at the start before assuming data_offset = 0; rewind or
seek the boxed handle to position 0 on the FileHandle trait (e.g. call its
rewind/seek-to-start method) before wrapping it in ReaderData::File and setting
data_offset to 0, and handle/report any error from that operation so the Reader
invariant (start at file beginning) holds for later reads/peeks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/onenote-converter/parser/src/shared/compact_u64.rs`:
- Around line 23-24: Change the parse function to accept a mutable reference to
the Reader (e.g. fn parse(reader: &mut Reader) -> Result<CompactU64>) because
Reader::get_u8() and Reader::read() require &mut self; update the function
signature for CompactU64::parse and any call sites (tests already pass &mut
Reader) so internal calls to reader.get_u8() and reader.read() compile.

---

Duplicate comments:
In `@packages/onenote-converter/parser-utils/src/reader.rs`:
- Around line 152-164: In Reader::as_data_ref, reject requests where size >
self.remaining() up front to avoid panics and invalid file ranges: before
matching on self.data (and before slicing buffer[start..start+size] or returning
a FilePointer), check if size <= self.remaining() and return the same
UnexpectedEof error variant used by read_exact()/advance() when the request is
too large; apply this validation for both ReaderData::BufferRef (so slicing
can't panic) and ReaderData::File (so the returned FilePointer does not expose
an invalid offset/size).
- Around line 230-236: The From<Box<dyn FileHandle>> for Reader<'a> impl must
ensure the incoming FileHandle is positioned at the start before assuming
data_offset = 0; rewind or seek the boxed handle to position 0 on the FileHandle
trait (e.g. call its rewind/seek-to-start method) before wrapping it in
ReaderData::File and setting data_offset to 0, and handle/report any error from
that operation so the Reader invariant (start at file beginning) holds for later
reads/peeks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: badd912d-42c7-481a-a9ad-ed5bb1249633

📥 Commits

Reviewing files that changed from the base of the PR and between 885f524 and d3b2fd5.

📒 Files selected for processing (4)
  • packages/onenote-converter/parser-utils/src/errors.rs
  • packages/onenote-converter/parser-utils/src/reader.rs
  • packages/onenote-converter/parser/src/fsshttpb/data/binary_item.rs
  • packages/onenote-converter/parser/src/shared/compact_u64.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/onenote-converter/parser/src/fsshttpb/data/binary_item.rs
  • packages/onenote-converter/parser-utils/src/errors.rs

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/onenote-converter/parser/src/onenote/mod.rs (1)

78-83: Breaking API change: parse_onestore_raw now accepts a path instead of raw bytes.

The signature changed from parse_onestore_raw(&mut self, data: &[u8]) to parse_onestore_raw(&mut self, path: &str). While the doc comment indicates this is "for debugging purposes", any existing callers passing raw byte data will break.

If raw byte parsing is still needed for debugging or testing, consider keeping a separate method or adding an overload that accepts &[u8].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/onenote-converter/parser/src/onenote/mod.rs` around lines 78 - 83,
The public API was changed by replacing parse_onestore_raw(&mut self, data:
&[u8]) with a path-based variant; restore backwards compatibility by keeping the
existing path-based version (which uses fs_driver().open_file and
Reader::try_from) and add back an overload parse_onestore_raw(&mut self, data:
&[u8]) that constructs a Reader from the byte slice and delegates to
parse_onestore (using Reader::try_from or a byte-cursor equivalent), so callers
that pass raw bytes continue to work while the path-based helper remains
available for debugging.
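Keeping both entry points, as suggested, can be sketched as two thin wrappers over one reader-based core. All names here are placeholders, and the byte-counting "parse" only stands in for the real parser:

```rust
use std::io::{Cursor, Read};

/// Stand-in for the reader-based parse core: just count the bytes.
fn parse_from_reader<R: Read>(reader: &mut R) -> std::io::Result<usize> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf.len())
}

/// Path-based helper, analogous to the new debugging entry point.
fn parse_from_path(path: &str) -> std::io::Result<usize> {
    let mut file = std::fs::File::open(path)?;
    parse_from_reader(&mut file)
}

/// Byte-slice overload that keeps raw-byte callers working by
/// delegating through an in-memory cursor.
fn parse_from_bytes(data: &[u8]) -> std::io::Result<usize> {
    let mut cursor = Cursor::new(data);
    parse_from_reader(&mut cursor)
}
```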
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/onenote-converter/parser/src/onenote/mod.rs`:
- Around line 78-83: The public API was changed by replacing
parse_onestore_raw(&mut self, data: &[u8]) with a path-based variant; restore
backwards compatibility by keeping the existing path-based version (which uses
fs_driver().open_file and Reader::try_from) and add back an overload
parse_onestore_raw(&mut self, data: &[u8]) that constructs a Reader from the
byte slice and delegates to parse_onestore (using Reader::try_from or a
byte-cursor equivalent), so callers that pass raw bytes continue to work while
the path-based helper remains available for debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 89d44d2e-f0a2-4cec-998e-7baea4a17cac

📥 Commits

Reviewing files that changed from the base of the PR and between d3b2fd5 and a1c23f4.

📒 Files selected for processing (2)
  • packages/onenote-converter/parser-utils/src/reader.rs
  • packages/onenote-converter/parser/src/onenote/mod.rs

@laurent22 laurent22 merged commit 4b9918a into laurent22:dev Apr 16, 2026
12 checks passed
