[VARIANT] Create low-level Variant library with json_to_variant implementation #1030

Draft
wants to merge 16 commits into
base: main
Conversation

@harshmotw-db harshmotw-db commented Jun 25, 2025

What changes are proposed in this pull request?

This PR introduces a low-level Variant library with a json_to_variant function, which is similar to parse_json in Spark. The function is written so that the caller owns the memory the output is written to: the caller must implement VariantMemoryManager, which has the methods borrow_value_buffer, borrow_metadata_buffer, ensure_value_buffer_size, and ensure_metadata_buffer_size.
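
To make the caller-owned memory model concrete, here is a minimal sketch of what a Vec-backed memory manager might look like. Only the trait name and the four method names come from this description; the signatures, the `BufferError` type, the `VecMemoryManager` struct, and the `max_size` cap are assumptions for illustration and may not match the actual code in the PR.

```rust
/// Hypothetical error type, not taken from the PR.
#[derive(Debug, PartialEq)]
pub struct BufferError(pub String);

/// Assumed shape of the trait: only the method names come from the PR text.
pub trait VariantMemoryManager {
    /// Mutable view of the caller-owned value buffer.
    fn borrow_value_buffer(&mut self) -> &mut [u8];
    /// Mutable view of the caller-owned metadata buffer.
    fn borrow_metadata_buffer(&mut self) -> &mut [u8];
    /// Make the value buffer at least `size` bytes long, or fail.
    fn ensure_value_buffer_size(&mut self, size: usize) -> Result<(), BufferError>;
    /// Make the metadata buffer at least `size` bytes long, or fail.
    fn ensure_metadata_buffer_size(&mut self, size: usize) -> Result<(), BufferError>;
}

/// Illustrative implementation backed by two growable vectors.
pub struct VecMemoryManager {
    value: Vec<u8>,
    metadata: Vec<u8>,
    max_size: usize, // assumed per-buffer cap, purely for the example
}

impl VecMemoryManager {
    pub fn new(max_size: usize) -> Self {
        Self { value: Vec::new(), metadata: Vec::new(), max_size }
    }
}

/// Zero-extend `buf` to at least `size` bytes, rejecting sizes above `max`.
fn grow(buf: &mut Vec<u8>, size: usize, max: usize) -> Result<(), BufferError> {
    if size > max {
        return Err(BufferError(format!("requested {size} bytes, limit is {max}")));
    }
    if buf.len() < size {
        buf.resize(size, 0);
    }
    Ok(())
}

impl VariantMemoryManager for VecMemoryManager {
    fn borrow_value_buffer(&mut self) -> &mut [u8] {
        &mut self.value
    }
    fn borrow_metadata_buffer(&mut self) -> &mut [u8] {
        &mut self.metadata
    }
    fn ensure_value_buffer_size(&mut self, size: usize) -> Result<(), BufferError> {
        grow(&mut self.value, size, self.max_size)
    }
    fn ensure_metadata_buffer_size(&mut self, size: usize) -> Result<(), BufferError> {
        grow(&mut self.metadata, size, self.max_size)
    }
}
```

Under these assumptions, an encoder would call an `ensure_*` method before each write and then write into the borrowed slice, so allocation policy (heap, arena, fixed pool) stays entirely with the caller.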

How was this change tested?

Several unit tests compare the constructed variants against hand-written raw bytes. Implementing variant_to_json should increase coverage and make further tests easier to write. The PR already contains many tests, and more will be added.

TODO:

  1. Test UTF-8 strings with varying character widths.
  2. Test size limit exceeded errors.
  3. More testing on variant objects: nesting, different offset sizes, is_large, keys in different languages, etc.
  4. Formalize errors - currently the errors thrown by this library are a little rough.
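
As background for the first TODO item, the sketch below shows why character width matters: in Rust, `str::len` counts bytes, not characters, so a test string mixing 1-, 2-, 3-, and 4-byte code points exercises every UTF-8 width. The helper name `utf8_widths` is hypothetical and not part of the PR, and the example assumes the encoder records string lengths in bytes.

```rust
/// Hypothetical helper: the UTF-8 encoded width, in bytes, of each character.
pub fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}

/// Sample input covering all four UTF-8 widths:
/// 'a' (U+0061), 'é' (U+00E9), '€' (U+20AC), '😀' (U+1F600).
pub const MIXED_WIDTH_SAMPLE: &str = "a\u{e9}\u{20ac}\u{1f600}";
```

For the sample, the per-character widths are 1, 2, 3, and 4 bytes, so its byte length is 10 while its character count is 4. A test that sizes buffers by character count instead of byte count would fail on such input.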

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Jun 25, 2025
@scovich
Collaborator

scovich commented Jun 25, 2025

qq: How does this relate to the ongoing work to support variant in arrow-rs?
https://github.com/apache/arrow-rs/tree/main/parquet-variant