Skip to content

[VARIANT] Add support for the json_to_variant API #7783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 43 commits into from
Jul 3, 2025
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
87b438d
add json_to_variant
harshmotw-db Jun 25, 2025
a946ac6
comment fix
harshmotw-db Jun 25, 2025
bca3b81
Added another sample buffer managger
harshmotw-db Jun 25, 2025
339e880
minor refactoring
harshmotw-db Jun 25, 2025
fe798c3
Merge branch 'main' of https://github.com/harshmotw-db/arrow-rs into …
harshmotw-db Jun 25, 2025
882d3a7
Merge branch 'main' of https://github.com/harshmotw-db/arrow-rs into …
harshmotw-db Jun 25, 2025
3c18fdf
incorporated new changes
harshmotw-db Jun 25, 2025
67a83fe
minor changes
harshmotw-db Jun 26, 2025
c9aa519
Merge branch 'main' of https://github.com/harshmotw-db/arrow-rs into …
harshmotw-db Jun 26, 2025
dede88d
test fix based on recent commit
harshmotw-db Jun 26, 2025
fa3befc
constant fix
harshmotw-db Jun 26, 2025
38bac59
fix
harshmotw-db Jun 26, 2025
cd530ee
addressed comments
harshmotw-db Jun 26, 2025
57b3eb0
fix
harshmotw-db Jun 26, 2025
71b7d6f
fixed VariantBufferManager
harshmotw-db Jun 26, 2025
031c916
deduped a bit of code
harshmotw-db Jun 26, 2025
d4fc876
deduplicated code
harshmotw-db Jun 26, 2025
c41af4e
moved serde code out of variant.rs
harshmotw-db Jun 26, 2025
ecaf557
incorporated Ryan's latest comments
harshmotw-db Jun 26, 2025
4abc598
add more object tests
harshmotw-db Jun 26, 2025
0842ef8
fix
harshmotw-db Jun 26, 2025
94531af
doc fix
harshmotw-db Jun 26, 2025
28d0012
Merge branch 'main' into harsh-motwani_data/from_json
harshmotw-db Jun 27, 2025
e2788f5
Removed dependency on VariantBufferManager and leave that to later
harshmotw-db Jun 27, 2025
3178449
Merge branch 'harsh-motwani_data/from_json' of https://github.com/har…
harshmotw-db Jun 27, 2025
0455685
fix
harshmotw-db Jun 27, 2025
cc0b66e
fix
harshmotw-db Jun 30, 2025
d2a7516
resolved test comment
harshmotw-db Jun 30, 2025
a29b5c3
fixed more comments
harshmotw-db Jun 30, 2025
7f23cf5
fix decimal to string
harshmotw-db Jul 1, 2025
50f4b25
fmt and clippy
harshmotw-db Jul 1, 2025
560e430
fix
harshmotw-db Jul 1, 2025
3249d93
Merge remote-tracking branch 'apache/main' into harsh-motwani_data/fr…
alamb Jul 1, 2025
07d5688
clippy
alamb Jul 1, 2025
af937ac
Split tests into separate functions
alamb Jul 1, 2025
388f188
partially removed decimal dependency
harshmotw-db Jul 2, 2025
7407776
merge
harshmotw-db Jul 2, 2025
3b42d91
removed decimal type from variant json parsing
harshmotw-db Jul 2, 2025
43d6ea5
comment fix
harshmotw-db Jul 2, 2025
e9deda9
refined lifetimes
harshmotw-db Jul 3, 2025
eb11890
Fix clippy, remove unused dependency
alamb Jul 3, 2025
3531540
Merge remote-tracking branch 'apache/main' into harsh-motwani_data/fr…
alamb Jul 3, 2025
ea5b573
Update parquet-variant/src/from_json.rs
alamb Jul 3, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion parquet-variant/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,10 @@ rust-version = "1.83"
[dependencies]
arrow-schema = { workspace = true }
chrono = { workspace = true }
serde_json = "1.0"
# "arbitrary_precision" allows us to manually parse numbers. "preserve_order" does not automatically
# sort object keys
serde_json = { version = "1.0", features = ["arbitrary_precision", "preserve_order"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about the (undocumented) arbitrary_precision feature flag and (undocumented) Number::as_str method it unlocks 🤯

That could solve a lot of problems for variant decimals on the to_json path as well:
https://github.com/apache/arrow-rs/blob/main/parquet-variant/src/to_json.rs#L147-L162

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question tho -- do we need to support both modes of serde_json, given that the user (not arrow-rs) probably decides whether to use that feature flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible some projects might prioritize performance and might not want the Decimal type at all. I would prefer doing that as a follow up though. If you agree, I'll go ahead and create an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, for sure it's a follow-up. But we probably do need a story for how to handle that feature flag in a dependency (maybe part of the bigger story of how to handle a serde_json dependency in the first place).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust_decimal = "1.37.2"
base64 = "0.21"

[lib]
58 changes: 58 additions & 0 deletions parquet-variant/examples/variant_from_json_examples.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! Example showing how to convert Variant values to JSON

use parquet_variant::{
json_to_variant, variant_to_json, variant_to_json_string, variant_to_json_value,
SampleVecBasedVariantBufferManager,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
// The caller must provide an object implementing the `VariantBufferManager` trait to the library.
// This allows the library to write the constructed variant to buffers provided by the caller.
// This way, the caller has direct control over the output buffers.
let mut variant_buffer_manager = SampleVecBasedVariantBufferManager {
value_buffer: vec![0u8; 1],
metadata_buffer: vec![0u8; 1],
};

let person_string = "{\"name\":\"Alice\", \"age\":30, ".to_string()
+ "\"email\":\"[email protected]\", \"is_active\": true, \"score\": 95.7,"
+ "\"additional_info\": null}";
let (metadata_size, value_size) = json_to_variant(&person_string, &mut variant_buffer_manager)?;

let variant = parquet_variant::Variant::try_new(
&variant_buffer_manager.metadata_buffer[..metadata_size],
&variant_buffer_manager.value_buffer[..value_size],
)?;

let json_string = variant_to_json_string(&variant)?;
let json_value = variant_to_json_value(&variant)?;
let pretty_json = serde_json::to_string_pretty(&json_value)?;
println!("{}", pretty_json);

let mut buffer = Vec::new();
variant_to_json(&mut buffer, &variant)?;
let buffer_result = String::from_utf8(buffer)?;

// Verify all methods produce the same result
assert_eq!(json_string, buffer_result);
assert_eq!(json_string, serde_json::to_string(&json_value)?);

Ok(())
}
36 changes: 36 additions & 0 deletions parquet-variant/src/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -721,6 +721,42 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
}
}

pub(crate) trait AppendVariantHelper {
fn append_value_helper<'m, 'd, T: Into<Variant<'m, 'd>>>(&mut self, value: T);

fn new_list_helper(&mut self) -> ListBuilder;

fn new_object_helper(&mut self) -> ObjectBuilder;
}

impl AppendVariantHelper for ListBuilder<'_> {
fn append_value_helper<'m, 'd, T: Into<Variant<'m, 'd>>>(&mut self, value: T) {
self.append_value(value);
}

fn new_list_helper(&mut self) -> ListBuilder {
self.new_list()
}

fn new_object_helper(&mut self) -> ObjectBuilder {
self.new_object()
}
}

impl AppendVariantHelper for VariantBuilder {
fn append_value_helper<'m, 'd, T: Into<Variant<'m, 'd>>>(&mut self, value: T) {
self.append_value(value);
}

fn new_list_helper(&mut self) -> ListBuilder {
self.new_list()
}

fn new_object_helper(&mut self) -> ObjectBuilder {
self.new_object()
}
}

#[cfg(test)]
mod tests {
use super::*;
Expand Down
143 changes: 143 additions & 0 deletions parquet-variant/src/from_json.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
pub use crate::variant::{VariantDecimal4, VariantDecimal8};
use crate::variant_buffer_manager::VariantBufferManager;
use crate::{AppendVariantHelper, ListBuilder, ObjectBuilder, Variant, VariantBuilder};
use arrow_schema::ArrowError;
use rust_decimal::prelude::*;
use serde_json::{Map, Number, Value};

/// Eventually, internal writes should also be performed using VariantBufferManager instead of
/// ValueBuffer and MetadataBuffer so the caller has control of the memory.
/// Returns a pair <value_size, metadata_size>
pub fn json_to_variant(
json: &str,
variant_buffer_manager: &mut impl VariantBufferManager,
) -> Result<(usize, usize), ArrowError> {
let mut builder = VariantBuilder::new();
let json: Value = serde_json::from_str(json)
.map_err(|e| ArrowError::InvalidArgumentError(format!("JSON format error: {}", e)))?;

build_json(&json, &mut builder)?;
let (metadata, value) = builder.finish();
let value_size = value.len();
let metadata_size = metadata.len();

// Write to caller's buffers - Remove this when the library internally writes to the caller's
// buffers anyway
let caller_metadata_buffer =
variant_buffer_manager.ensure_size_and_borrow_metadata_buffer(metadata_size)?;
caller_metadata_buffer[..metadata_size].copy_from_slice(metadata.as_slice());
let caller_value_buffer =
variant_buffer_manager.ensure_size_and_borrow_value_buffer(value_size)?;
caller_value_buffer[..value_size].copy_from_slice(value.as_slice());
Ok((metadata_size, value_size))
}

fn build_json(json: &Value, builder: &mut VariantBuilder) -> Result<(), ArrowError> {
append_json(json, builder)?;
Ok(())
}

fn variant_from_number<'a, 'b>(n: &Number) -> Result<Variant<'a, 'b>, ArrowError> {
if let Some(i) = n.as_i64() {
// Find minimum Integer width to fit
if i as i8 as i64 == i {
Ok((i as i8).into())
} else if i as i16 as i64 == i {
Ok((i as i16).into())
} else if i as i32 as i64 == i {
Ok((i as i32).into())
} else {
Ok(i.into())
}
} else {
// Try decimal
// TODO: Replace with custom decimal parsing as the rust_decimal library only supports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should look at what arrow-json does for this

// a max unscaled value of 2^96.
match Decimal::from_str_exact(n.as_str()) {
Ok(dec) => {
let unscaled: i128 = dec.mantissa();
let scale = dec.scale() as u8;
if unscaled.abs() <= VariantDecimal4::MAX_UNSCALED_VALUE as i128
&& scale <= VariantDecimal4::MAX_PRECISION as u8
{
(unscaled as i32, scale).try_into()
} else if unscaled.abs() <= VariantDecimal8::MAX_UNSCALED_VALUE as i128
&& scale <= VariantDecimal8::MAX_PRECISION as u8
{
(unscaled as i64, scale).try_into()
} else {
(unscaled, scale).try_into()
}
}
Err(_) => {
// Try double
match n.as_f64() {
Some(f) => return Ok(f.into()),
None => Err(ArrowError::InvalidArgumentError(format!(
"Failed to parse {} as number",
n.as_str()
))),
}?
}
}
}
}

fn append_json(json: &Value, builder: &mut impl AppendVariantHelper) -> Result<(), ArrowError> {
match json {
Value::Null => builder.append_value_helper(Variant::Null),
Value::Bool(b) => builder.append_value_helper(*b),
Value::Number(n) => {
builder.append_value_helper(variant_from_number(n)?);
}
Value::String(s) => builder.append_value_helper(s.as_str()),
Value::Array(arr) => {
let mut list_builder = builder.new_list_helper();
build_list(arr, &mut list_builder)?;
list_builder.finish();
}
Value::Object(obj) => {
let mut obj_builder = builder.new_object_helper();
build_object(obj, &mut obj_builder)?;
obj_builder.finish();
}
};
Ok(())
}

fn build_list(arr: &[Value], builder: &mut ListBuilder) -> Result<(), ArrowError> {
for val in arr {
append_json(val, builder)?;
}
Ok(())
}

fn build_object<'a, 'b>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions only have one caller each now -- should we just fold their 3-4 lines of code into their call sites?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it to try_fold. Lmk if you meant something else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I just meant to get rid of the functions and move their code directly where the function used to be called from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

obj: &'b Map<String, Value>,
builder: &mut ObjectBuilder<'a, 'b>,
) -> Result<(), ArrowError> {
for (key, value) in obj.iter() {
let mut field_builder = ObjectFieldBuilder { key, builder };
append_json(value, &mut field_builder)?;
}
Ok(())
}

struct ObjectFieldBuilder<'a, 'b, 'c> {
key: &'a str,
builder: &'b mut ObjectBuilder<'c, 'a>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to interpret... can we use 'm and 'v?

Suggested change
struct ObjectFieldBuilder<'a, 'b, 'c> {
key: &'a str,
builder: &'b mut ObjectBuilder<'c, 'a>,
struct ObjectFieldBuilder<'m, 'v, 'r> {
key: &'r str,
builder: &'r mut ObjectBuilder<'m, 'v>,

(here, 'r is the lifetime of the references used to construct the field builder)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to 's, 'o and 'v where 's is the lifetime of the 's is the lifetime of the string, 'o is the lifetime of [ObjectBuilder] and v is the lifetime of the variant buffers.

}

impl AppendVariantHelper for ObjectFieldBuilder<'_, '_, '_> {
fn append_value_helper<'m, 'd, T: Into<Variant<'m, 'd>>>(&mut self, value: T) {
self.builder.insert(self.key, value);
}

fn new_list_helper(&mut self) -> ListBuilder {
self.builder.new_list(self.key)
}

fn new_object_helper(&mut self) -> ObjectBuilder {
self.builder.new_object(self.key)
}
}
4 changes: 4 additions & 0 deletions parquet-variant/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,14 @@ mod decoder;
mod variant;
// TODO: dead code removal
mod builder;
mod from_json;
mod to_json;
#[allow(dead_code)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allow was bothering me , so here is a PR to clean it up:

mod utils;
mod variant_buffer_manager;

pub use builder::*;
pub use from_json::json_to_variant;
pub use to_json::{variant_to_json, variant_to_json_string, variant_to_json_value};
pub use variant::*;
pub use variant_buffer_manager::{SampleVecBasedVariantBufferManager, VariantBufferManager};
8 changes: 4 additions & 4 deletions parquet-variant/src/variant/decimal.rs
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,8 @@ pub struct VariantDecimal4 {
}

impl VariantDecimal4 {
const MAX_PRECISION: u32 = 9;
const MAX_UNSCALED_VALUE: u32 = 10_u32.pow(Self::MAX_PRECISION) - 1;
pub(crate) const MAX_PRECISION: u32 = 9;
pub(crate) const MAX_UNSCALED_VALUE: u32 = 10_u32.pow(Self::MAX_PRECISION) - 1;

pub fn try_new(integer: i32, scale: u8) -> Result<Self, ArrowError> {
// Validate that scale doesn't exceed precision
Expand Down Expand Up @@ -73,8 +73,8 @@ pub struct VariantDecimal8 {
}

impl VariantDecimal8 {
const MAX_PRECISION: u32 = 18;
const MAX_UNSCALED_VALUE: u64 = 10_u64.pow(Self::MAX_PRECISION) - 1;
pub(crate) const MAX_PRECISION: u32 = 18;
pub(crate) const MAX_UNSCALED_VALUE: u64 = 10_u64.pow(Self::MAX_PRECISION) - 1;

pub fn try_new(integer: i64, scale: u8) -> Result<Self, ArrowError> {
// Validate that scale doesn't exceed precision
Expand Down
57 changes: 57 additions & 0 deletions parquet-variant/src/variant_buffer_manager.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
use arrow_schema::ArrowError;

pub trait VariantBufferManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure I understand this trait.

It almost seems like a good building block for eventual columnar writes, if an implementation took a mutable reference to a vec, where each VariantBufferManager you build with it is only capable of appending new bytes to the end (leaving existing bytes unchanged). That way, one could build up an offset array along the way as part of building up arrow binary arrays for the metadata and value columns?

But I don't understand the whole "may be called several times" part, which seems to return the whole slice on every call? I guess that's because the variant builder doesn't do its work all at once, but still expects to receive the overall slice?

Finally, why split the ensure and borrow methods, when both take &mut self and it's almost certainly dangerous to "borrow" without an "ensure" first (can't guess the size in advance, it's variant data)? Why not just a pair of ensure_and_borrow_XXX_buffer methods that return a slice of exactly the requested size (no larger, no smaller)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to return the whole slice, and have the Variant builder choose to write data wherever it chooses to. The trait was originally created for this PR where bytes are often being shifted to make way for headers etc. Let me know if you disagree or would recommend a different simple alternative.

As for why the methods are separate, I think you are generally correct and they could be combined. In the original PR, I was calling ensure_value_buffer_size only occasionally but that was because I was performing size checks in the variant library before calling the method and that check could be moved into the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have consolidated the two methods.

Copy link
Contributor

@alamb alamb Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that currently the VariantBuilders need to write into a memory buffer anyways, I wonder what is this trait trying to abstract?

Like maybe the json_to_variant should take a VariantBuilder to write to and not try to manage buffers itself 🤔

Copy link
Contributor Author

@harshmotw-db harshmotw-db Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my opinion is that the variant builders should write to buffers owned by the caller. So no more ValueBuffer and MetadataBuffer - just VariantBufferManager.
delta-io/delta-kernel-rs#1034
This PR just lays the groundwork for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having VariantBuilder write to buffers managed by somewhere else is a good idea and is useful more generally than json_to_variant

Perhaps we can make the json_to_variant API be in terms of the VariantBuilder and then modify the VariantBuilder in a separate PR to handle buffers owned by the caller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed a ticket to track the idea of writing to buffers owned by the caller here

Copy link
Contributor Author

@harshmotw-db harshmotw-db Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better if the VariantBuilder dependency was abstracted away? So, right now the functionality looks like: (1) here's the JSON string, (2) Here are my buffers (variant_buffer_manager), write the variant to the buffers.

The alternative is: (1) Here's the JSON string, (2) Here are my buffers, (3) this is the VariantBuilder built using my buffers (which I don't know why you need) => write the variants to these buffers.

I think the current approach VariantBufferManager should be reasonably extensible to every use case and also supports resizing on the fly. The example solution described in this ticket would likely be retry based which would have a higher performance overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yeah, in the short run, I think it is good not to have dependency on VariantBufferManager since there are unanswered questions. I will implement the VariantBuilder-based API today

Copy link
Contributor Author

@harshmotw-db harshmotw-db Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb I have removed the dependency of the current PR on custom buffer management and implemented what you suggested.

/// Returns the slice where the variant metadata needs to be written to. This method may be
/// called several times during the construction of a new `metadata` field in a variant. The
/// implementation must make sure that on every call, all the data written to the metadata
/// buffer so far are preserved.
/// The implementation must also make sure that the length of the slice being returned is at
/// least `size` bytes. The implementation may throw an error if it is unable to fulfill its
/// requirements.
fn ensure_size_and_borrow_metadata_buffer(
&mut self,
size: usize,
) -> Result<&mut [u8], ArrowError>;

/// Returns the slice where value needs to be written to. This method may be called several
/// times during the construction of a new `value` field in a variant. The implementation must
/// make sure that on every call, all the data written to the value buffer so far are preserved.
/// The implementation must also make sure that the length of the slice being returned is at
/// least `size` bytes. The implementation may throw an error if it is unable to fulfill its
/// requirements.
fn ensure_size_and_borrow_value_buffer(&mut self, size: usize)
-> Result<&mut [u8], ArrowError>;
}

pub struct SampleVecBasedVariantBufferManager {
pub value_buffer: Vec<u8>,
pub metadata_buffer: Vec<u8>,
}

impl VariantBufferManager for SampleVecBasedVariantBufferManager {
fn ensure_size_and_borrow_metadata_buffer(
&mut self,
size: usize,
) -> Result<&mut [u8], ArrowError> {
let cur_len = self.metadata_buffer.len();
if size > cur_len {
// Reallocate larger buffer
let new_len = size.next_power_of_two();
self.metadata_buffer.resize(new_len, 0);
}
Ok(&mut self.metadata_buffer)
}

fn ensure_size_and_borrow_value_buffer(
&mut self,
size: usize,
) -> Result<&mut [u8], ArrowError> {
let cur_len = self.value_buffer.len();
if size > cur_len {
// Reallocate larger buffer
let new_len = size.next_power_of_two();
self.value_buffer.resize(new_len, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need to do this -- the underlying vec is guaranteed to have reasonable amortized allocation costs:

Vec does not guarantee any particular growth strategy when reallocating when full, nor when reserve is called. ... Whatever strategy is used will of course guarantee O(1) amortized push.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this for demonstration purposes so custom non-vec implementations allocate reasonably sized buffers so they don't encounter O(n^2) complexity. Maybe, we should request power of two from the library itself when we integrate this deep into construction.

}
Ok(&mut self.value_buffer)
}
}
Loading