@aykut-bozkurt aykut-bozkurt commented Jan 30, 2025

Adds a field_ids option, which lets you specify how field ids are assigned during COPY TO.
Supported values:

  • none (default) => no field ids are written into the parquet metadata.
  • auto => pg_parquet generates field ids starting from 0.
  • <json string> => pg_parquet uses the given field ids, e.g.
create table test_table(a int, b text[]);
copy test_table to '/tmp/test.parquet' with (field_ids '{"a": 1, "b": {"__root_field_id": 2, "element": 3}}');
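
To illustrate the auto mode, here is a minimal sketch of assigning field ids from 0 over a nested schema. The `Field` type and `assign_auto_field_ids` function are hypothetical and are not pg_parquet's actual code. The pre-order traversal (parent before its children) is also an assumption, chosen because it matches the ordering in the explicit example above, where `b` gets 2 and its `element` gets 3.

```rust
/// Hypothetical, simplified schema node; not pg_parquet's actual type.
#[derive(Debug)]
struct Field {
    name: String,
    field_id: Option<i32>,
    children: Vec<Field>,
}

impl Field {
    /// A column without nested fields, e.g. `a int`.
    fn leaf(name: &str) -> Self {
        Field { name: name.to_string(), field_id: None, children: Vec::new() }
    }

    /// A column with nested fields, e.g. `b text[]` with an `element` child.
    fn nested(name: &str, children: Vec<Field>) -> Self {
        Field { name: name.to_string(), field_id: None, children }
    }
}

/// Assigns ids in pre-order (parent before children), starting at `*next_id`.
/// The traversal order is an assumption made for this sketch.
fn assign_auto_field_ids(fields: &mut [Field], next_id: &mut i32) {
    for field in fields.iter_mut() {
        field.field_id = Some(*next_id);
        *next_id += 1;
        assign_auto_field_ids(&mut field.children, next_id);
    }
}
```

Under these assumptions, a schema for `test_table(a int, b text[])` would get `a = 0`, `b = 1`, and `b.element = 2`.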

Closes #106.

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/field-ids branch 2 times, most recently from ad7f2e5 to 52a44e9 Compare January 30, 2025 15:16
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/field-ids branch 2 times, most recently from f49dcac to 3ea9bf9 Compare January 31, 2025 19:24
@aykut-bozkurt aykut-bozkurt marked this pull request as ready for review January 31, 2025 19:25
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/file-size-bytes branch from 92971fe to 5b5725d Compare March 6, 2025 09:34
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/file-size-bytes branch from 5b5725d to 3613f84 Compare March 7, 2025 11:24
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/file-size-bytes branch from 3613f84 to 08c012a Compare March 7, 2025 11:30
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/file-size-bytes branch from 08c012a to a707629 Compare March 8, 2025 13:59
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/file-size-bytes branch from a707629 to 8a4d5ec Compare March 11, 2025 09:18
@aykut-bozkurt aykut-bozkurt linked an issue Mar 11, 2025 that may be closed by this pull request
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/file-size-bytes branch from 8a4d5ec to 59715d6 Compare March 14, 2025 12:03

@marcoslot marcoslot left a comment

Seems to work well, though I'm wondering whether we should throw an error for column names that cannot be found to prevent typos.
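
One way such a check could look, as a rough sketch only: the function name `check_unknown_columns`, the flat `BTreeMap` representation, and the error text are all hypothetical and do not come from the PR. It covers top-level column names only; nested keys would need a recursive variant.

```rust
use std::collections::BTreeMap;

/// Returns an error naming every key in `field_ids` that is not a column of the table.
/// Hypothetical helper for illustration; only top-level names are checked.
fn check_unknown_columns(
    field_ids: &BTreeMap<String, i32>,
    table_columns: &[&str],
) -> Result<(), String> {
    // Collect the keys that do not match any column name.
    let unknown: Vec<&str> = field_ids
        .keys()
        .map(String::as_str)
        .filter(|name| !table_columns.contains(name))
        .collect();

    if unknown.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "field_ids refers to unknown column(s): {}",
            unknown.join(", ")
        ))
    }
}
```

With a check like this, a typo such as `"bb"` for column `b` would be reported instead of being silently ignored.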

Explicit(FieldIdMapping),
}

impl FromStr for FieldIds {

Generally, it would be useful to add more comments as the code base grows, e.g.:

// implements parsing for the field_ids option in COPY .. TO statements
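
For context, a sketch of how the commented `FromStr` implementation might read. Only `Explicit(FieldIdMapping)` and `impl FromStr for FieldIds` appear in the diff; the `None` and `Auto` variants, the `String` error type, and the `FieldIdMapping` stand-in that just keeps the raw JSON text are assumptions made here so the example compiles without a JSON library.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the PR's `FieldIdMapping`; it only keeps the raw JSON text.
#[derive(Debug, PartialEq)]
struct FieldIdMapping(String);

/// How field ids are assigned during COPY .. TO.
/// `None` and `Auto` are assumed from the documented option values.
#[derive(Debug, PartialEq)]
enum FieldIds {
    None,
    Auto,
    Explicit(FieldIdMapping),
}

// Implements parsing for the field_ids option in COPY .. TO statements.
impl FromStr for FieldIds {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "none" => Ok(FieldIds::None),
            "auto" => Ok(FieldIds::Auto),
            // Anything that looks like a JSON object is kept as-is in this sketch;
            // real code would parse and validate the JSON here.
            json if json.starts_with('{') => {
                Ok(FieldIds::Explicit(FieldIdMapping(json.to_string())))
            }
            other => Err(format!("invalid field_ids value: {}", other)),
        }
    }
}
```

Implementing `FromStr` lets callers write `"auto".parse::<FieldIds>()`, which keeps option handling in one documented place.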

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/file-size-bytes branch from 462c1a0 to 59da25c Compare April 7, 2025 10:04
Base automatically changed from aykut/file-size-bytes to main April 7, 2025 12:54
@aykut-bozkurt aykut-bozkurt enabled auto-merge (squash) April 7, 2025 13:23
codecov bot commented Apr 7, 2025

Codecov Report

Attention: Patch coverage is 70.39604% with 299 lines in your changes missing coverage. Please review.

Project coverage is 91.12%. Comparing base (f8c3d62) to head (cfa4dec).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/pgrx_tests/copy_options.rs | 63.01% | 226 Missing ⚠️ |
| src/arrow_parquet/schema_parser.rs | 80.57% | 34 Missing ⚠️ |
| src/arrow_parquet/field_ids.rs | 73.33% | 28 Missing ⚠️ |
| src/pgrx_tests/common.rs | 73.68% | 5 Missing ⚠️ |
| src/parquet_copy_hook/copy_to_dest_receiver.rs | 63.63% | 4 Missing ⚠️ |
| ...c/parquet_copy_hook/copy_to_split_dest_receiver.rs | 66.66% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #102      +/-   ##
==========================================
- Coverage   92.90%   91.12%   -1.79%     
==========================================
  Files          86       87       +1     
  Lines       11650    12479     +829     
==========================================
+ Hits        10823    11371     +548     
- Misses        827     1108     +281     

@aykut-bozkurt aykut-bozkurt merged commit 7f6d421 into main Apr 7, 2025
4 of 6 checks passed
@aykut-bozkurt aykut-bozkurt deleted the aykut/field-ids branch April 7, 2025 13:31
Development

Successfully merging this pull request may close these issues.

Support writing field ids