Copy to/from stdout/stdin with (format parquet) #121
}

impl Drop for ParquetWriterContext {
    fn drop(&mut self) {
Better to make Drop fail-safe.
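One way to read "fail-safe" here is that `drop` should never panic, even when finalization fails. The sketch below is only an illustration of that idea, not the code from this PR: `WriterContext`, `finish`, and `FailingWriter` are hypothetical names, and the real `ParquetWriterContext` may finalize differently.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a writer context that must flush on cleanup.
struct WriterContext<W: Write> {
    writer: W,
    finished: bool,
}

impl<W: Write> WriterContext<W> {
    fn new(writer: W) -> Self {
        Self { writer, finished: false }
    }

    /// Explicit, fallible finalization for callers that want to see the error.
    /// It runs at most once; later calls are no-ops that return Ok.
    fn finish(&mut self) -> io::Result<()> {
        if self.finished {
            return Ok(());
        }
        // Mark as finished first so Drop never retries a failed finalization.
        self.finished = true;
        self.writer.flush()
    }
}

impl<W: Write> Drop for WriterContext<W> {
    fn drop(&mut self) {
        // No unwrap/expect here: a panic while already unwinding aborts the process.
        if let Err(e) = self.finish() {
            eprintln!("warning: could not finalize writer during drop: {e}");
        }
    }
}

/// Test double (assumption for the demo): accepts writes but always fails to flush.
struct FailingWriter;

impl Write for FailingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Err(io::Error::new(io::ErrorKind::Other, "flush failed"))
    }
}
```

Callers that care about the outcome call `finish()` themselves and handle the error; `drop` is only the safety net and downgrades any failure to a warning.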
src/arrow_parquet/uri_utils.rs
Outdated
impl ParsedUriInfo {
    fn for_stdout() -> Self {
        let path = temp_dir().join(format!("pg_parquet_{}", Uuid::new_v4()));
Could we have used Postgres temp files here?
Are these cleaned up on failure? PG is quite particular about the naming scheme of temp files.
Yep, we made sure to clean them up ourselves.
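As an illustration of cleaning up temp files ourselves, a guard type can tie the file's lifetime to a Rust value so the file is removed on both the success and the error path. This is a sketch under assumptions, not the PR's implementation: `TempFileGuard` is a hypothetical name, and a pid plus counter replaces the UUID from the diff so the example needs only the standard library.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter used to build unique file names (std-only UUID substitute).
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical guard: owns a temp file path and deletes the file when dropped.
#[derive(Debug)] // deliberately not Clone, so there is exactly one owner
struct TempFileGuard {
    path: PathBuf,
}

impl TempFileGuard {
    fn create(prefix: &str) -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("{prefix}_{}_{id}", std::process::id());
        let path = std::env::temp_dir().join(name);
        // create_new fails if the file already exists instead of reusing it.
        OpenOptions::new().write(true).create_new(true).open(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempFileGuard {
    fn drop(&mut self) {
        // Best-effort cleanup: ignore errors so drop itself can never fail.
        let _ = fs::remove_file(&self.path);
    }
}
```

Because the removal lives in `Drop`, an early return via `?` still deletes the file. It does not cover cases where destructors never run, such as a process abort.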
const PQ_LARGE_MESSAGE_LIMIT: i32 = 1024 * 1024 * 1024 - 3;
const PQ_SMALL_MESSAGE_LIMIT: i32 = 10000;

unsafe fn receive_data_from_client(
5bbb5cb to 53ab324
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
+ Coverage   91.15%   91.34%   +0.19%
==========================================
  Files          88       91       +3
  Lines       12606    13059     +453
==========================================
+ Hits        11491    11929     +438
- Misses       1115     1130      +15
// return an owned copy of the tupledesc (needs pfree but not release) That prevents a bunch of
// errors during cleanup.
tupledesc.clone()
This is a bit unfortunate: I needed to copy the tupledesc to get rid of crashes during cleanup. But this is not a hot path, so it seems fine.
53ab324 to f525bfa
d00ec14 to 057d896
057d896 to 970a241
// ParsedUriInfo is a struct that holds the parsed uri information.
-#[derive(Debug, Clone)]
+#[derive(Debug)]
Made it non-cloneable to make sure Drop runs only once.
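The reasoning can be shown with a small counter experiment: every clone of a value that implements `Drop` runs the destructor again, so cleanup tied to `Drop` would fire once per copy. The types below are hypothetical stand-ins rather than `ParsedUriInfo` itself.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical cloneable handle; `cleanups` counts how often Drop runs.
#[derive(Clone)]
struct CloneableHandle {
    cleanups: Rc<Cell<u32>>,
}

impl Drop for CloneableHandle {
    fn drop(&mut self) {
        self.cleanups.set(self.cleanups.get() + 1);
    }
}

/// The same handle without Clone: it can be moved, but never duplicated.
struct UniqueHandle {
    cleanups: Rc<Cell<u32>>,
}

impl Drop for UniqueHandle {
    fn drop(&mut self) {
        self.cleanups.set(self.cleanups.get() + 1);
    }
}

/// Returns the number of Drop runs when the handle is cloned once.
fn cleanups_with_clone() -> u32 {
    let counter = Rc::new(Cell::new(0));
    {
        let original = CloneableHandle { cleanups: Rc::clone(&counter) };
        let _copy = original.clone();
        // Both `original` and `_copy` are dropped at the end of this scope.
    }
    counter.get()
}

/// Returns the number of Drop runs when the handle is only moved.
fn cleanups_without_clone() -> u32 {
    let counter = Rc::new(Cell::new(0));
    {
        let original = UniqueHandle { cleanups: Rc::clone(&counter) };
        let _moved = original; // a move transfers ownership; nothing is duplicated
    }
    counter.get()
}
```

With the clone, the cleanup counter ends at 2; with a plain move it ends at 1. For a type whose `Drop` deletes a temp file, the first of those would mean a second delete attempt on a file that is already gone.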
marcoslot left a comment
Looks pretty good. Needs a README update, which should also emphasize that you really need to add format 'parquet' in this case.
collected_tuple_column_sizes: *mut i64,
target_batch_size: i64,
uri: *const c_char,
is_stdio: bool,
maybe is_to_stdout
copy_options: CopyToParquetOptions,
per_copy_context: MemoryContext,
copy_mctx: MemoryContext,
row_group_mctx: MemoryContext,
row_group_memory_context would be more readable
We can use temp files as an intermediate step. For COPY TO stdout, the flow is table => temp file => stdout. For COPY FROM stdin, it is stdin => temp file => table. This adds intermediate file I/O overhead, but it is the simplest decent solution for now, considering that most of the time is spent on row-to-columnar conversion anyway.
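The COPY TO direction of this flow can be sketched with the standard library alone. This is a simplified illustration under assumptions, not the extension's code: `spill_to_file`, `stream_file`, and `copy_to_sink` are hypothetical names, the `produce` closure stands in for the Parquet writer, and a generic `Write` sink stands in for the client connection.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Read, Write};
use std::path::Path;

/// Stage 1 (table => temp file): `produce` stands in for the columnar writer.
fn spill_to_file<F>(path: &Path, produce: F) -> io::Result<()>
where
    F: FnOnce(&mut dyn Write) -> io::Result<()>,
{
    let mut out = BufWriter::new(File::create(path)?);
    produce(&mut out)?;
    out.flush()
}

/// Stage 2 (temp file => stdout): forward the file to `sink` in fixed-size chunks.
/// Returns the number of bytes forwarded.
fn stream_file<W: Write>(path: &Path, sink: &mut W, chunk_size: usize) -> io::Result<u64> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        sink.write_all(&buf[..n])?;
        total += n as u64;
    }
    sink.flush()?;
    Ok(total)
}

/// Runs both stages and removes the temp file whether or not they succeeded.
fn copy_to_sink<W: Write>(sink: &mut W, payload: &[u8]) -> io::Result<u64> {
    // Hypothetical file name; the diff above uses a UUID-based name instead.
    let path = std::env::temp_dir().join(format!("copy_sketch_{}", std::process::id()));
    let result = spill_to_file(&path, |w| w.write_all(payload))
        .and_then(|_| stream_file(&path, sink, 8192));
    let _ = fs::remove_file(&path); // best-effort cleanup
    result
}
```

The COPY FROM direction would mirror this: drain the incoming stream into the file first, then hand the file path to the reader. The extra pass over the file is the I/O overhead mentioned above.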
Closes #69.