Skip to content

Conversation

@aykut-bozkurt
Copy link
Member

@aykut-bozkurt aykut-bozkurt commented Mar 24, 2025

We might use temp files as intermediate step. For COPY TO stdout, table => temp file => stdout. For COPY FROM stdin, stdin => file => table. There will be intermediate file IO overhead but this is the simplest and decent solution for now (considering most of the time will be lost during row to columnar conversions).

Closes #69.

}

impl Drop for ParquetWriterContext {
fn drop(&mut self) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to make Drop fail safe.


impl ParsedUriInfo {
fn for_stdout() -> Self {
let path = temp_dir().join(format!("pg_parquet_{}", Uuid::new_v4()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might have used Postgres temp files?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these cleaned up on failure? PG is quite particular about the naming schema of temp files

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, we made sure cleaning them up ourselves.

const PQ_LARGE_MESSAGE_LIMIT: i32 = 1024 * 1024 * 1024 - 3;
const PQ_SMALL_MESSAGE_LIMIT: i32 = 10000;

unsafe fn receive_data_from_client(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codecov
Copy link

codecov bot commented Mar 24, 2025

Codecov Report

Attention: Patch coverage is 96.16788% with 21 lines in your changes missing coverage. Please review.

Project coverage is 91.34%. Comparing base (0de2de1) to head (7dd421f).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/parquet_copy_hook/copy_from_stdin.rs 84.54% 17 Missing ⚠️
src/parquet_copy_hook/copy_to_stdout.rs 95.55% 2 Missing ⚠️
src/arrow_parquet/uri_utils.rs 98.03% 1 Missing ⚠️
...c/parquet_copy_hook/copy_to_split_dest_receiver.rs 96.55% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
+ Coverage   91.15%   91.34%   +0.19%     
==========================================
  Files          88       91       +3     
  Lines       12606    13059     +453     
==========================================
+ Hits        11491    11929     +438     
- Misses       1115     1130      +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.


// return an owned copy of the tupledesc (needs pfree but not release) That prevents a bunch of
// errors during cleanup.
tupledesc.clone()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit unfortunate. I needed to copy tupledesc to get rid of crashes during cleanup. But this is not a hot path, so seems fine.

@aykut-bozkurt aykut-bozkurt marked this pull request as ready for review April 9, 2025 20:07
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/stdin-out branch 3 times, most recently from d00ec14 to 057d896 Compare April 12, 2025 20:08

// ParsedUriInfo is a struct that holds the parsed uri information.
#[derive(Debug, Clone)]
#[derive(Debug)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make no cloneable to make sure Drop runs once

Copy link
Collaborator

@marcoslot marcoslot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Needs some readme update, also emphasizing that you really need to add format 'parquet in this case.

collected_tuple_column_sizes: *mut i64,
target_batch_size: i64,
uri: *const c_char,
is_stdio: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe is_to_stdout

copy_options: CopyToParquetOptions,
per_copy_context: MemoryContext,
copy_mctx: MemoryContext,
row_group_mctx: MemoryContext,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

row_group_memory_context would be more readable

@aykut-bozkurt aykut-bozkurt enabled auto-merge (squash) April 17, 2025 16:16
@aykut-bozkurt aykut-bozkurt merged commit 29769de into main Apr 17, 2025
4 checks passed
@aykut-bozkurt aykut-bozkurt deleted the aykut/stdin-out branch April 17, 2025 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support COPY TO/FROM stdout/stdin

3 participants