@aykut-bozkurt
Member

Previously, we printed column statistics for each column per row group via parquet.metadata(uri). With the new UDF parquet.column_stats(uri), we print the stats for each column aggregated across all row groups.

Stats for some types were printed in a text format that could not be converted to the actual Postgres type. This PR also makes sure the output format is convertible to the actual Postgres type.
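
To illustrate what "aggregated across row groups" means for a single column, here is a minimal sketch. It is not the code in this PR: `RowGroupStats` and `aggregate` are hypothetical names, the values are fixed to `i64` for brevity, and it assumes a row group without min/max stats is simply skipped rather than invalidating the aggregate.

```rust
/// Hypothetical per-row-group statistics for one column.
/// (Illustrative only; not the types used by the actual implementation.)
#[derive(Debug, Clone, PartialEq)]
struct RowGroupStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: u64,
}

/// Fold the stats of every row group into one file-level summary.
/// Assumption: a row group with no min/max is ignored for min/max,
/// while its null count is still added.
fn aggregate(row_groups: &[RowGroupStats]) -> RowGroupStats {
    let mut acc = RowGroupStats { min: None, max: None, null_count: 0 };
    for rg in row_groups {
        acc.min = match (acc.min, rg.min) {
            (Some(a), Some(b)) => Some(a.min(b)),
            // At most one side is present: keep whichever exists.
            (a, b) => a.or(b),
        };
        acc.max = match (acc.max, rg.max) {
            (Some(a), Some(b)) => Some(a.max(b)),
            (a, b) => a.or(b),
        };
        acc.null_count += rg.null_count;
    }
    acc
}
```

Whether missing stats in one row group should instead make the aggregated min/max unknown is a design choice this sketch does not settle.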

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/stats-udf branch 2 times, most recently from 5f53ea9 to 2721808 Compare January 29, 2025 15:23
@aykut-bozkurt aykut-bozkurt changed the base branch from main to aykut/dest-receiver-bytes-written January 30, 2025 15:21
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/dest-receiver-bytes-written branch from 1fc60b1 to 44f4190 Compare January 30, 2025 22:13
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/stats-udf branch 2 times, most recently from 90593f3 to f373345 Compare January 30, 2025 22:17
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/dest-receiver-bytes-written branch from 44f4190 to 92a3ef0 Compare January 31, 2025 19:29
> {
let uri = parse_uri(&uri);

ensure_access_privilege_to_uri(&uri, true);
Collaborator

random thought: wonder whether we should have a separate role for public URLs

(probably not needed for now)

Member Author

still a network connection. might think on it though

.get_basic_info()
.has_id()
{
continue;
Collaborator

does not having a field ID imply not having stats?

Member Author

good point

@aykut-bozkurt aykut-bozkurt changed the base branch from aykut/dest-receiver-bytes-written to main March 11, 2025 09:40
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/stats-udf branch 3 times, most recently from 22d6a6b to cfb3d2c Compare March 11, 2025 11:43
@aykut-bozkurt aykut-bozkurt requested a review from marcoslot March 11, 2025 11:43
@codecov

codecov bot commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 96.88889% with 21 lines in your changes missing coverage. Please review.

Project coverage is 92.43%. Comparing base (1b5878d) to head (648d129).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/parquet_udfs/stats.rs | 94.15% | 21 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #101      +/-   ##
==========================================
+ Coverage   91.97%   92.43%   +0.46%     
==========================================
  Files          84       85       +1     
  Lines       10651    11288     +637     
==========================================
+ Hits         9796    10434     +638     
+ Misses        855      854       -1     

}) if !is_adjusted_to_u_t_c
);

let is_timetz = matches!(
Collaborator

doing all these checks upfront seems a bit strange, should they perhaps move into the match below?

Member Author

done, though I still prefer matches!(...) to match {}
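
For readers unfamiliar with the two styles discussed here, a small sketch of both follows. It uses a hypothetical `LogicalType` enum with a shortened field name, not the parquet crate's real type, and the mapping of the UTC flag to "time" versus "timetz" is an assumption inferred from the snippet under review.

```rust
/// Hypothetical stand-in for a Parquet logical type (illustrative only).
#[derive(Debug, Clone, Copy)]
enum LogicalType {
    Time { is_adjusted_to_utc: bool },
    Date,
}

/// Style 1: boolean checks computed with `matches!` and a guard.
fn classify_with_matches(lt: LogicalType) -> &'static str {
    let is_time = matches!(
        lt,
        LogicalType::Time { is_adjusted_to_utc } if !is_adjusted_to_utc
    );
    let is_timetz = matches!(
        lt,
        LogicalType::Time { is_adjusted_to_utc } if is_adjusted_to_utc
    );
    if is_time {
        "time"
    } else if is_timetz {
        "timetz"
    } else {
        "other"
    }
}

/// Style 2: the same decision folded into one exhaustive `match`.
fn classify_with_match(lt: LogicalType) -> &'static str {
    match lt {
        LogicalType::Time { is_adjusted_to_utc: false } => "time",
        LogicalType::Time { is_adjusted_to_utc: true } => "timetz",
        LogicalType::Date => "other",
    }
}
```

Both functions return the same result for every input; the single `match` lets the compiler check exhaustiveness, while `matches!` keeps each check as a named boolean.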

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/stats-udf branch 2 times, most recently from 97c2b26 to 28b8448 Compare March 13, 2025 15:16
@aykut-bozkurt aykut-bozkurt merged commit b626eb4 into main Apr 2, 2025
6 checks passed
@aykut-bozkurt aykut-bozkurt deleted the aykut/stats-udf branch April 2, 2025 15:46