Merged
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -28,6 +28,7 @@ aws-config = { version = "1", default-features = false, features = ["rustls","rt
aws-credential-types = {version = "1", default-features = false}
azure_storage = {version = "0.21", default-features = false}
futures = "0.3"
glob = "0.3"
home = "0.5"
libc = {version = "0.2", default-features = false }
object_store = {version = "=0.12.2", default-features = false, features = ["aws", "azure", "fs", "gcp", "http"]}
37 changes: 36 additions & 1 deletion README.md
@@ -24,6 +24,7 @@ COPY table FROM 's3://mybucket/data.parquet' WITH (format 'parquet');
- [Inspect Parquet schema](#inspect-parquet-schema)
- [Inspect Parquet metadata](#inspect-parquet-metadata)
- [Inspect Parquet column statistics](#inspect-parquet-column-statistics)
- [List and read Parquet files from uri pattern](#list-and-read-parquet-files-from-uri-pattern)
- [Object Store Support](#object-store-support)
- [Copy Options](#copy-options)
- [Configuration](#configuration)
@@ -217,6 +218,40 @@ SELECT * FROM parquet.column_stats('/tmp/product_example.parquet')
(13 rows)
```

### List and read Parquet files from uri pattern

You can call `SELECT * FROM parquet.list(<uri_pattern>)` to list all uris that match the uri pattern, along with the size of each file.
A uri pattern can use `**` to match directories and `*` to match words in the uri.


```sql
COPY (SELECT i FROM generate_series(1, 1000000) i) TO '/tmp/some/test.parquet' with (file_size_bytes '1MB');
COPY 1000000

SELECT * FROM parquet.list('/tmp/some/**/*.parquet');
uri | size
---------------------------------------+---------
/tmp/some/test.parquet/data_4.parquet | 100162
/tmp/some/test.parquet/data_3.parquet | 1486916
/tmp/some/test.parquet/data_2.parquet | 1486916
/tmp/some/test.parquet/data_0.parquet | 1486920
/tmp/some/test.parquet/data_1.parquet | 1486916
(5 rows)

```
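
The matching behaviour shown above can be modelled with a small standalone sketch. This is not `pg_parquet`'s implementation (the project pulls in the `glob` crate for that); it is a simplified model that assumes `*` matches any run of characters inside one `/`-separated segment and `**` matches zero or more whole segments. The function names `segment_matches`, `path_matches` and `uri_matches` are hypothetical.

```rust
/// Matches a single path segment against a segment pattern in which `*`
/// stands for any (possibly empty) run of characters. Assumed semantics.
fn segment_matches(pattern: &[u8], text: &[u8]) -> bool {
    match pattern.split_first() {
        // Pattern exhausted: succeed only if the text is exhausted too.
        None => text.is_empty(),
        // `*`: try every possible split point of the remaining text.
        Some((&b'*', rest)) => {
            (0..=text.len()).any(|i| segment_matches(rest, &text[i..]))
        }
        // Literal byte: must equal the next byte of the text.
        Some((p, rest)) => match text.split_first() {
            Some((t, text_rest)) => p == t && segment_matches(rest, text_rest),
            None => false,
        },
    }
}

/// Matches a list of path segments against a list of pattern segments,
/// where a `**` pattern segment spans zero or more path segments.
fn path_matches(pattern: &[&str], path: &[&str]) -> bool {
    match pattern.split_first() {
        None => path.is_empty(),
        Some((seg, rest)) if *seg == "**" => {
            (0..=path.len()).any(|i| path_matches(rest, &path[i..]))
        }
        Some((seg, rest)) => match path.split_first() {
            Some((name, path_rest)) => {
                segment_matches(seg.as_bytes(), name.as_bytes())
                    && path_matches(rest, path_rest)
            }
            None => false,
        },
    }
}

/// Hypothetical helper: splits both strings on `/` and compares them
/// segment by segment.
pub fn uri_matches(pattern: &str, uri: &str) -> bool {
    let pattern_segments: Vec<&str> = pattern.split('/').collect();
    let uri_segments: Vec<&str> = uri.split('/').collect();
    path_matches(&pattern_segments, &uri_segments)
}
```

Under these assumptions, `uri_matches("/tmp/some/**/*.parquet", "/tmp/some/test.parquet/data_4.parquet")` returns `true`, while the pattern `/tmp/some/*.parquet` does not match that uri because `*` stays within a single segment.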

Uri patterns are also supported by `COPY FROM` for all supported object stores except `http(s)` endpoints.
```sql
COPY (SELECT i FROM generate_series(1, 1000000) i) TO 's3://testbucket/some/test.parquet' with (file_size_bytes '1MB');
COPY 1000000

CREATE TABLE test(a int);
CREATE TABLE

COPY test FROM 's3://testbucket/some/**/*.parquet';
COPY 1000000
```

## Object Store Support
`pg_parquet` supports reading and writing Parquet files from/to `S3`, `Azure Blob Storage`, `http(s)` and `Google Cloud Storage` object stores.

@@ -304,7 +339,7 @@ Supported authorization methods' priority order is shown below:

#### Http(s) Storage

-`Https` uris are supported by default. You can set `ALLOW_HTTP` environment variable to allow `http` uris.
+Only `https` uris are supported by default. You can set `ALLOW_HTTP` environment variable to allow `http` uris.

#### Google Cloud Storage
