Commit a8e09f7

Support copying from glob patterns
Closes #112.
1 parent 7060d99 commit a8e09f7

13 files changed: +651 -118 lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Generated file; its diff is not rendered by default.

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ aws-config = { version = "1", default-features = false, features = ["rustls","rt
 aws-credential-types = {version = "1", default-features = false}
 azure_storage = {version = "0.21", default-features = false}
 futures = "0.3"
+glob = "0.3"
 home = "0.5"
 libc = {version = "0.2", default-features = false }
 object_store = {version = "=0.12.2", default-features = false, features = ["aws", "azure", "fs", "gcp", "http"]}

README.md

Lines changed: 36 additions & 1 deletion
@@ -24,6 +24,7 @@ COPY table FROM 's3://mybucket/data.parquet' WITH (format 'parquet');
 - [Inspect Parquet schema](#inspect-parquet-schema)
 - [Inspect Parquet metadata](#inspect-parquet-metadata)
 - [Inspect Parquet column statistics](#inspect-parquet-column-statistics)
+- [List and read Parquet files from uri pattern](#list-and-read-parquet-files-from-uri-pattern)
 - [Object Store Support](#object-store-support)
 - [Copy Options](#copy-options)
 - [Configuration](#configuration)
@@ -217,6 +218,40 @@ SELECT * FROM parquet.column_stats('/tmp/product_example.parquet')
 (13 rows)
 ```
 
+### List and read Parquet files from uri pattern
+
+You can call `SELECT * FROM parquet.list(<uri_pattern>)` to see all uris that matches with the uri pattern.
+Uri pattern can resolve `**` for directories and `*` for words in the uri.
+
+
+```sql
+COPY (SELECT i FROM generate_series(1, 1000000) i) TO '/tmp/some/test.parquet' with (file_size_bytes '1MB');
+COPY 1000000
+
+SELECT * FROM parquet.list('/tmp/some/**/*.parquet');
+                  uri                  |  size
+---------------------------------------+---------
+ /tmp/some/test.parquet/data_4.parquet |  100162
+ /tmp/some/test.parquet/data_3.parquet | 1486916
+ /tmp/some/test.parquet/data_2.parquet | 1486916
+ /tmp/some/test.parquet/data_0.parquet | 1486920
+ /tmp/some/test.parquet/data_1.parquet | 1486916
+(5 rows)
+
+```
+
+Uri pattern is also supported by `COPY FROM` for all supported object stores except `http(s)` endpoints.
+```sql
+COPY (SELECT i FROM generate_series(1, 1000000) i) TO 's3://testbucket/some/test.parquet' with (file_size_bytes '1MB');
+COPY 1000000
+
+CREATE TABLE test(a int);
+CREATE TABLE
+
+COPY test FROM 's3://testbucket/some/**/*.parquet';
+COPY 1000000
+```
+
 ## Object Store Support
 `pg_parquet` supports reading and writing Parquet files from/to `S3`, `Azure Blob Storage`, `http(s)` and `Google Cloud Storage` object stores.
 
@@ -304,7 +339,7 @@ Supported authorization methods' priority order is shown below:
 
 #### Http(s) Storage
 
-`Https` uris are supported by default. You can set `ALLOW_HTTP` environment variable to allow `http` uris.
+Only `https` uris are supported by default. You can set `ALLOW_HTTP` environment variable to allow `http` uris.
 
 #### Google Cloud Storage
 
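
Note on the pattern semantics: the README change says `**` resolves directories and `*` resolves words in the uri. The following is a minimal, standard-library-only sketch of one plausible reading of that rule. It is not the code from this commit and not the `glob` crate's implementation. It assumes that `**` stands for zero or more whole path segments and that `*` matches any run of characters inside a single segment. The function names `uri_matches`, `segments_match`, and `segment_matches` are hypothetical.

```rust
/// Match one path segment against a segment pattern in which `*` stands for
/// any (possibly empty) run of characters. Assumed semantics, for illustration.
fn segment_matches(pattern: &[u8], text: &[u8]) -> bool {
    match pattern.split_first() {
        // Pattern exhausted: succeed only if the text is exhausted too.
        None => text.is_empty(),
        // `*`: try every possible split point of the remaining text.
        Some((&b'*', rest)) => (0..=text.len()).any(|i| segment_matches(rest, &text[i..])),
        // Literal byte: must equal the next byte of the text.
        Some((&p, rest)) => match text.split_first() {
            Some((&t, tail)) => p == t && segment_matches(rest, tail),
            None => false,
        },
    }
}

/// Match a list of path segments against a list of pattern segments in which
/// a `**` segment stands for zero or more whole segments (assumed semantics).
fn segments_match(pattern: &[&str], path: &[&str]) -> bool {
    match pattern.split_first() {
        None => path.is_empty(),
        // `**`: skip zero, one, two, ... leading path segments.
        Some((seg, rest)) if *seg == "**" => {
            (0..=path.len()).any(|i| segments_match(rest, &path[i..]))
        }
        // Ordinary segment: match it against the next path segment.
        Some((seg, rest)) => match path.split_first() {
            Some((name, tail)) => {
                segment_matches(seg.as_bytes(), name.as_bytes()) && segments_match(rest, tail)
            }
            None => false,
        },
    }
}

/// Hypothetical helper: does `uri` match `pattern` under the rules above?
fn uri_matches(pattern: &str, uri: &str) -> bool {
    let pattern_segments: Vec<&str> = pattern.split('/').collect();
    let uri_segments: Vec<&str> = uri.split('/').collect();
    segments_match(&pattern_segments, &uri_segments)
}
```

Under these assumptions, `uri_matches("/tmp/some/**/*.parquet", "/tmp/some/test.parquet/data_4.parquet")` holds, while the pattern `/tmp/some/*.parquet` does not match that uri because `*` is not allowed to cross a `/`. The actual extension may treat separators differently, so the README examples remain the reference for real behavior.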
