`COPY TO` parquet now supports a new option, `file_size_bytes`, which sets a target
size in bytes for each generated Parquet file.
When a Parquet file exceeds the target size, it is flushed and a new Parquet file
is started under a parent directory. The parent directory is the destination path
without the `.parquet` extension.
For example:
```sql
COPY (select 'hellooooo' || i from generate_series(1, 1000000) i) to '/tmp/test.parquet' with (file_size_bytes 1048576);
```
```bash
> ls -alh /tmp/test/
1.4M data_0.parquet
1.4M data_1.parquet
1.4M data_2.parquet
1.4M data_3.parquet
114K data_4.parquet
```
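The rollover behaviour described above can be modelled with a small Python sketch. This is not the pg_parquet implementation: `parent_dir_for` and `plan_files` are hypothetical helper names, the `data_<n>.parquet` naming is taken from the listing above, and the sketch assumes the size check happens after each written chunk (the real writer may check at other boundaries). Paths with a compression suffix are not modelled.

```python
def parent_dir_for(path: str) -> str:
    """Derive the parent directory by dropping a trailing ".parquet" (assumed rule)."""
    suffix = ".parquet"
    return path[: -len(suffix)] if path.endswith(suffix) else path


def plan_files(path: str, chunk_sizes, file_size_bytes: int):
    """Return [(file_path, total_bytes), ...] for a sequence of written chunk sizes.

    A file is closed as soon as its accumulated size exceeds the target,
    so a finished file can be somewhat larger than file_size_bytes.
    """
    parent = parent_dir_for(path)
    files = []
    current = 0
    for size in chunk_sizes:
        current += size
        if current > file_size_bytes:
            # Target exceeded: flush this file and start a new one.
            files.append((f"{parent}/data_{len(files)}.parquet", current))
            current = 0
    if current > 0:
        # Whatever is left over goes into a final, smaller file.
        files.append((f"{parent}/data_{len(files)}.parquet", current))
    return files


if __name__ == "__main__":
    for name, size in plan_files("/tmp/test.parquet", [400] * 7, 1000):
        print(name, size)
```

Under these assumptions a file is only closed once it has passed the target, which would be consistent with the files in the listing being larger than the requested `1048576` bytes, with a smaller final file holding the remainder.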
README.md: 1 addition, 0 deletions
@@ -242,6 +242,7 @@ Supported authorization methods' priority order is shown below:
 ## Copy Options
 `pg_parquet` supports the following options in the `COPY TO` command:
 - `format parquet`: you need to specify this option to read or write Parquet files which does not end with `.parquet[.<compression>]` extension,
+- `file_size_bytes <int>`: the total byte size per Parquet file. When set, the parquet files, with target size, are created under parent directory (named the same as file name without file extension). By default, when not specified, a single file is generated without creating a parent folder.
 - `row_group_size <int>`: the number of rows in each row group while writing Parquet files. The default row group size is `122880`,
 - `row_group_size_bytes <int>`: the total byte size of rows in each row group while writing Parquet files. The default row group size bytes is `row_group_size * 1024`,
 - `compression <string>`: the compression format to use while writing Parquet files. The supported compression formats are `uncompressed`, `snappy`, `gzip`, `brotli`, `lz4`, `lz4raw` and `zstd`. The default compression format is `snappy`. If not specified, the compression format is determined by the file extension,
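To make the documented `row_group_size_bytes` default concrete, the arithmetic from the option list above can be worked through in a short Python sketch. The function name `default_row_group_size_bytes` is hypothetical. Only the formula `row_group_size * 1024` and the default `122880` come from the README text.

```python
DEFAULT_ROW_GROUP_SIZE = 122880  # rows, as stated in the option list


def default_row_group_size_bytes(row_group_size: int = DEFAULT_ROW_GROUP_SIZE) -> int:
    """Apply the documented default: row_group_size * 1024 bytes."""
    return row_group_size * 1024


if __name__ == "__main__":
    size = default_row_group_size_bytes()
    print(size)                  # 125829120
    print(size // (1024 * 1024)) # 120 (MiB)
```

With the default row group size this comes to 120 MiB, well above the `1048576` bytes used for `file_size_bytes` in the example.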