Commit e3b4476
Coerce types on read (#76)
`COPY FROM parquet` is too strict when matching the Postgres tupledesc schema to the parquet file schema.
e.g. an `INT32` column in the parquet schema cannot be read into a Postgres `int8` (64-bit integer) column.
We can avoid this situation by casting each arrow array to the array type expected by the tupledesc
schema, if the cast is possible. We can use the `arrow-cast` crate, which lives in the same project
as `arrow`. Its public API lets us check whether a cast is possible between 2 arrow types and perform the cast.
To make sure the cast is possible, we need to do 2 checks:
1. arrow-cast allows the cast from the "arrow type in the parquet file" to the "arrow type in the schema that is
generated for the tupledesc" (user-defined cast functions created in Postgres are unknown to arrow-cast, so they will not work),
2. the cast is meaningful in Postgres. We check whether there is a cast from the "Postgres type that corresponds to the arrow type in the parquet file" to the "Postgres type in the tupledesc".
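The two checks above can be sketched as a single predicate. This is a minimal, self-contained illustration and not the code from this commit: `ArrowType`, `PgType`, `arrow_cast_allowed`, `pg_type_of`, `pg_cast_exists`, and `can_coerce` are hypothetical names, and the rule tables are tiny toy stand-ins for what `arrow-cast` and the Postgres cast catalog would actually report.

```rust
/// Toy subset of Arrow data types (hypothetical stand-in for the real enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ArrowType {
    Int32,
    Int64,
    Date32,
}

/// Toy subset of Postgres types (hypothetical stand-in).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PgType {
    Int4,
    Int8,
    Date,
}

/// Check 1 (toy table): would the Arrow cast kernel accept this cast?
pub fn arrow_cast_allowed(from: ArrowType, to: ArrowType) -> bool {
    use ArrowType::*;
    from == to || matches!((from, to), (Int32, Int64) | (Int64, Int32) | (Date32, Int32))
}

/// Map an Arrow type to the Postgres type it corresponds to.
pub fn pg_type_of(t: ArrowType) -> PgType {
    match t {
        ArrowType::Int32 => PgType::Int4,
        ArrowType::Int64 => PgType::Int8,
        ArrowType::Date32 => PgType::Date,
    }
}

/// Check 2 (toy table): does Postgres know a cast between these types?
pub fn pg_cast_exists(from: PgType, to: PgType) -> bool {
    use PgType::*;
    from == to || matches!((from, to), (Int4, Int8) | (Int8, Int4))
}

/// A coercion is accepted only when both checks pass.
/// In this sketch the target Postgres type is derived from the target Arrow
/// type; in the real flow it would come from the tupledesc.
pub fn can_coerce(file_type: ArrowType, target_type: ArrowType) -> bool {
    arrow_cast_allowed(file_type, target_type)
        && pg_cast_exists(pg_type_of(file_type), pg_type_of(target_type))
}
```

With these toy tables, `can_coerce(ArrowType::Int32, ArrowType::Int64)` is accepted, while `can_coerce(ArrowType::Date32, ArrowType::Int32)` is rejected: the first check passes but the second does not, which is the situation the second check exists to catch.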
With that we can implicitly cast between many types as shown below:
- INT16 => INT32
- UINT32 => INT64
- FLOAT32 => FLOAT64
- LargeUtf8 => Utf8
- LargeBinary => Binary
- Struct, Array, and Map with castable fields, e.g. [UINT16] => [INT64] or struct {'x': UINT16} => struct {'x': INT64}
**NOTE**: Struct fields must always strictly match by name and position.
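The rule for nested types, including the strict name and position match for struct fields, could look roughly like the recursive check below. This is a sketch under assumptions: `Ty`, `primitive_castable`, and `castable` are hypothetical names, the primitive rule table is a small toy subset, and Map types are left out for brevity.

```rust
/// Toy type tree (hypothetical; Map is omitted to keep the sketch short).
#[derive(Clone, Debug, PartialEq)]
pub enum Ty {
    UInt16,
    Int32,
    Int64,
    Utf8,
    List(Box<Ty>),
    /// Struct fields as (name, type) pairs, in declaration order.
    Struct(Vec<(String, Ty)>),
}

/// Toy rule table for primitive casts; identical types always match.
pub fn primitive_castable(from: &Ty, to: &Ty) -> bool {
    use Ty::*;
    from == to || matches!((from, to), (UInt16, Int32) | (UInt16, Int64) | (Int32, Int64))
}

/// Recursively decide whether `from` can be coerced into `to`.
pub fn castable(from: &Ty, to: &Ty) -> bool {
    match (from, to) {
        // Lists are castable when their element types are castable.
        (Ty::List(f), Ty::List(t)) => castable(f, t),
        // Structs need the same field count, and every field must match
        // by name at the same position, with a castable field type.
        (Ty::Struct(fs), Ty::Struct(ts)) => {
            fs.len() == ts.len()
                && fs
                    .iter()
                    .zip(ts)
                    .all(|((fname, fty), (tname, tty))| fname == tname && castable(fty, tty))
        }
        // A nested type never coerces to or from a different kind of type.
        (Ty::List(_), _) | (_, Ty::List(_)) | (Ty::Struct(_), _) | (_, Ty::Struct(_)) => false,
        _ => primitive_castable(from, to),
    }
}
```

Under these rules `[UINT16] => [INT64]` and `struct {'x': UINT16} => struct {'x': INT64}` are accepted, while a struct whose fields are renamed or reordered is rejected even if every field type would be castable on its own.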
We can also cast the types below, but the cast may fail at runtime, e.g. on value overflow:
- INT64 => INT32
- TIMESTAMPTZ => TIMESTAMP
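To illustrate why a narrowing cast such as INT64 => INT32 can only fail at runtime, here is a small standard-library sketch. It does not use `arrow-cast`; `narrow_column` is a hypothetical helper, and it assumes the desired behavior is to report an error on overflow rather than to produce a null.

```rust
use std::convert::TryFrom;

/// Narrow a nullable 64-bit column to 32 bits.
/// Nulls pass through; the first value that does not fit aborts the
/// conversion with an error naming the offending row.
pub fn narrow_column(values: &[Option<i64>]) -> Result<Vec<Option<i32>>, String> {
    values
        .iter()
        .enumerate()
        .map(|(row, v)| match v {
            None => Ok(None),
            Some(x) => i32::try_from(*x)
                .map(Some)
                .map_err(|_| format!("row {}: value {} overflows INT32", row, x)),
        })
        .collect()
}
```

The types alone say the cast is allowed, so the schema check passes; whether it succeeds depends on the values actually stored in the file, which is only known while reading.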
Closes #67.
Closes #79.
1 parent 518a5ac commit e3b4476
File tree
13 files changed, +1668 -402 lines changed
- src
- arrow_parquet
- arrow_to_pg
- parquet_copy_hook
- type_compat