
Commit 4dc228c

Adds support for COPY TO/FROM Azure Blob Storage
Only Azure Blob URIs of the form `https://{account}.blob.core.windows.net/{container}/key` are supported. The Azure Blob client can be configured with the environment variables `AZURE_STORAGE_ACCOUNT_NAME` or `AZURE_STORAGE_SAS_TOKEN`.

Additionally, the PR supports the following S3 URI forms:

- `s3(a)://{bucket}/key`
- `https://s3.amazonaws.com/{bucket}/key`
- `https://{bucket}.s3.amazonaws.com/key`

Closes #50
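As a rough illustration of the URI forms listed above, the sketch below maps a URI onto a store, a bucket or container, and an object key. It is a standalone, stdlib-only sketch: the `Location` enum, `parse_location`, and `split_first_segment` are hypothetical names, not pg_parquet's actual code, and the extension may validate URIs differently.

```rust
/// Hypothetical result type: which object store a URI points at.
#[derive(Debug, PartialEq)]
enum Location {
    S3 { bucket: String, key: String },
    AzureBlob { account: String, container: String, key: String },
}

/// Splits "first/rest" into two non-empty parts (hypothetical helper).
fn split_first_segment(s: &str) -> Option<(String, String)> {
    let (first, rest) = s.split_once('/')?;
    if first.is_empty() || rest.is_empty() {
        return None;
    }
    Some((first.to_string(), rest.to_string()))
}

/// Hypothetical classifier for the URI forms named in the commit message.
/// A production version would more likely rely on a real URL parser.
fn parse_location(uri: &str) -> Option<Location> {
    // s3://{bucket}/key and s3a://{bucket}/key
    if let Some(rest) = uri.strip_prefix("s3://").or_else(|| uri.strip_prefix("s3a://")) {
        let (bucket, key) = split_first_segment(rest)?;
        return Some(Location::S3 { bucket, key });
    }

    // Every remaining supported form is an https URL: split host from path.
    let rest = uri.strip_prefix("https://")?;
    let (host, path) = rest.split_once('/')?;

    if host == "s3.amazonaws.com" {
        // Path-style: https://s3.amazonaws.com/{bucket}/key
        let (bucket, key) = split_first_segment(path)?;
        Some(Location::S3 { bucket, key })
    } else if let Some(bucket) = host.strip_suffix(".s3.amazonaws.com") {
        // Virtual-hosted style: https://{bucket}.s3.amazonaws.com/key
        if bucket.is_empty() || path.is_empty() {
            return None;
        }
        Some(Location::S3 { bucket: bucket.to_string(), key: path.to_string() })
    } else if let Some(account) = host.strip_suffix(".blob.core.windows.net") {
        // Azure: https://{account}.blob.core.windows.net/{container}/key
        if account.is_empty() {
            return None;
        }
        let (container, key) = split_first_segment(path)?;
        Some(Location::AzureBlob { account: account.to_string(), container, key })
    } else {
        // Any other scheme or host is rejected.
        None
    }
}
```

Matching on exact host suffixes keeps the sketch strict: anything outside the listed forms (for example a `gs://` URI, or an Azure URL with no key after the container) yields `None` instead of a guess.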
1 parent 0bfc8b6 commit 4dc228c


12 files changed: +392 additions, -109 deletions


.devcontainer/Dockerfile

Lines changed: 13 additions & 12 deletions
@@ -6,10 +6,11 @@ ENV TZ="Europe/Istanbul"
 ARG PG_MAJOR=17
 
 # install deps
-RUN apt-get update && apt-get -y install build-essential libreadline-dev zlib1g-dev \
-flex bison libxml2-dev libxslt-dev libssl-dev \
-libxml2-utils xsltproc ccache pkg-config wget \
-curl lsb-release sudo nano net-tools git awscli
+RUN apt-get update && apt-get -y install build-essential libreadline-dev zlib1g-dev \
+flex bison libxml2-dev libxslt-dev libssl-dev \
+libxml2-utils xsltproc ccache pkg-config wget \
+curl lsb-release ca-certificates gnupg sudo git \
+nano net-tools awscli
 
 # install Postgres
 RUN sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
@@ -19,6 +20,14 @@ RUN apt-get update && apt-get -y install postgresql-${PG_MAJOR}-postgis-3 \
 postgresql-client-${PG_MAJOR} \
 libpq-dev
 
+# install azure-cli and azurite
+RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
+RUN apt-get update && apt-get install -y nodejs
+RUN curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
+RUN echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/trusted.gpg.d/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ `lsb_release -cs` main" | tee /etc/apt/sources.list.d/azure-cli.list
+RUN apt-get update && apt-get install -y azure-cli
+RUN npm install -g azurite
+
 # download and install MinIO server and client
 RUN wget https://dl.min.io/server/minio/release/linux-amd64/minio
 RUN chmod +x minio
@@ -58,11 +67,3 @@ ARG PGRX_VERSION=0.12.6
 RUN cargo install --locked cargo-pgrx@${PGRX_VERSION}
 RUN cargo pgrx init --pg${PG_MAJOR} $(which pg_config)
 RUN echo "shared_preload_libraries = 'pg_parquet'" >> $HOME/.pgrx/data-${PG_MAJOR}/postgresql.conf
-
-ENV MINIO_ROOT_USER=admin
-ENV MINIO_ROOT_PASSWORD=admin123
-ENV AWS_S3_TEST_BUCKET=testbucket
-ENV AWS_REGION=us-east-1
-ENV AWS_ACCESS_KEY_ID=admin
-ENV AWS_SECRET_ACCESS_KEY=admin123
-ENV PG_PARQUET_TEST=true

.devcontainer/devcontainer.json

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 ]
 }
 },
-"postStartCommand": "bash .devcontainer/scripts/setup-minio.sh",
+"postStartCommand": "bash .devcontainer/scripts/setup_minio.sh && bash .devcontainer/scripts/setup_azurite.sh",
 "forwardPorts": [
 5432
 ],

.devcontainer/scripts/setup_azurite.sh

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+source setup_test_envs.sh
+
+nohup azurite --location /tmp/azurite-storage > /dev/null 2>&1 &
+
+az storage container create --name "${AZURE_TEST_CONTAINER_NAME}" --public off --connection-string "$AZURE_STORAGE_CONNECTION_STRING"

.devcontainer/scripts/setup_minio.sh

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 #!/bin/bash
 
+source setup_test_envs.sh
+
 nohup minio server /tmp/minio-storage > /dev/null 2>&1 &
 
 mc alias set local http://localhost:9000 $MINIO_ROOT_USER $MINIO_ROOT_PASSWORD

.devcontainer/scripts/setup_test_envs.sh

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# S3 tests
+export AWS_ACCESS_KEY_ID=admin
+export AWS_SECRET_ACCESS_KEY=admin123
+export AWS_REGION=us-east-1
+export AWS_S3_TEST_BUCKET=testbucket
+export MINIO_ROOT_USER=admin
+export MINIO_ROOT_PASSWORD=admin123
+
+# Azure Blob tests
+export AZURE_TEST_CONTAINER_NAME=testcontainer
+export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://localhost:10000/devstoreaccount1;"
+
+# Other
+export PG_PARQUET_TEST=true
+export RUST_TEST_THREADS=1
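`AZURE_STORAGE_CONNECTION_STRING` points the `az` CLI at the local Azurite emulator. A connection string is a `;`-separated list of `Key=Value` pairs, and the base64 `AccountKey` value can end in `=` padding, so each pair has to be split on its first `=` only. A minimal stdlib sketch that assumes nothing beyond that format; `parse_connection_string` is a hypothetical helper, not part of pg_parquet or any Azure SDK.

```rust
use std::collections::HashMap;

/// Hypothetical helper: parses "Key=Value;Key=Value;..." into a map.
/// Empty segments (e.g. from a trailing ';') are skipped, and each pair is
/// split on the FIRST '=' so base64 padding in AccountKey stays intact.
fn parse_connection_string(s: &str) -> Result<HashMap<String, String>, String> {
    let mut fields = HashMap::new();
    for pair in s.split(';').filter(|p| !p.trim().is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("malformed segment: {pair}"))?;
        fields.insert(key.trim().to_string(), value.to_string());
    }
    Ok(fields)
}
```

Splitting with `split_once('=')` instead of `split('=')` is the important detail: the Azurite key above ends in `==`, which a naive split would truncate.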

.env_sample

Lines changed: 0 additions & 5 deletions
This file was deleted.
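With `.env_sample`, the `/tmp/.env` step in CI, and the `dotenvy` dependency all removed, the tests appear to rely on the variables exported by `setup_test_envs.sh`. The sketch below shows one way a test helper could read them and fail with a readable message when the script was not sourced. `TestEnv` and `load_test_env` are hypothetical; only the variable names come from `setup_test_envs.sh`. Taking a lookup closure instead of calling `std::env::var` directly keeps the helper testable without mutating the process environment.

```rust
/// Hypothetical test settings mirroring setup_test_envs.sh.
#[derive(Debug, PartialEq)]
struct TestEnv {
    s3_bucket: String,
    azure_container: String,
    azure_connection_string: String,
}

/// Hypothetical loader: `lookup` returns the value of a variable, if set.
/// Fields are read in declaration order, so the first missing variable
/// is the one reported in the error.
fn load_test_env(lookup: impl Fn(&str) -> Option<String>) -> Result<TestEnv, String> {
    let require = |name: &str| {
        lookup(name).ok_or_else(|| format!("{name} is not set; was setup_test_envs.sh sourced?"))
    };
    Ok(TestEnv {
        s3_bucket: require("AWS_S3_TEST_BUCKET")?,
        azure_container: require("AZURE_TEST_CONTAINER_NAME")?,
        azure_connection_string: require("AZURE_STORAGE_CONNECTION_STRING")?,
    })
}
```

In a real test the call would be `load_test_env(|name| std::env::var(name).ok())`.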

.github/workflows/ci.yml

Lines changed: 23 additions & 24 deletions
@@ -1,8 +1,9 @@
 name: CI lints and tests
 on:
 push:
-branches:
-- "*"
+branches: [ "main" ]
+pull_request:
+branches: [ "main" ]
 
 concurrency:
 group: ${{ github.ref }}
@@ -69,12 +70,23 @@ jobs:
 sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
 wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
 sudo apt-get update
-sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev libxslt-dev libssl-dev libxml2-utils xsltproc ccache pkg-config
+sudo apt-get -y install build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev \
+libxslt-dev libssl-dev libxml2-utils xsltproc ccache pkg-config \
+gnupg ca-certificates
 sudo apt-get -y install postgresql-${{ env.PG_MAJOR }}-postgis-3 \
 postgresql-server-dev-${{ env.PG_MAJOR }} \
 postgresql-client-${{ env.PG_MAJOR }} \
 libpq-dev
 
+- name: Install Azurite
+run: |
+curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash -
+sudo apt-get update && sudo apt-get install -y nodejs
+curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
+echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/trusted.gpg.d/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ `lsb_release -cs` main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
+sudo apt-get update && sudo apt-get install -y azure-cli
+npm install -g azurite
+
 - name: Install MinIO
 run: |
 # Download and install MinIO server and client
@@ -107,23 +119,14 @@ jobs:
 $(pg_config --sharedir)/extension \
 /var/run/postgresql/
 
-# pgrx tests with runas argument ignores environment variables, so
-# we read env vars from .env file in tests (https://github.com/pgcentralfoundation/pgrx/pull/1674)
-touch /tmp/.env
-echo AWS_ACCESS_KEY_ID=${{ env.AWS_ACCESS_KEY_ID }} >> /tmp/.env
-echo AWS_SECRET_ACCESS_KEY=${{ env.AWS_SECRET_ACCESS_KEY }} >> /tmp/.env
-echo AWS_S3_TEST_BUCKET=${{ env.AWS_S3_TEST_BUCKET }} >> /tmp/.env
-echo AWS_REGION=${{ env.AWS_REGION }} >> /tmp/.env
-echo PG_PARQUET_TEST=${{ env.PG_PARQUET_TEST }} >> /tmp/.env
+# Set up test environments
+source .devcontainer/scripts/setup_test_envs.sh
 
 # Start MinIO server
-export MINIO_ROOT_USER=${{ env.AWS_ACCESS_KEY_ID }}
-export MINIO_ROOT_PASSWORD=${{ env.AWS_SECRET_ACCESS_KEY }}
-minio server /tmp/minio-storage > /dev/null 2>&1 &
+bash .devcontainer/scripts/setup_minio.sh
 
-# Set access key and create test bucket
-mc alias set local http://localhost:9000 ${{ env.AWS_ACCESS_KEY_ID }} ${{ env.AWS_SECRET_ACCESS_KEY }}
-aws --endpoint-url http://localhost:9000 s3 mb s3://${{ env.AWS_S3_TEST_BUCKET }}
+# Start Azurite server
+bash .devcontainer/scripts/setup_azurite.sh
 
 # Run tests with coverage tool
 source <(cargo llvm-cov show-env --export-prefix)
@@ -134,13 +137,9 @@ jobs:
 
 # Stop MinIO server
 pkill -9 minio
-env:
-RUST_TEST_THREADS: 1
-AWS_ACCESS_KEY_ID: test_secret_access_key
-AWS_SECRET_ACCESS_KEY: test_access_key_id
-AWS_REGION: us-east-1
-AWS_S3_TEST_BUCKET: testbucket
-PG_PARQUET_TEST: true
+
+# Stop Azurite server
+pkill -9 node
 
 - name: Upload coverage report to Codecov
 if: ${{ env.PG_MAJOR }} == 17

Cargo.lock

Lines changed: 0 additions & 7 deletions

Cargo.toml

Lines changed: 1 addition & 2 deletions
@@ -22,9 +22,8 @@ arrow = {version = "53", default-features = false}
 arrow-schema = {version = "53", default-features = false}
 aws-config = { version = "1.5", default-features = false, features = ["rustls"]}
 aws-credential-types = {version = "1.2", default-features = false}
-dotenvy = "0.15"
 futures = "0.3"
-object_store = {version = "0.11", default-features = false, features = ["aws"]}
+object_store = {version = "0.11", default-features = false, features = ["aws", "azure"]}
 once_cell = "1"
 parquet = {version = "53", default-features = false, features = [
 "arrow",

README.md

Lines changed: 21 additions & 4 deletions
@@ -106,7 +106,13 @@ You can call `SELECT * FROM parquet.file_metadata(<uri>)` to discover file level
 You can call `SELECT * FROM parquet.kv_metadata(<uri>)` to query custom key-value metadata of the Parquet file at given uri.
 
 ## Object Store Support
-`pg_parquet` supports reading and writing Parquet files from/to `S3` object store. Only the uris with `s3://` scheme is supported.
+`pg_parquet` supports reading and writing Parquet files from/to `S3` and `Azure Blob Storage` object stores.
+
+> [!NOTE]
+> To be able to write into a object store location, you need to grant `parquet_object_store_write` role to your current postgres user.
+> Similarly, to read from an object store location, you need to grant `parquet_object_store_read` role to your current postgres user.
+
+#### S3 Storage
 
 The simplest way to configure object storage is by creating the standard `~/.aws/credentials` and `~/.aws/config` files:
 
@@ -129,9 +135,20 @@ Alternatively, you can use the following environment variables when starting pos
 - `AWS_CONFIG_FILE`: an alternative location for the config file
 - `AWS_PROFILE`: the name of the profile from the credentials and config file (default profile name is `default`)
 
-> [!NOTE]
-> To be able to write into a object store location, you need to grant `parquet_object_store_write` role to your current postgres user.
-> Similarly, to read from an object store location, you need to grant `parquet_object_store_read` role to your current postgres user.
+Supported S3 uri formats are shown below:
+- s3:// \<bucket\> / \<path\>
+- s3a:// \<bucket\> / \<path\>
+- https:// \<bucket\>.s3.amazonaws.com / \<path\>
+- https:// s3.amazonaws.com / \<bucket\> / \<path\>
+
+#### Azure Blob Storage
+
+You can use the following environment variables when starting postgres to configure the Azure Blob Storage client:
+- `AZURE_STORAGE_ACCOUNT_KEY`: the storage account key of the Azure Blob
+- `AZURE_STORAGE_SAS_TOKEN`: the storage SAS token for the Azure Blob
+
+Supported Azure Blob Storage uri formats are shown below:
+- https:// \<account\>.blob.core.windows.net / \<container\> / \<path\>
 
 ## Copy Options
 `pg_parquet` supports the following options in the `COPY TO` command:
