Commit 2feb683

Adds support for COPY TO/FROM Azure Blob Storage
Supports the following Azure Blob URI forms:
- `az://{container}/key`
- `azure://{container}/key`
- `https://{account}.blob.core.windows.net/{container}/key`

**Configuration**

The simplest way to configure object storage is by creating the standard [`~/.azure/config`](https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration?view=azure-cli-latest) file:

```bash
$ cat ~/.azure/config
[storage]
account = devstoreaccount1
key = Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
```

Alternatively, you can use the following environment variables when starting postgres to configure the Azure Blob Storage client:
- `AZURE_STORAGE_ACCOUNT`: the storage account name of the Azure Blob
- `AZURE_STORAGE_KEY`: the storage key of the Azure Blob
- `AZURE_STORAGE_SAS_TOKEN`: the storage SAS token for the Azure Blob
- `AZURE_CONFIG_FILE`: an alternative location for the config file

**Bonus**

Additionally, this PR supports the following S3 URI forms:
- `s3://{bucket}/key`
- `s3a://{bucket}/key`
- `https://s3.amazonaws.com/{bucket}/key`
- `https://{bucket}.s3.amazonaws.com/key`

Closes #50
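As a rough illustration of the configuration lookup described in the commit message, the sketch below resolves the config file path (honoring `AZURE_CONFIG_FILE`, otherwise `~/.azure/config`) and reads a key from the `[storage]` section. The helper names `azure_config_path` and `ini_get` are hypothetical and not part of `pg_parquet`; the extension does its own lookup in Rust, and how it prioritizes environment variables over the file is not shown in this commit page.

```bash
#!/bin/sh
# Hypothetical helpers for illustration only; not taken from pg_parquet.

# Print the config file location: AZURE_CONFIG_FILE if set, else ~/.azure/config.
azure_config_path() {
    printf '%s\n' "${AZURE_CONFIG_FILE:-$HOME/.azure/config}"
}

# usage: ini_get FILE SECTION KEY
# Print the value of KEY inside [SECTION], or nothing if it is absent.
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[/ {
            # Section header: strip the brackets and remember the name.
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            current = $0
            next
        }
        current == section {
            eq = index($0, "=")          # split on the first "=" only,
            if (eq == 0) next            # so base64 padding in values survives
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Example (commented out so that sourcing this file has no side effects):
# account="$(ini_get "$(azure_config_path)" storage account)"
```

Splitting on the first `=` matters here because storage keys are base64 strings that often end in `==`.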
1 parent 78fc489 commit 2feb683

12 files changed, with 560 additions and 104 deletions

.devcontainer/Dockerfile

Lines changed: 13 additions & 12 deletions
@@ -6,10 +6,11 @@ ENV TZ="Europe/Istanbul"
 ARG PG_MAJOR=17

 # install deps
-RUN apt-get update && apt-get -y install build-essential libreadline-dev zlib1g-dev \
-flex bison libxml2-dev libxslt-dev libssl-dev \
-libxml2-utils xsltproc ccache pkg-config wget \
-curl lsb-release sudo nano net-tools git awscli
+RUN apt-get update && apt-get -y install build-essential libreadline-dev zlib1g-dev \
+flex bison libxml2-dev libxslt-dev libssl-dev \
+libxml2-utils xsltproc ccache pkg-config wget \
+curl lsb-release ca-certificates gnupg sudo git \
+nano net-tools awscli

 # install Postgres
 RUN sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
@@ -19,6 +20,14 @@ RUN apt-get update && apt-get -y install postgresql-${PG_MAJOR}-postgis-3 \
 postgresql-client-${PG_MAJOR} \
 libpq-dev

+# install azure-cli and azurite
+RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
+RUN apt-get update && apt-get install -y nodejs
+RUN curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
+RUN echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/trusted.gpg.d/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ `lsb_release -cs` main" | tee /etc/apt/sources.list.d/azure-cli.list
+RUN apt-get update && apt-get install -y azure-cli
+RUN npm install -g azurite
+
 # download and install MinIO server and client
 RUN wget https://dl.min.io/server/minio/release/linux-amd64/minio
 RUN chmod +x minio
@@ -58,11 +67,3 @@ ARG PGRX_VERSION=0.12.6
 RUN cargo install --locked cargo-pgrx@${PGRX_VERSION}
 RUN cargo pgrx init --pg${PG_MAJOR} $(which pg_config)
 RUN echo "shared_preload_libraries = 'pg_parquet'" >> $HOME/.pgrx/data-${PG_MAJOR}/postgresql.conf
-
-ENV MINIO_ROOT_USER=admin
-ENV MINIO_ROOT_PASSWORD=admin123
-ENV AWS_S3_TEST_BUCKET=testbucket
-ENV AWS_REGION=us-east-1
-ENV AWS_ACCESS_KEY_ID=admin
-ENV AWS_SECRET_ACCESS_KEY=admin123
-ENV PG_PARQUET_TEST=true

.devcontainer/devcontainer.json

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 ]
 }
 },
-"postStartCommand": "bash .devcontainer/scripts/setup-minio.sh",
+"postStartCommand": "bash .devcontainer/scripts/setup_minio.sh && bash .devcontainer/scripts/setup_azurite.sh",
 "forwardPorts": [
 5432
 ],

.devcontainer/scripts/setup_azurite.sh

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+source setup_test_envs.sh
+
+nohup azurite --location /tmp/azurite-storage > /dev/null 2>&1 &
+
+az storage container create --name "${AZURE_TEST_CONTAINER_NAME}" --public off --connection-string "$AZURE_STORAGE_CONNECTION_STRING"
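
The script above starts Azurite in the background and creates the test container right away, so the `az` call may run before the emulator is accepting connections. One possible mitigation, sketched here under that assumption, is a small retry wrapper. The `retry` helper is hypothetical and is not part of this commit.

```bash
#!/bin/sh
# Hypothetical helper for illustration only; not part of the commit.

# usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY_SECONDS between failed attempts.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while ! "$@"; do
        if [ "$i" -ge "$attempts" ]; then
            return 1                     # give up after the last attempt
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Possible use in setup_azurite.sh (commented out; needs az and a running Azurite):
# retry 10 1 az storage container create --name "${AZURE_TEST_CONTAINER_NAME}" \
#     --public off --connection-string "$AZURE_STORAGE_CONNECTION_STRING"
```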

.devcontainer/scripts/setup_minio.sh

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 #!/bin/bash

+source setup_test_envs.sh
+
 nohup minio server /tmp/minio-storage > /dev/null 2>&1 &

 mc alias set local http://localhost:9000 $MINIO_ROOT_USER $MINIO_ROOT_PASSWORD

.devcontainer/scripts/setup_test_envs.sh

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# S3 tests
+export AWS_ACCESS_KEY_ID=admin
+export AWS_SECRET_ACCESS_KEY=admin123
+export AWS_REGION=us-east-1
+export AWS_S3_TEST_BUCKET=testbucket
+export MINIO_ROOT_USER=admin
+export MINIO_ROOT_PASSWORD=admin123
+
+# Azure Blob tests
+export AZURE_STORAGE_ACCOUNT=devstoreaccount1
+export AZURE_STORAGE_KEY="Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
+export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://localhost:10000/devstoreaccount1;"
+export AZURE_TEST_CONTAINER_NAME=testcontainer
+export AZURE_TEST_READ_ONLY_SAS="se=2100-05-05&sp=r&sv=2022-11-02&sr=c&sig=YMPFnAHKe9y0o3hFegncbwQTXtAyvsJEgPB2Ne1b9CQ%3D"
+export AZURE_TEST_READ_WRITE_SAS="se=2100-05-05&sp=rcw&sv=2022-11-02&sr=c&sig=TPz2jEz0t9L651t6rTCQr%2BOjmJHkM76tnCGdcyttnlA%3D"
+
+# Other
+export PG_PARQUET_TEST=true
+export RUST_TEST_THREADS=1
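
Because the setup scripts and the tests depend on the variables exported above, a guard that fails fast when one is missing could make a misconfigured environment easier to diagnose. The following is a minimal sketch; `require_env` is a hypothetical helper and is not part of this commit.

```bash
#!/bin/sh
# Hypothetical helper for illustration only; not part of the commit.

# usage: require_env NAME [NAME...]
# Return non-zero and report on stderr if any named variable is unset or empty.
require_env() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"         # indirect lookup of the variable called $name
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use after sourcing setup_test_envs.sh (commented out):
# require_env AZURE_STORAGE_ACCOUNT AZURE_STORAGE_KEY AZURE_TEST_CONTAINER_NAME || exit 1
```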

.env_sample

Lines changed: 0 additions & 5 deletions
This file was deleted.

.github/workflows/ci.yml

Lines changed: 20 additions & 22 deletions
@@ -70,12 +70,23 @@ jobs:
 sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
 wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
 sudo apt-get update
-sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev libxslt-dev libssl-dev libxml2-utils xsltproc ccache pkg-config
+sudo apt-get -y install build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev \
+libxslt-dev libssl-dev libxml2-utils xsltproc ccache pkg-config \
+gnupg ca-certificates
 sudo apt-get -y install postgresql-${{ env.PG_MAJOR }}-postgis-3 \
 postgresql-server-dev-${{ env.PG_MAJOR }} \
 postgresql-client-${{ env.PG_MAJOR }} \
 libpq-dev

+- name: Install Azurite
+run: |
+curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash -
+sudo apt-get update && sudo apt-get install -y nodejs
+curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
+echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/trusted.gpg.d/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ `lsb_release -cs` main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
+sudo apt-get update && sudo apt-get install -y azure-cli
+npm install -g azurite
+
 - name: Install MinIO
 run: |
 # Download and install MinIO server and client
@@ -108,23 +119,14 @@ jobs:
 $(pg_config --sharedir)/extension \
 /var/run/postgresql/

-# pgrx tests with runas argument ignores environment variables, so
-# we read env vars from .env file in tests (https://github.com/pgcentralfoundation/pgrx/pull/1674)
-touch /tmp/.env
-echo AWS_ACCESS_KEY_ID=${{ env.AWS_ACCESS_KEY_ID }} >> /tmp/.env
-echo AWS_SECRET_ACCESS_KEY=${{ env.AWS_SECRET_ACCESS_KEY }} >> /tmp/.env
-echo AWS_S3_TEST_BUCKET=${{ env.AWS_S3_TEST_BUCKET }} >> /tmp/.env
-echo AWS_REGION=${{ env.AWS_REGION }} >> /tmp/.env
-echo PG_PARQUET_TEST=${{ env.PG_PARQUET_TEST }} >> /tmp/.env
+# Set up test environments
+source .devcontainer/scripts/setup_test_envs.sh

 # Start MinIO server
-export MINIO_ROOT_USER=${{ env.AWS_ACCESS_KEY_ID }}
-export MINIO_ROOT_PASSWORD=${{ env.AWS_SECRET_ACCESS_KEY }}
-minio server /tmp/minio-storage > /dev/null 2>&1 &
+bash .devcontainer/scripts/setup_minio.sh

-# Set access key and create test bucket
-mc alias set local http://localhost:9000 ${{ env.AWS_ACCESS_KEY_ID }} ${{ env.AWS_SECRET_ACCESS_KEY }}
-aws --endpoint-url http://localhost:9000 s3 mb s3://${{ env.AWS_S3_TEST_BUCKET }}
+# Start Azurite server
+bash .devcontainer/scripts/setup_azurite.sh

 # Run tests with coverage tool
 source <(cargo llvm-cov show-env --export-prefix)
@@ -135,13 +137,9 @@ jobs:

 # Stop MinIO server
 pkill -9 minio
-env:
-RUST_TEST_THREADS: 1
-AWS_ACCESS_KEY_ID: test_secret_access_key
-AWS_SECRET_ACCESS_KEY: test_access_key_id
-AWS_REGION: us-east-1
-AWS_S3_TEST_BUCKET: testbucket
-PG_PARQUET_TEST: true
+
+# Stop Azurite server
+pkill -9 node

 - name: Upload coverage report to Codecov
 if: ${{ env.PG_MAJOR }} == 17

Cargo.lock

Lines changed: 35 additions & 4 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 3 additions & 2 deletions
@@ -24,9 +24,9 @@ arrow = {version = "53", default-features = false}
 arrow-schema = {version = "53", default-features = false}
 aws-config = { version = "1.5", default-features = false, features = ["rustls"]}
 aws-credential-types = {version = "1.2", default-features = false}
-dotenvy = "0.15"
 futures = "0.3"
-object_store = {version = "0.11", default-features = false, features = ["aws"]}
+home = "0.5"
+object_store = {version = "0.11", default-features = false, features = ["aws", "azure"]}
 once_cell = "1"
 parquet = {version = "53", default-features = false, features = [
 "arrow",
@@ -38,6 +38,7 @@ parquet = {version = "53", default-features = false, features = [
 "object_store",
 ]}
 pgrx = "=0.12.6"
+rust-ini = "0.21"
 tokio = {version = "1", default-features = false, features = ["rt", "time", "macros"]}
 url = "2.5"


README.md

Lines changed: 34 additions & 4 deletions
@@ -155,7 +155,13 @@ SELECT uri, encode(key, 'escape') as key, encode(value, 'escape') as value FROM
 ```

 ## Object Store Support
-`pg_parquet` supports reading and writing Parquet files from/to `S3` object store. Only the uris with `s3://` scheme is supported.
+`pg_parquet` supports reading and writing Parquet files from/to `S3` and `Azure Blob Storage` object stores.
+
+> [!NOTE]
+> To be able to write into a object store location, you need to grant `parquet_object_store_write` role to your current postgres user.
+> Similarly, to read from an object store location, you need to grant `parquet_object_store_read` role to your current postgres user.
+
+#### S3 Storage

 The simplest way to configure object storage is by creating the standard `~/.aws/credentials` and `~/.aws/config` files:

@@ -178,9 +184,33 @@ Alternatively, you can use the following environment variables when starting pos
 - `AWS_CONFIG_FILE`: an alternative location for the config file
 - `AWS_PROFILE`: the name of the profile from the credentials and config file (default profile name is `default`)

-> [!NOTE]
-> To be able to write into a object store location, you need to grant `parquet_object_store_write` role to your current postgres user.
-> Similarly, to read from an object store location, you need to grant `parquet_object_store_read` role to your current postgres user.
+Supported S3 uri formats are shown below:
+- s3:// \<bucket\> / \<path\>
+- s3a:// \<bucket\> / \<path\>
+- https:// \<bucket\>.s3.amazonaws.com / \<path\>
+- https:// s3.amazonaws.com / \<bucket\> / \<path\>
+
+#### Azure Blob Storage
+
+The simplest way to configure object storage is by creating the standard [`~/.azure/config`](https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration?view=azure-cli-latest) file:
+
+```bash
+$ cat ~/.azure/config
+[storage]
+account = devstoreaccount1
+key = Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
+```
+
+Alternatively, you can use the following environment variables when starting postgres to configure the Azure Blob Storage client:
+- `AZURE_STORAGE_ACCOUNT`: the storage account name of the Azure Blob
+- `AZURE_STORAGE_KEY`: the storage key of the Azure Blob
+- `AZURE_STORAGE_SAS_TOKEN`: the storage SAS token for the Azure Blob
+- `AZURE_CONFIG_FILE`: an alternative location for the config file
+
+Supported Azure Blob Storage uri formats are shown below:
+- az:// \<container\> / \<path\>
+- azure:// \<container\> / \<path\>
+- https:// \<account\>.blob.core.windows.net / \<container\> / \<path\>

 ## Copy Options
 `pg_parquet` supports the following options in the `COPY TO` command:
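
The three Azure Blob Storage URI formats added to the README in the diff above all carry a container and a path. The sketch below shows one way to split them apart in plain shell. It is only an illustration of the URI shapes: `parse_azure_uri` is a hypothetical helper, and the extension's actual parsing is done in Rust and is not shown on this page.

```bash
#!/bin/sh
# Hypothetical helper for illustration only; not taken from pg_parquet.

# usage: parse_azure_uri URI
# Print the container on the first line and the blob path on the second.
parse_azure_uri() {
    uri="$1"
    case "$uri" in
        az://*)
            rest="${uri#az://}" ;;
        azure://*)
            rest="${uri#azure://}" ;;
        https://*.blob.core.windows.net/*)
            # Drop the scheme and the account host; the container comes next.
            rest="${uri#https://*.blob.core.windows.net/}" ;;
        *)
            echo "unsupported uri: $uri" >&2
            return 1 ;;
    esac
    container="${rest%%/*}"
    case "$rest" in
        */*) path="${rest#*/}" ;;
        *)   path="" ;;                  # no path component after the container
    esac
    printf '%s\n%s\n' "$container" "$path"
}

# Example (commented out):
# parse_azure_uri "az://testcontainer/dir/file.parquet"
```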
