Commit afb3c71

Adds support for COPY TO/FROM Azure Blob Storage
Supports the following Azure Blob URI forms:
- `az://{container}/key`
- `azure://{container}/key`
- `https://{account}.blob.core.windows.net/{container}/key`

**Configuration**

The simplest way to configure object storage is by creating the standard [`~/.azure/config`](https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration?view=azure-cli-latest) file:

```bash
$ cat ~/.azure/config
[storage]
account = devstoreaccount1
key = Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
```

Alternatively, you can use the following environment variables when starting postgres to configure the Azure Blob Storage client:
- `AZURE_STORAGE_ACCOUNT`: the storage account name of the Azure Blob
- `AZURE_STORAGE_KEY`: the storage key of the Azure Blob
- `AZURE_STORAGE_SAS_TOKEN`: the storage SAS token for the Azure Blob
- `AZURE_CONFIG_FILE`: an alternative location for the config file

**Bonus**

Additionally, this PR supports the following S3 URI forms:
- `s3://{bucket}/key`
- `s3a://{bucket}/key`
- `https://s3.amazonaws.com/{bucket}/key`
- `https://{bucket}.s3.amazonaws.com/key`

Closes #50
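The Azure and S3 URI forms listed in the description can be told apart with a few string checks. The following is a minimal, std-only sketch of that classification. It is not the code from this commit, which depends on the `object_store` and `url` crates according to `Cargo.toml`. The `ObjectStoreLocation` type and the `parse_object_store_uri` and `non_empty` functions are hypothetical names introduced only for illustration.

```rust
/// Hypothetical type: where a URI points, split into the parts needed to address an object.
#[derive(Debug, PartialEq)]
enum ObjectStoreLocation {
    S3 { bucket: String, key: String },
    AzureBlob { account: Option<String>, container: String, key: String },
}

/// Returns an owned copy of `s`, or `None` when `s` is empty.
fn non_empty(s: &str) -> Option<String> {
    if s.is_empty() { None } else { Some(s.to_string()) }
}

/// Hypothetical helper: classifies a URI using only the forms named in the commit message.
/// Anything else yields `None`.
fn parse_object_store_uri(uri: &str) -> Option<ObjectStoreLocation> {
    let (scheme, rest) = uri.split_once("://")?;
    // Everything before the first '/' is the host (or the bucket/container for short schemes).
    let (host, path) = rest.split_once('/')?;

    match scheme {
        "s3" | "s3a" => Some(ObjectStoreLocation::S3 {
            bucket: non_empty(host)?,
            key: non_empty(path)?,
        }),
        "az" | "azure" => Some(ObjectStoreLocation::AzureBlob {
            account: None, // short forms carry no account; it comes from configuration
            container: non_empty(host)?,
            key: non_empty(path)?,
        }),
        "https" => {
            if let Some(account) = host.strip_suffix(".blob.core.windows.net") {
                // https://{account}.blob.core.windows.net/{container}/key
                let (container, key) = path.split_once('/')?;
                Some(ObjectStoreLocation::AzureBlob {
                    account: Some(non_empty(account)?),
                    container: non_empty(container)?,
                    key: non_empty(key)?,
                })
            } else if host == "s3.amazonaws.com" {
                // https://s3.amazonaws.com/{bucket}/key
                let (bucket, key) = path.split_once('/')?;
                Some(ObjectStoreLocation::S3 {
                    bucket: non_empty(bucket)?,
                    key: non_empty(key)?,
                })
            } else if let Some(bucket) = host.strip_suffix(".s3.amazonaws.com") {
                // https://{bucket}.s3.amazonaws.com/key
                Some(ObjectStoreLocation::S3 {
                    bucket: non_empty(bucket)?,
                    key: non_empty(path)?,
                })
            } else {
                None
            }
        }
        _ => None,
    }
}
```

Calling `parse_object_store_uri("az://testcontainer/dir/file.parquet")` yields an `AzureBlob` location with no account, which is why the short forms rely on the account configured through `~/.azure/config` or `AZURE_STORAGE_ACCOUNT`.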
1 parent 1660ecf commit afb3c71

10 files changed, +548 -50 lines changed

.devcontainer/.env

Lines changed: 8 additions & 0 deletions
@@ -6,6 +6,14 @@ AWS_S3_TEST_BUCKET=testbucket
 MINIO_ROOT_USER=minioadmin
 MINIO_ROOT_PASSWORD=minioadmin
 
+# Azure Blob tests
+AZURE_STORAGE_ACCOUNT=devstoreaccount1
+AZURE_STORAGE_KEY="Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
+AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://localhost:10000/devstoreaccount1;"
+AZURE_TEST_CONTAINER_NAME=testcontainer
+AZURE_TEST_READ_ONLY_SAS="se=2100-05-05&sp=r&sv=2022-11-02&sr=c&sig=YMPFnAHKe9y0o3hFegncbwQTXtAyvsJEgPB2Ne1b9CQ%3D"
+AZURE_TEST_READ_WRITE_SAS="se=2100-05-05&sp=rcw&sv=2022-11-02&sr=c&sig=TPz2jEz0t9L651t6rTCQr%2BOjmJHkM76tnCGdcyttnlA%3D"
+
 # Others
 RUST_TEST_THREADS=1
 PG_PARQUET_TEST=true

.devcontainer/Dockerfile

Lines changed: 5 additions & 0 deletions
@@ -12,6 +12,11 @@ RUN apt-get update && apt-get -y install build-essential libreadline-dev zlib1g-
 curl lsb-release ca-certificates gnupg sudo git \
 nano net-tools awscli
 
+# install azure-cli
+RUN curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
+RUN echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/trusted.gpg.d/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ `lsb_release -cs` main" | tee /etc/apt/sources.list.d/azure-cli.list
+RUN apt-get update && apt-get install -y azure-cli
+
 # install Postgres
 RUN sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
 RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 #!/bin/bash
 
 aws --endpoint-url http://localhost:9000 s3 mb s3://$AWS_S3_TEST_BUCKET
+
+az storage container create -n $AZURE_TEST_CONTAINER_NAME --connection-string $AZURE_STORAGE_CONNECTION_STRING

.devcontainer/docker-compose.yml

Lines changed: 14 additions & 0 deletions
@@ -11,12 +11,14 @@ services:
       - ${USERPROFILE}${HOME}/.ssh/known_hosts:/home/rust/.ssh/known_hosts:rw
       - ${USERPROFILE}${HOME}/.gitconfig:/home/rust/.gitconfig:ro
       - ${USERPROFILE}${HOME}/.aws:/home/rust/.aws:ro
+      - ${USERPROFILE}${HOME}/.azure:/home/rust/.azure:ro
     env_file:
       - .env
     cap_add:
       - SYS_PTRACE
     depends_on:
       - minio
+      - azurite
 
   minio:
     image: minio/minio
@@ -30,3 +32,15 @@ services:
       interval: 6s
       timeout: 2s
       retries: 3
+
+  azurite:
+    image: mcr.microsoft.com/azure-storage/azurite
+    env_file:
+      - .env
+    network_mode: host
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "http://localhost:10000"]
+      interval: 6s
+      timeout: 2s
+      retries: 3

.github/workflows/ci.yml

Lines changed: 16 additions & 0 deletions
@@ -85,6 +85,11 @@ jobs:
             postgresql-client-${{ env.PG_MAJOR }} \
             libpq-dev
 
+      - name: Install azure-cli
+        run: |
+          curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
+          echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/trusted.gpg.d/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ `lsb_release -cs` main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
+          sudo apt-get update && sudo apt-get install -y azure-cli
 
       - name: Install and configure pgrx
         run: |
@@ -116,6 +121,17 @@ jobs:
 
           aws --endpoint-url http://localhost:9000 s3 mb s3://$AWS_S3_TEST_BUCKET
 
+      - name: Start Azurite for Azure Blob Storage emulator tests
+        run: |
+          docker run -d --env-file .devcontainer/.env -p 10000:10000 mcr.microsoft.com/azure-storage/azurite
+
+          while ! nc -z localhost 10000; do
+            echo "Waiting for localhost:10000..."
+            sleep 1
+          done
+
+          az storage container create -n $AZURE_TEST_CONTAINER_NAME --connection-string $AZURE_STORAGE_CONNECTION_STRING
+
       - name: Run tests
         run: |
           # Run tests with coverage tool

Cargo.lock

Lines changed: 38 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -25,7 +25,8 @@ arrow-schema = {version = "53", default-features = false}
 aws-config = { version = "1.5", default-features = false, features = ["rustls"]}
 aws-credential-types = {version = "1.2", default-features = false}
 futures = "0.3"
-object_store = {version = "0.11", default-features = false, features = ["aws"]}
+home = "0.5"
+object_store = {version = "0.11", default-features = false, features = ["aws", "azure"]}
 once_cell = "1"
 parquet = {version = "53", default-features = false, features = [
     "arrow",
@@ -37,6 +38,7 @@ parquet = {version = "53", default-features = false, features = [
     "object_store",
 ]}
 pgrx = "=0.12.8"
+rust-ini = "0.21"
 tokio = {version = "1", default-features = false, features = ["rt", "time", "macros"]}
 url = "2"
 
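The `home` and `rust-ini` dependencies added here are presumably what locate and parse the INI-style `~/.azure/config` file described in the commit message. As a rough illustration of that lookup, here is a std-only sketch that does not use either crate. The function names `parse_storage_section` and `resolve` are hypothetical, and the precedence shown (environment variable first, then config file) is an assumption, not something the commit states.

```rust
use std::collections::HashMap;

/// Hypothetical helper: collects `name = value` pairs from the `[storage]`
/// section of an Azure CLI style config file, ignoring every other section.
fn parse_storage_section(contents: &str) -> HashMap<String, String> {
    let mut in_storage = false;
    let mut values = HashMap::new();

    for line in contents.lines() {
        let line = line.trim();
        // Skip blank lines and INI comments.
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        // A `[section]` header switches the current section.
        if let Some(section) = line.strip_prefix('[').and_then(|s| s.strip_suffix(']')) {
            in_storage = section.trim() == "storage";
            continue;
        }
        if in_storage {
            // Split on the first '=' only: base64 storage keys end with '=' padding.
            if let Some((name, value)) = line.split_once('=') {
                values.insert(name.trim().to_string(), value.trim().to_string());
            }
        }
    }
    values
}

/// Assumed precedence (not confirmed by the commit): an environment value wins,
/// otherwise fall back to the entry read from the config file.
fn resolve(
    env_value: Option<String>,
    config: &HashMap<String, String>,
    config_key: &str,
) -> Option<String> {
    env_value.or_else(|| config.get(config_key).cloned())
}
```

A caller could combine the two as `resolve(std::env::var("AZURE_STORAGE_ACCOUNT").ok(), &config, "account")`, after reading the file named by `AZURE_CONFIG_FILE` or the default `~/.azure/config` into a string.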

README.md

Lines changed: 34 additions & 4 deletions
@@ -155,7 +155,13 @@ SELECT uri, encode(key, 'escape') as key, encode(value, 'escape') as value FROM
 ```
 
 ## Object Store Support
-`pg_parquet` supports reading and writing Parquet files from/to `S3` object store. Only the uris with `s3://` scheme is supported.
+`pg_parquet` supports reading and writing Parquet files from/to `S3` and `Azure Blob Storage` object stores.
+
+> [!NOTE]
+> To be able to write into a object store location, you need to grant `parquet_object_store_write` role to your current postgres user.
+> Similarly, to read from an object store location, you need to grant `parquet_object_store_read` role to your current postgres user.
+
+#### S3 Storage
 
 The simplest way to configure object storage is by creating the standard `~/.aws/credentials` and `~/.aws/config` files:
 
@@ -178,9 +184,33 @@ Alternatively, you can use the following environment variables when starting pos
 - `AWS_CONFIG_FILE`: an alternative location for the config file
 - `AWS_PROFILE`: the name of the profile from the credentials and config file (default profile name is `default`)
 
-> [!NOTE]
-> To be able to write into a object store location, you need to grant `parquet_object_store_write` role to your current postgres user.
-> Similarly, to read from an object store location, you need to grant `parquet_object_store_read` role to your current postgres user.
+Supported S3 uri formats are shown below:
+- s3:// \<bucket\> / \<path\>
+- s3a:// \<bucket\> / \<path\>
+- https:// \<bucket\>.s3.amazonaws.com / \<path\>
+- https:// s3.amazonaws.com / \<bucket\> / \<path\>
+
+#### Azure Blob Storage
+
+The simplest way to configure object storage is by creating the standard [`~/.azure/config`](https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration?view=azure-cli-latest) file:
+
+```bash
+$ cat ~/.azure/config
+[storage]
+account = devstoreaccount1
+key = Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
+```
+
+Alternatively, you can use the following environment variables when starting postgres to configure the Azure Blob Storage client:
+- `AZURE_STORAGE_ACCOUNT`: the storage account name of the Azure Blob
+- `AZURE_STORAGE_KEY`: the storage key of the Azure Blob
+- `AZURE_STORAGE_SAS_TOKEN`: the storage SAS token for the Azure Blob
+- `AZURE_CONFIG_FILE`: an alternative location for the config file
+
+Supported Azure Blob Storage uri formats are shown below:
+- az:// \<container\> / \<path\>
+- azure:// \<container\> / \<path\>
+- https:// \<account\>.blob.core.windows.net / \<container\> / \<path\>
 
 ## Copy Options
 `pg_parquet` supports the following options in the `COPY TO` command:
