Zarr version
v3.0.8
Numcodecs version
v0.16.1
Python Version
3.12
Operating System
Linux (WSL2)
Installation
pip
Description
The RAM consumption when reading a local Zarr array created with the shards option depends on the indexing method. Selecting the same data via a slice or via a list of indices yields dramatically different memory usage. If I omit the shards parameter when calling create_array, the memory usage for both selection methods becomes similar.


Steps to reproduce
import zarr
import numpy as np
N_ROWS = 10_000
N_COLS = 2_000
CHUNK_ROW = 5_000
CHUNK_COL = 100
ARRAY_PATH = "/tmp/tmp.zarr"
# create a sharded array from random data
rng = np.random.default_rng(seed=42)
x = rng.random((N_ROWS, N_COLS))
array = zarr.create_array(
    store=ARRAY_PATH,
    data=x,
    chunks=(CHUNK_ROW, CHUNK_COL),
    shards=(CHUNK_ROW, 500),
    overwrite=True,
)
# read data
array = zarr.open_array(ARRAY_PATH)
x = array[slice(0, CHUNK_ROW)] # RAM consumption 50 MiB
y = array[list(range(CHUNK_ROW))] # RAM consumption 800 MiB
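For anyone who wants to compare the two reads numerically, here is a minimal sketch of one way to capture peak memory per call. It uses the standard-library tracemalloc module; the helper name peak_memory_mib is hypothetical and not part of zarr. This is an assumption about how one could measure, not necessarily how the figures in the comments above were obtained. tracemalloc only sees allocations made through Python's allocators (NumPy buffers included), so memory allocated inside native codec libraries may be missed and the numbers can differ from process RSS.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def peak_memory_mib(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn() and return (result, peak traced allocation in MiB).

    Hypothetical helper for illustration; not part of zarr.
    """
    tracemalloc.start()
    try:
        result = fn()
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


if __name__ == "__main__":
    # Stand-in workload: allocate roughly 8 MiB inside the measured call.
    _, peak = peak_memory_mib(lambda: len(bytearray(8 * 2**20)))
    print(f"peak: {peak:.1f} MiB")
```

In the reproducer above, the two reads could be wrapped as peak_memory_mib(lambda: array[slice(0, CHUNK_ROW)]) and peak_memory_mib(lambda: array[list(range(CHUNK_ROW))]) to compare them under the same conditions.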
Additional output
No response