Dataset.to_zarr(store=None, chunk_store=None, mode=None, synchronizer=None, group=None, encoding=None, compute=True, consolidated=False, append_dim=None, region=None, safe_chunks=True)

Write dataset contents to a zarr group.

Zarr chunks are determined in the following way:

  • From the chunks attribute in each variable’s encoding

  • If the variable is a Dask array, from the dask chunks

  • If neither Dask chunks nor encoding chunks are present, chunks will be determined automatically by Zarr

  • If both Dask chunks and encoding chunks are present, encoding chunks will be used, provided that there is a many-to-one relationship between encoding chunks and Dask chunks (i.e. each Dask chunk is at least as large as, and evenly divided by, the encoding chunks); otherwise a ValueError is raised. This restriction ensures that no synchronization / locks are required when writing. To disable this restriction, use safe_chunks=False.
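The many-to-one rule in the last item can be illustrated with a small stand-alone check. This is a simplified sketch of the idea for a single dimension, not xarray's actual implementation; the function name `dask_chunks_align` is hypothetical.

```python
def dask_chunks_align(dask_chunks, zarr_chunk):
    """Return True if no Zarr chunk is shared between two Dask chunks.

    dask_chunks: tuple of Dask chunk lengths along one dimension
    zarr_chunk:  encoding (Zarr) chunk length along the same dimension
    """
    # Every Dask chunk except the last must be a whole multiple of the
    # Zarr chunk, so that each Dask chunk starts on a Zarr chunk boundary.
    # The last Dask chunk ends at the array boundary and may be any size.
    interior = dask_chunks[:-1]
    return all(size % zarr_chunk == 0 for size in interior)


print(dask_chunks_align((100, 100, 50), 50))  # True: 100 is a multiple of 50
print(dask_chunks_align((100, 100, 50), 30))  # False: 100 is not a multiple of 30
```

When the check holds, each Zarr chunk is written by exactly one Dask task, which is why no locking is needed.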

Parameters

  • store (MutableMapping, str or Path, optional) – Store or path to directory in file system.

  • chunk_store (MutableMapping, str or Path, optional) – Store or path to directory in file system only for Zarr array chunks. Requires zarr-python v2.4.0 or later.

  • mode ({"w", "w-", "a", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if it does not exist). If append_dim is set, mode can be omitted as it is internally set to "a". Otherwise, mode defaults to "w-" if not set.

  • synchronizer (object, optional) – Zarr array synchronizer.

  • group (str, optional) – Group path. (a.k.a. path in zarr terminology.)

  • encoding (dict, optional) – Nested dictionary with variable names as keys and dictionaries of variable-specific encodings as values, e.g., {"my_variable": {"dtype": "int16", "scale_factor": 0.1}, ...}

  • compute (bool, optional) – If True, write array data immediately; otherwise return a dask.delayed.Delayed object that can be computed to write array data later. Metadata is always updated eagerly.

  • consolidated (bool, optional) – If True, apply zarr’s consolidate_metadata function to the store after writing metadata.

  • append_dim (hashable, optional) – If set, the dimension along which the data will be appended. All other dimensions on overridden variables must remain the same size.

  • region (dict, optional) – Optional mapping from dimension names to integer slices along dataset dimensions to indicate the region of existing zarr array(s) in which to write this dataset’s data. For example, {'x': slice(0, 1000), 'y': slice(10000, 11000)} would indicate that values should be written to the region 0:1000 along x and 10000:11000 along y.

    Two restrictions apply to the use of region:

    • If region is set, all variables in a dataset must have at least one dimension in common with the region. Other variables should be written in a separate call to to_zarr().

    • Dimensions cannot be included in both region and append_dim at the same time. To create empty arrays to fill in with region, use a separate call to to_zarr() with compute=False. See “Appending to existing Zarr stores” in the reference documentation for full details.

  • safe_chunks (bool, optional) – If True, only allow writes when there is a many-to-one relationship between Zarr chunks (specified in encoding) and Dask chunks. Set False to override this restriction; however, data may become corrupted if Zarr arrays are written in parallel. This option may be useful in combination with compute=False to initialize a Zarr store from an existing Dataset with arbitrary chunk structure.




Zarr chunking behavior:

If chunks are found in the encoding argument or attribute corresponding to any DataArray, those chunks are used. Otherwise, if a DataArray is a Dask array, it is written with its Dask chunks. If no other chunks are found, Zarr uses its own heuristics to choose automatic chunk sizes.

See also


The I/O user guide, with more details and examples.