dask.array.rechunk
dask.array.rechunk¶
- dask.array.rechunk(x, chunks='auto', threshold=None, block_size_limit=None, balance=False, method=None)[source]¶
Convert blocks in dask array x for new chunks.
- Parameters
- x: dask array
Array to be rechunked.
- chunks: int, tuple, dict or str, optional
The new block dimensions to create. -1 indicates the full size of the corresponding dimension. Default is “auto” which automatically determines chunk sizes.
- threshold: int, optional
The graph growth factor under which we don’t bother introducing an intermediate step.
- block_size_limit: int, optional
The maximum block size (in bytes) we want to produce Defaults to the configuration value
array.chunk-size
- balancebool, default False
If True, try to make each chunk to be the same size.
This means
balance=True
will remove any small leftover chunks, so usingx.rechunk(chunks=len(x) // N, balance=True)
will almost certainly result inN
chunks.- method: {‘tasks’, ‘p2p’}, optional.
Rechunking method to use.
Examples
>>> import dask.array as da >>> x = da.ones((1000, 1000), chunks=(100, 100))
Specify uniform chunk sizes with a tuple
>>> y = x.rechunk((1000, 10))
Or chunk only specific dimensions with a dictionary
>>> y = x.rechunk({0: 1000})
Use the value
-1
to specify that you want a single chunk along a dimension or the value"auto"
to specify that dask can freely rechunk a dimension to attain blocks of a uniform block size>>> y = x.rechunk({0: -1, 1: 'auto'}, block_size_limit=1e8)
If a chunk size does not divide the dimension then rechunk will leave any unevenness to the last chunk.
>>> x.rechunk(chunks=(400, -1)).chunks ((400, 400, 200), (1000,))
However if you want more balanced chunks, and don’t mind Dask choosing a different chunksize for you then you can use the
balance=True
option.>>> x.rechunk(chunks=(400, -1), balance=True).chunks ((500, 500), (1000,))