dask.array.reduction
dask.array.reduction¶
- dask.array.reduction(x, chunk, aggregate, axis=None, keepdims=False, dtype=None, split_every=None, combine=None, name=None, out=None, concatenate=True, output_size=1, meta=None)[source]¶
General version of reductions
- Parameters
- x: Array
Data being reduced along one or more axes
- chunk: callable(x_chunk, axis, keepdims)
First function to be executed when resolving the dask graph. This function is applied in parallel to all original chunks of x. See below for function parameters.
- combine: callable(x_chunk, axis, keepdims), optional
Function used for intermediate recursive aggregation (see split_every below). If omitted, it defaults to aggregate. If the reduction can be performed in less than 3 steps, it will not be invoked at all.
- aggregate: callable(x_chunk, axis, keepdims)
Last function to be executed when resolving the dask graph, producing the final output. It is always invoked, even when the reduced Array counts a single chunk along the reduced axes.
- axis: int or sequence of ints, optional
Axis or axes to aggregate upon. If omitted, aggregate along all axes.
- keepdims: boolean, optional
Whether the reduction function should preserve the reduced axes, leaving them at size
output_size
, or remove them.- dtype: np.dtype
data type of output. This argument was previously optional, but leaving as
None
will now raise an exception.- split_every: int >= 2 or dict(axis: int), optional
Determines the depth of the recursive aggregation. If set to or more than the number of input chunks, the aggregation will be performed in two steps, one
chunk
function per input chunk and a singleaggregate
function at the end. If set to less than that, an intermediatecombine
function will be used, so that any onecombine
oraggregate
function has no more thansplit_every
inputs. The depth of the aggregation graph will be \(log_{split_every}(input chunks along reduced axes)\). Setting to a low value can reduce cache size and network transfers, at the cost of more CPU and a larger dask graph.Omit to let dask heuristically decide a good default. A default can also be set globally with the
split_every
key indask.config
.- name: str, optional
Prefix of the keys of the intermediate and output nodes. If omitted it defaults to the function names.
- out: Array, optional
Another dask array whose contents will be replaced. Omit to create a new one. Note that, unlike in numpy, this setting gives no performance benefits whatsoever, but can still be useful if one needs to preserve the references to a previously existing Array.
- concatenate: bool, optional
If True (the default), the outputs of the
chunk
/combine
functions are concatenated into a single np.array before being passed to thecombine
/aggregate
functions. If False, the input ofcombine
andaggregate
will be either a list of the raw outputs of the previous step or a single output, and the function will have to concatenate it itself. It can be useful to set this to False if the chunk and/or combine steps do not produce np.arrays.- output_size: int >= 1, optional
Size of the output of the
aggregate
function along the reduced axes. Ignored if keepdims is False.
- Returns
- dask array
- Function Parameters
- x_chunk: numpy.ndarray
Individual input chunk. For
chunk
functions, it is one of the original chunks of x. Forcombine
andaggregate
functions, it’s the concatenation of the outputs produced by the previouschunk
orcombine
functions. If concatenate=False, it’s a list of the raw outputs from the previous functions.- axis: tuple
Normalized list of axes to reduce upon, e.g.
(0, )
Scalar, negative, and None axes have been normalized away. Note that some numpy reduction functions cannot reduce along multiple axes at once and strictly require an int in input. Such functions have to be wrapped to cope.- keepdims: bool
Whether the reduction function should preserve the reduced axes or remove them.