dask.array.histogram
- dask.array.histogram(a, bins=None, range=None, normed=False, weights=None, density=None)
Blocked variant of numpy.histogram().
- Parameters
- a : dask.array.Array
Input data; the histogram is computed over the flattened array. If the weights argument is used, the chunks of a are accessed to check chunking compatibility between a and weights. If weights is None, a dask.dataframe.Series object can be passed as input data.
- bins : int or sequence of scalars, optional
Either an iterable specifying the bins, or the number of bins together with a range argument, is required, because computing min and max over blocked arrays is an expensive operation that must be performed explicitly. If bins is an int, it defines the number of equal-width bins in the given range. If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.
- range : (float, float), optional
The lower and upper range of the bins. If not provided, range is simply (a.min(), a.max()). Values outside the range are ignored. The first element of the range must be less than or equal to the second. range affects the automatic bin computation as well. While bin width is computed to be optimal based on the actual data within range, the bin count will fill the entire range including portions containing no data.
- normed : bool, optional
This is equivalent to the density argument, but produces incorrect results for unequal bin widths. It should not be used.
- weights : dask.array.Array, optional
A dask.array.Array of weights, of the same block structure as a. Each value in a only contributes its associated weight towards the bin count (instead of 1). If density is True, the weights are normalized, so that the integral of the density over the range remains 1.
- density : bool, optional
If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function. Overrides the normed keyword if given. If density is True, bins cannot be a single-number delayed value. It must be a concrete number, or a (possibly-delayed) array/sequence of the bin edges.
- Returns
- hist : dask Array
The values of the histogram. See density and weights for a description of the possible semantics.
- bin_edges : dask Array of dtype float
Return the bin edges (length(hist)+1).
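As a small illustration of the return values, the sketch below (with assumed data, chunk size, and bin count) checks that hist is a lazy dask array that must be computed, and that bin_edges has one more element than hist. np.asarray is used on the edges so the snippet does not depend on whether they come back as a dask or a NumPy array.

```python
import dask.array as da
import numpy as np

# Illustrative input (an assumption): 1000 integers in chunks of 100.
x = da.from_array(np.arange(1000), chunks=100)

h, bin_edges = da.histogram(x, bins=4, range=[0, 1000])

# hist is lazy; compute() materializes the counts as a NumPy array.
counts = h.compute()

# Convert the edges to a concrete NumPy array, whatever their container type.
edges = np.asarray(bin_edges)

print(type(h).__name__, counts, edges)
print(len(edges) == len(counts) + 1)
```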
Examples
Using number of bins and range:
>>> import dask.array as da
>>> import numpy as np
>>> x = da.from_array(np.arange(10000), chunks=10)
>>> h, bins = da.histogram(x, bins=10, range=[0, 10000])
>>> bins
array([    0.,  1000.,  2000.,  3000.,  4000.,  5000.,  6000.,  7000.,
        8000.,  9000., 10000.])
>>> h.compute()
array([1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000])
Explicitly specifying the bins:
>>> h, bins = da.histogram(x, bins=np.array([0, 5000, 10000]))
>>> bins
array([    0,  5000, 10000])
>>> h.compute()
array([5000, 5000])
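Using weights with a matching block structure (a sketch that is not part of the original examples; the constant weight of 0.5 and the chunk sizes are assumptions). Because weights must have the same block structure as a, the weights array is rechunked to the chunks of the input before the call:

```python
import dask.array as da
import numpy as np

# Illustrative input (an assumption): 10000 integers in chunks of 1000.
x = da.from_array(np.arange(10000), chunks=1000)

# Weights created with a different chunking, then aligned to x.
w = da.from_array(np.full(10000, 0.5), chunks=500)
w = w.rechunk(x.chunks)

h, _ = da.histogram(x, bins=10, range=[0, 10000], weights=w)

# Each value contributes its weight (0.5) to its bin instead of 1.
weighted_counts = h.compute()
print(weighted_counts)
```

Each of the ten bins holds 1000 samples, so with a weight of 0.5 per sample every bin total should come out as 500.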