dask.array.digitize

dask.array.digitize(a, bins, right=False)

Return the indices of the bins to which each value in input array belongs.

This docstring was copied from numpy.digitize.

Some inconsistencies with the Dask version may exist.

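=====  =============  ==========================
right  order of bins  returned index i satisfies
=====  =============  ==========================
False  increasing     bins[i-1] <= x < bins[i]
True   increasing     bins[i-1] < x <= bins[i]
False  decreasing     bins[i-1] > x >= bins[i]
True   decreasing     bins[i-1] >= x > bins[i]
=====  =============  ==========================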

If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.
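
The four cases above, including the out-of-bounds results, can be checked with a small sketch. The sample values and the names inc and dec are chosen here for illustration only; the calls use plain numpy.digitize.

```python
import numpy as np

# Sample values: below the bins, on each edge, and above the bins.
x = np.array([-1.0, 0.0, 2.5, 10.0, 11.0])
inc = np.array([0.0, 2.5, 10.0])   # monotonically increasing bins
dec = inc[::-1]                    # the same edges, decreasing

inc_left = np.digitize(x, inc, right=False)   # bins[i-1] <= x < bins[i]
inc_right = np.digitize(x, inc, right=True)   # bins[i-1] < x <= bins[i]
dec_left = np.digitize(x, dec, right=False)   # bins[i-1] > x >= bins[i]
dec_right = np.digitize(x, dec, right=True)   # bins[i-1] >= x > bins[i]

print(inc_left)   # [0 1 2 3 3]
print(inc_right)  # [0 0 1 2 3]
print(dec_left)   # [3 2 1 0 0]
print(dec_right)  # [3 3 2 1 0]
```

Values outside the bin range map to 0 or len(bins) (here 3), and which one is returned depends on the order of the bins.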

Parameters
x : array_like (Not supported in Dask)

Input array to be binned. Prior to NumPy 1.10.0, this array had to be 1-dimensional, but can now have any shape.

bins : array_like

Array of bins. It has to be 1-dimensional and monotonic.

right : bool, optional

Indicates whether the intervals include the right or the left bin edge. The default (right=False) means that the interval does not include the right edge. The left bin edge is closed in this case, i.e., bins[i-1] <= x < bins[i] is the default behavior for monotonically increasing bins.

Returns
indices : ndarray of ints

Output array of indices, of same shape as x.

Raises
ValueError

If bins is not monotonic.

TypeError

If the type of the input is complex.

Notes

If values in x are such that they fall outside the bin range, attempting to index bins with the indices that digitize returns will result in an IndexError.
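
A minimal sketch of this pitfall follows, with one possible way to guard against it. The boolean mask is an illustrative choice made here, not something numpy.digitize provides.

```python
import numpy as np

bins = np.array([0.0, 1.0, 2.5])
x = np.array([-0.5, 0.3, 3.0])   # first and last values lie outside the bins

inds = np.digitize(x, bins)      # [0 1 3]; 3 == len(bins) is not a valid index

try:
    bins[inds]
    raised = False
except IndexError:
    raised = True                # indexing with len(bins) fails

# Illustrative guard: keep only values that fall inside the bin range.
in_range = (inds > 0) & (inds < len(bins))
lower_edges = bins[inds[in_range] - 1]   # left edge of each in-range value's bin

print(raised)        # True
print(in_range)      # [False  True False]
print(lower_edges)   # [0.]
```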

New in version 1.10.0.

numpy.digitize is implemented in terms of numpy.searchsorted. This means that a binary search is used to bin the values, which scales much better for a larger number of bins than the previous linear search. It also removes the requirement for the input array to be 1-dimensional.

For monotonically increasing bins, the following are equivalent:

np.digitize(x, bins, right=True)
np.searchsorted(bins, x, side='left')

Note that as the order of the arguments is reversed, the side must be too. The searchsorted call is marginally faster, as it does not do any monotonicity checks. Perhaps more importantly, it supports all dtypes.
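
The equivalence can be verified directly. This sketch reuses the sample data from the Examples section and also checks the mirrored pairing (right=False with side='right'), which follows from the same argument for increasing bins.

```python
import numpy as np

x = np.array([1.2, 10.0, 12.4, 15.5, 20.0])
bins = np.array([0, 5, 10, 15, 20])   # monotonically increasing

# right=True pairs with side='left' ...
d_right = np.digitize(x, bins, right=True)
s_left = np.searchsorted(bins, x, side='left')

# ... and right=False pairs with side='right'.
d_left = np.digitize(x, bins, right=False)
s_right = np.searchsorted(bins, x, side='right')

print(np.array_equal(d_right, s_left))   # True
print(np.array_equal(d_left, s_right))   # True
```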

Examples

>>> import numpy as np  
>>> x = np.array([0.2, 6.4, 3.0, 1.6])  
>>> bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])  
>>> inds = np.digitize(x, bins)  
>>> inds  
array([1, 4, 3, 2])
>>> for n in range(x.size):  
...   print(bins[inds[n]-1], "<=", x[n], "<", bins[inds[n]])
...
0.0 <= 0.2 < 1.0
4.0 <= 6.4 < 10.0
2.5 <= 3.0 < 4.0
1.0 <= 1.6 < 2.5
>>> x = np.array([1.2, 10.0, 12.4, 15.5, 20.])  
>>> bins = np.array([0, 5, 10, 15, 20])  
>>> np.digitize(x,bins,right=True)  
array([1, 2, 3, 4, 4])
>>> np.digitize(x,bins,right=False)  
array([1, 3, 3, 4, 5])
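
The examples above call numpy.digitize. A rough sketch of the Dask counterpart follows, assuming the input is wrapped in a Dask array, which is passed as the first argument (named a in the Dask signature). The chunk size is an arbitrary choice for illustration.

```python
import numpy as np
import dask.array as da

x = np.array([0.2, 6.4, 3.0, 1.6])
bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])

# Wrap the input in a Dask array; chunks=2 is an arbitrary illustrative choice.
a = da.from_array(x, chunks=2)

lazy_inds = da.digitize(a, bins)   # lazy: no values are computed yet
inds = lazy_inds.compute()         # evaluate and return a NumPy array

print(inds)  # [1 4 3 2]
```

The result matches the first NumPy example, with the binning applied to each chunk.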