dask.array.isin

dask.array.isin#

dask.array.isin(element, test_elements, assume_unique=False, invert=False, *, kind=None)#

Calculates element in test_elements, broadcasting over element only. Returns a boolean array of the same shape as element that is True where an element of element is in test_elements and False otherwise.

Parameters:

elementarray_like

Input array.

test_elementsarray_like

The values against which to test each value of element. This argument is flattened if it is an array or array_like. See notes for behavior with non-array-like parameters.

assume_uniquebool, optional

If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.

invertbool, optional

If True, the values in the returned array are inverted, as if calculating element not in test_elements. Default is False. np.isin(a, b, invert=True) is equivalent to (but faster than) np.invert(np.isin(a, b)).

kind{None, ‘sort’, ‘table’}, optional

The algorithm to use. This will not affect the final result, but will affect the speed and memory use. The default, None, will select automatically based on memory considerations.

If ‘sort’, will use a mergesort-based approach. This will have a memory usage of roughly 6 times the sum of the sizes of element and test_elements, not accounting for size of dtypes.
If ‘table’, will use a lookup table approach similar to a counting sort. This is only available for boolean and integer arrays. This will have a memory usage of the size of element plus the max-min value of test_elements. assume_unique has no effect when the ‘table’ option is used.
If None, will automatically choose ‘table’ if the required memory allocation is less than or equal to 6 times the sum of the sizes of element and test_elements, otherwise will use ‘sort’. This is done to not use a large amount of memory by default, even though ‘table’ may be faster in most cases. If ‘table’ is chosen, assume_unique will have no effect.

Returns:

isinndarray, bool: Has the same shape as element. The values element[isin] are in test_elements.

Notes

isin is an element-wise function version of the python keyword in. isin(a, b) is roughly equivalent to np.array([item in b for item in a]) if a and b are 1-D sequences.

element and test_elements are converted to arrays if they are not already. If test_elements is a set (or other non-sequence collection) it will be converted to an object array with one element, rather than an array of the values contained in test_elements. This is a consequence of the array constructor’s way of handling non-sequence collections. Converting the set to a list usually gives the desired behavior.

Using kind='table' tends to be faster than kind=’sort’ if the following relationship is true: log10(len(test_elements)) > (log10(max(test_elements)-min(test_elements)) - 2.27) / 0.927, but may use greater memory. The default value for kind will be automatically selected based only on memory usage, so one may manually set kind='table' if memory constraints can be relaxed.

Examples

>>> import numpy as np
>>> element = 2*np.arange(4).reshape((2, 2))
>>> element
array([[0, 2],
       [4, 6]])
>>> test_elements = [1, 2, 4, 8]
>>> mask = np.isin(element, test_elements)
>>> mask
array([[False,  True],
       [ True, False]])
>>> element[mask]
array([2, 4])

The indices of the matched values can be obtained with nonzero:

>>> np.nonzero(mask)
(array([0, 1]), array([1, 0]))

The test can also be inverted:

>>> mask = np.isin(element, test_elements, invert=True)
>>> mask
array([[ True, False],
       [False,  True]])
>>> element[mask]
array([0, 6])

Because of how array handles sets, the following does not work as expected:

>>> test_set = {1, 2, 4, 8}
>>> np.isin(element, test_set)
array([[False, False],
       [False, False]])

Casting the set to a list gives the expected result:

>>> np.isin(element, list(test_set))
array([[False,  True],
       [ True, False]])

dask.array.isin

Contents

dask.array.isin#