dask.dataframe.DataFrame.query
- DataFrame.query(expr, **kwargs)
Filter dataframe with complex expression
Blocked version of pd.DataFrame.query
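As a rough illustration of what "blocked" means, the sketch below uses plain pandas (no Dask). It splits a frame into contiguous row blocks that stand in for partitions, runs pandas.DataFrame.query on each block on its own, and concatenates the pieces. The helper blocked_query and its npartitions argument are hypothetical. Dask itself builds a lazy task graph instead of looping eagerly like this.

```python
import pandas as pd


def blocked_query(df: pd.DataFrame, expr: str, npartitions: int = 2, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: mimic a blocked query with plain pandas."""
    n = len(df)
    # Boundaries of contiguous, roughly equal-sized row blocks ("partitions").
    bounds = [n * i // npartitions for i in range(npartitions + 1)]
    parts = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
    # Each block is filtered independently; no block sees another block's rows.
    filtered = [part.query(expr, **kwargs) for part in parts]
    # Stitch the per-block results back together, keeping the original index.
    return pd.concat(filtered)


df = pd.DataFrame({'x': [1, 2, 1, 2], 'y': [1, 2, 3, 4], 'z z': [4, 3, 2, 1]})
result = blocked_query(df, 'y > x')
print(result)
```

Because the filter is row-wise, the per-block results concatenate to the same rows (2 and 3) that the `y > x` example returns, which is why the blocked version can agree with the sequential pandas one.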
- Parameters
- expr: str
The query string to evaluate. You can refer to column names that are not valid Python variable names by surrounding them in backticks. Dask does not fully support referring to variables with the ‘@’ character; use f-strings or the local_dict keyword argument instead.
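The two ways of passing a Python value into the expression can be compared with plain pandas, assuming (as "Blocked version of pd.DataFrame.query" suggests) that the same expression string and keyword arguments are handed to pandas for each partition. The frame and the target variable below are made up for illustration. One detail worth noting: an f-string pastes the value into the expression text, so a string value needs quotes, which {target!r} supplies.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'a'], 'score': [10, 20, 30]})
target = 'a'

# f-string: the value becomes part of the expression text. !r inserts
# repr(target), i.e. 'a' with quotes; without it the expression would read
# name == a and pandas would look for a column or variable called a.
by_fstring = df.query(f"name == {target!r}")

# local_dict: the expression keeps the @target reference and the value is
# looked up in the supplied mapping instead of the caller's namespace.
by_local_dict = df.query("name == @target", local_dict={"target": target})

print(by_fstring)
```

Both calls select rows 0 and 2.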
See also
pandas.DataFrame.query
Notes
This behaves like the sequential pandas version, except that it also runs in many threads, with partitions processed in parallel. This may conflict with numexpr, which uses multiple threads itself. We recommend that you set numexpr to use a single thread:

    import numexpr
    numexpr.set_num_threads(1)
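If you only want the single-thread setting while a Dask computation runs, one option is a small context manager. This is a sketch: single_threaded_numexpr is a hypothetical helper, not part of Dask or numexpr. It relies on numexpr.set_num_threads returning the previous thread count, and it does nothing when numexpr is not installed (pandas then evaluates query expressions with its Python engine).

```python
import contextlib
import importlib.util


@contextlib.contextmanager
def single_threaded_numexpr():
    """Hypothetical helper: temporarily limit numexpr to one thread."""
    if importlib.util.find_spec("numexpr") is None:
        # numexpr is not installed, so there is no thread pool to limit.
        yield
        return
    import numexpr

    # set_num_threads returns the previous setting, so it can be restored.
    previous = numexpr.set_num_threads(1)
    try:
        yield
    finally:
        numexpr.set_num_threads(previous)
```

For example, wrap the call that triggers computation: `with single_threaded_numexpr(): result = ddf.query('y > x').compute()`.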
Examples
>>> import pandas as pd
>>> import dask.dataframe as dd
>>> df = pd.DataFrame({'x': [1, 2, 1, 2],
...                    'y': [1, 2, 3, 4],
...                    'z z': [4, 3, 2, 1]})
>>> ddf = dd.from_pandas(df, npartitions=2)
Refer to column names directly:
>>> ddf.query('y > x').compute()
   x  y  z z
2  1  3    2
3  2  4    1
Refer to a column name using backticks:
>>> ddf.query('`z z` > x').compute()
   x  y  z z
0  1  1    4
1  2  2    3
2  1  3    2
Refer to a variable using an f-string:
>>> value = 1
>>> ddf.query(f'x == {value}').compute()
   x  y  z z
0  1  1    4
2  1  3    2
Refer to a variable using local_dict:
>>> ddf.query('x == @value', local_dict={"value": value}).compute()
   x  y  z z
0  1  1    4
2  1  3    2