dask.dataframe.DataFrame.query

DataFrame.query(expr, **kwargs)

Filter dataframe with complex expression

Blocked version of pd.DataFrame.query: the expression is evaluated on each partition independently.

Parameters
expr: str

The query string to evaluate. You can refer to column names that are not valid Python variable names by surrounding them in backticks. Dask does not fully support referring to variables using the ‘@’ character; use f-strings or the local_dict keyword argument instead.

Notes

This behaves like the sequential pandas version, except that the query is evaluated on several partitions at once, typically in many threads. This may conflict with numexpr, which uses multiple threads itself. We recommend that you set numexpr to use a single thread:

import numexpr
numexpr.set_num_threads(1)

Examples

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> df = pd.DataFrame({'x': [1, 2, 1, 2],
...                    'y': [1, 2, 3, 4],
...                    'z z': [4, 3, 2, 1]})
>>> ddf = dd.from_pandas(df, npartitions=2)

Refer to column names directly:

>>> ddf.query('y > x').compute()
   x  y  z z
2  1  3    2
3  2  4    1

Refer to column name using backticks:

>>> ddf.query('`z z` > x').compute()
   x  y  z z
0  1  1    4
1  2  2    3
2  1  3    2

Refer to variable name using f-strings:

>>> value = 1
>>> ddf.query(f'x == {value}').compute()
   x  y  z z
0  1  1    4
2  1  3    2

Refer to variable name using local_dict:

>>> ddf.query('x == @value', local_dict={"value": value}).compute()
   x  y  z z
0  1  1    4
2  1  3    2