dask.dataframe.DataFrame.divisions
- property DataFrame.divisions
Tuple of npartitions + 1 values, in ascending order, marking the lower/upper bounds of each partition's index. Divisions let Dask know which partition will contain a given value, which significantly speeds up operations such as loc, merge, and groupby because Dask does not have to search the full dataset.

Example: for divisions = (0, 10, 50, 100), there are three partitions, where the index in each partition contains values in [0, 10), [10, 50), and [50, 100], respectively. Only the last interval includes its upper bound. Dask therefore knows df.loc[45] will be in the second partition.

When every item in divisions is None, the divisions are unknown. Most operations can still be performed, but some will be much slower, and a few may fail.

Setting divisions directly is not supported. Instead, use set_index, which sorts and splits the data as needed. See https://docs.dask.org/en/latest/dataframe-design.html#partitions.
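
The lookup described above can be sketched in plain Python. This is an illustration only, not Dask's internal implementation: partition_for is a hypothetical helper, and the sketch assumes divisions behave as half-open intervals with an inclusive final upper bound, as in the example.

```python
from bisect import bisect_right


def partition_for(divisions, value):
    """Return the 0-based index of the partition whose bounds contain value.

    Hypothetical helper for illustration; it is not part of Dask's API.
    """
    # Unknown divisions: there is no way to pick a partition without searching.
    if any(d is None for d in divisions):
        raise ValueError("divisions are unknown; every partition would need searching")
    # Values outside the overall bounds cannot be in any partition.
    if not divisions[0] <= value <= divisions[-1]:
        raise KeyError(value)
    npartitions = len(divisions) - 1
    # bisect_right counts the boundaries <= value; subtracting 1 gives the
    # partition index. The final boundary is inclusive, so clamp the result
    # to the last partition.
    return min(bisect_right(divisions, value) - 1, npartitions - 1)


divisions = (0, 10, 50, 100)          # npartitions + 1 values -> 3 partitions
print(partition_for(divisions, 45))   # 1: the second partition, [10, 50)
print(partition_for(divisions, 10))   # 1: lower bounds are inclusive
print(partition_for(divisions, 100))  # 2: the last upper bound is inclusive
```

Because the boundaries are sorted, the search is a binary search over npartitions + 1 values instead of a scan of the data, which is the reason known divisions make index-based operations cheaper.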