dask_expr._collection.DataFrame.shuffle
- DataFrame.shuffle(on: str | list | no_default = _NoDefault.no_default, ignore_index: bool = False, npartitions: int | None = None, shuffle_method: str | None = None, on_index: bool = False, **options)
Rearrange DataFrame into new partitions.
Hashes the values of ‘on’ to map rows to output partitions. After this operation, rows with the same value of ‘on’ are in the same partition.
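The idea of hash partitioning can be sketched in plain Python. This is a conceptual illustration only, not Dask's implementation: the helper hash_partition is hypothetical, rows are modelled as dicts, and zlib.crc32 stands in for whatever hash function Dask uses internally.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, on, npartitions):
    """Assign each row (a dict) to an output partition by hashing row[on].

    Hypothetical helper for illustration; not part of Dask.
    """
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is chosen only because it is stable across runs;
        # it is a stand-in, not the hash Dask actually uses.
        key_bytes = repr(row[on]).encode("utf-8")
        partition_id = zlib.crc32(key_bytes) % npartitions
        partitions[partition_id].append(row)
    return dict(partitions)


rows = [
    {"name": "a", "x": 1},
    {"name": "b", "x": 2},
    {"name": "a", "x": 3},
    {"name": "c", "x": 4},
]

# Rows that share a "name" always hash to the same partition id.
parts = hash_partition(rows, on="name", npartitions=2)
for partition_id, members in sorted(parts.items()):
    print(partition_id, members)
```

Because the partition id depends only on the key value, equal keys always end up together. Which keys share a partition is an artifact of the hash, so the resulting layout carries no ordering information.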
- Parameters
- on : str, list of str, Series, Index, or DataFrame
Column names to shuffle by.
- ignore_index : optional
Whether to ignore the index. Default is False.
- npartitions : optional
Number of output partitions. The partition count will be preserved by default.
- shuffle_method : optional
Desired shuffle method. Default chosen at optimization time.
- on_index : bool, default False
Whether to shuffle on the index. Mutually exclusive with ‘on’. Set this to True if ‘on’ is not provided.
- **options : optional
Algorithm-specific options.
Notes
The result does not preserve a meaningful index or partitioning scheme. The operation is not deterministic when run in parallel.
Examples
>>> df = df.shuffle(df.columns[0])