dask_expr._collection.DataFrame.shuffle

DataFrame.shuffle(on: str | list | no_default = _NoDefault.no_default, ignore_index: bool = False, npartitions: int | None = None, shuffle_method: str | None = None, on_index: bool = False, **options)

Rearrange DataFrame into new partitions

Uses a hash of the values in ‘on’ to map rows to output partitions. After this operation, all rows with the same value of ‘on’ are in the same partition.
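The hash-to-partition mapping described above can be illustrated with a minimal pure-Python sketch. This is not Dask's implementation: the helper assign_partition and the use of zlib.crc32 are assumptions chosen only to show why equal keys always land in the same output partition.

```python
import zlib


def assign_partition(value, npartitions):
    """Map a key to a partition index with a deterministic hash.

    Hypothetical helper for illustration only. zlib.crc32 is used because
    Python's built-in hash() of strings is salted per process.
    """
    return zlib.crc32(repr(value).encode("utf-8")) % npartitions


# Toy rows of (key, payload); the key plays the role of the 'on' column.
rows = [("alice", 1), ("bob", 2), ("alice", 3), ("carol", 4), ("bob", 5)]
npartitions = 3

# Route every row to the partition selected by the hash of its key.
partitions = [[] for _ in range(npartitions)]
for key, payload in rows:
    partitions[assign_partition(key, npartitions)].append((key, payload))

# Record which partitions each key ended up in.
owners = {}
for index, part in enumerate(partitions):
    for key, _ in part:
        owners.setdefault(key, set()).add(index)
```

Because the partition index depends only on the key, every key in owners maps to exactly one partition, while a single partition may hold several distinct keys.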

Parameters
on : str, list of str, or Series, Index, or DataFrame

Column names to shuffle by.

ignore_index : bool, optional

Whether to ignore the index. Default is False.

npartitions : int, optional

Number of output partitions. The partition count will be preserved by default.

shuffle_method : str, optional

Desired shuffle method. Default chosen at optimization time.

on_index : bool, default False

Whether to shuffle on the index. Mutually exclusive with ‘on’. Set this to True if ‘on’ is not provided.

**options : optional

Algorithm-specific options.

Notes

This operation does not preserve a meaningful index or partitioning scheme, and it is not deterministic when run in parallel.

Examples

>>> df = df.shuffle(df.columns[0])