dask_expr._collection.DataFrame.shuffle

DataFrame.shuffle(on: str | list | no_default = _NoDefault.no_default, ignore_index: bool = False, npartitions: int | None = None, shuffle_method: str | None = None, on_index: bool = False, **options)

Rearrange DataFrame into new partitions.

Uses hashing of ‘on’ to map rows to output partitions. After this operation, rows with the same value of ‘on’ will be in the same partition.
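The hash-to-partition mapping described above can be illustrated with a small standard-library sketch. This is a conceptual model only, not the code dask uses: the helper name hash_partition, the dict-based rows, and the choice of zlib.crc32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, on, npartitions):
    """Hypothetical helper: group rows (dicts) by hash(row[on]) % npartitions."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is an assumption chosen because it is stable across runs;
        # this page does not say which hash function the real shuffle uses.
        key_hash = zlib.crc32(repr(row[on]).encode("utf-8"))
        partitions[key_hash % npartitions].append(row)
    return dict(partitions)


keys = ["a", "b", "c", "a", "b", "c", "a"]
rows = [{"key": k, "value": v} for v, k in enumerate(keys)]
result = hash_partition(rows, on="key", npartitions=3)

# Record which output partition(s) each distinct key landed in.
key_locations = defaultdict(set)
for part_id, part_rows in result.items():
    for row in part_rows:
        key_locations[row["key"]].add(part_id)

# Equal keys always hash to the same value, so each key maps to one partition.
print({key: sorted(parts) for key, parts in key_locations.items()})
```

Because the target partition depends only on the key's hash, two different keys may still share a partition, and some output partitions may be empty.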

Parameters

on (str, list of str, or Series, Index, or DataFrame): Column names to shuffle by.
ignore_index (optional): Whether to ignore the index. Default is False.
npartitions (optional): Number of output partitions. The partition count will be preserved by default.
shuffle_method (optional): Desired shuffle method. Default chosen at optimization time.
on_index (bool, default False): Whether to shuffle on the index. Mutually exclusive with ‘on’. Set this to True if ‘on’ is not provided.
**options (optional): Algorithm-specific options.

Notes

This does not preserve a meaningful index/partitioning scheme. This is not deterministic if done in parallel.

Examples

>>> df = df.shuffle(df.columns[0])  
