dask.dataframe.DataFrame.shuffle

DataFrame.shuffle(on, npartitions=None, max_branch=None, shuffle_method=None, ignore_index=False, compute=None)

Rearrange DataFrame into new partitions

Uses hashing of on to map rows to output partitions. After this operation, rows with the same value of on will be in the same partition.

Parameters
on: str, list of str, or Series, Index, or DataFrame

Column(s) or index to be used to map rows to output partitions

npartitions: int, optional

Number of output partitions. The partition count is not changed by default (see the usage sketch after this parameter list).

max_branch: int, optional

The maximum number of splits per input partition. Used within the staged shuffling algorithm.

shuffle_method: {'disk', 'tasks', 'p2p'}, optional

Either 'disk' for single-node operation, or 'tasks' or 'p2p' for distributed operation. If not specified, the method is inferred from the current scheduler.

ignore_index: bool, default False

Ignore index during shuffle. If True, performance may improve, but index values will not be preserved.

compute: bool

Whether or not to trigger an immediate computation. Defaults to False.
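For illustration only, a minimal sketch combining these keywords; the pandas frame and the column name "id" are assumptions, not part of the API:

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> pdf = pd.DataFrame({"id": [1, 2, 1, 3], "x": range(4)})  # assumed toy data
>>> ddf = dd.from_pandas(pdf, npartitions=2)
>>> shuffled = ddf.shuffle(
...     "id",                    # hash rows by the "id" column
...     npartitions=4,           # change the output partition count
...     shuffle_method="tasks",  # request the task-based algorithm
...     ignore_index=True,       # trade the index for a faster shuffle
... )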

Notes

This operation does not preserve a meaningful index or partitioning scheme, and its result is not deterministic when run in parallel.

Examples

>>> df = df.shuffle(df.columns[0])
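
A slightly fuller sketch, assuming a small pandas frame with a "key" column (both the data and the column name are illustrative): after the shuffle, all rows sharing a "key" value land in the same output partition, which can be inspected through the partitions accessor.

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> pdf = pd.DataFrame({"key": [0, 1, 0, 1, 2], "value": range(5)})
>>> ddf = dd.from_pandas(pdf, npartitions=3)
>>> shuffled = ddf.shuffle("key")
>>> shuffled.partitions[0].compute()  # rows whose "key" hashes to the first output partition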