dask_expr._collection.DataFrame.sort_values

DataFrame.sort_values(by: str | list[str], npartitions: int | None = None, ascending: bool | list[bool] = True, na_position: Union[Literal['first'], Literal['last']] = 'last', partition_size: float = 128000000.0, sort_function: collections.abc.Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame] | None = None, sort_function_kwargs: collections.abc.Mapping[str, Any] | None = None, upsample: float = 1.0, ignore_index: bool | None = False, shuffle_method: str | None = None, **options)

Sort the dataset by one or more columns.

Sorting a parallel dataset requires expensive shuffles and is generally not recommended. See set_index for implementation details.
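To make the cost of the shuffle concrete, here is a minimal pure-Python sketch of a range-partitioned sort. It is an illustration only, not Dask's implementation: the function name `range_partition_sort` is hypothetical, partitions are modelled as plain lists, and the division boundaries are computed exactly from all values, where a real system would likely estimate them from samples.

```python
from bisect import bisect_right


def range_partition_sort(partitions, npartitions):
    """Globally sort a list of unsorted partitions (lists of numbers).

    Hypothetical helper for illustration; not part of Dask.
    """
    # Step 1: choose division boundaries. This sketch cheats by sorting
    # every value up front; a distributed system would estimate quantiles.
    all_values = sorted(v for part in partitions for v in part)
    if not all_values:
        return [[] for _ in range(npartitions)]
    boundaries = [
        all_values[len(all_values) * i // npartitions]
        for i in range(1, npartitions)
    ]

    # Step 2: the shuffle. Every row may have to move to a different
    # output partition, which is the expensive all-to-all step.
    out = [[] for _ in range(npartitions)]
    for part in partitions:
        for v in part:
            out[bisect_right(boundaries, v)].append(v)

    # Step 3: sort each output partition independently.
    return [sorted(p) for p in out]


parts = [[9, 2, 7], [4, 8, 1], [6, 3, 5]]
sorted_parts = range_partition_sort(parts, npartitions=3)
print(sorted_parts)
```

In this sketch every input partition contributes rows to every output partition, which is why a global sort cannot be done partition by partition.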

Parameters
by: str or list[str]

Column(s) to sort by.

npartitions: int, None, or ‘auto’

The ideal number of output partitions. If None, use the same number as the input. If ‘auto’, choose the number based on memory use.

ascending: bool or list[bool], optional

Sort ascending vs. descending. Defaults to True.

na_position: {‘last’, ‘first’}, optional

Puts NaNs at the beginning if ‘first’ and at the end if ‘last’. Defaults to ‘last’.

sort_function: function, optional

Sorting function to use when sorting underlying partitions. If None, defaults to M.sort_values (the partition library’s implementation of sort_values).

sort_function_kwargs: dict, optional

Additional keyword arguments to pass to the partition sorting function. By default, by, ascending, and na_position are provided.

Examples

>>> df2 = df.sort_values('x')