dask_expr._collection.DataFrame.random_split

DataFrame.random_split(frac, random_state=None, shuffle=False)

Pseudorandomly split the dataframe into pieces row-wise.

Parameters
frac : list

List of floats that should sum to one.

random_state : int or np.random.RandomState

If int or None, create a new RandomState with this as the seed. Otherwise, draw from the passed RandomState.

shuffle : bool, default False

If set to True, the dataframe is shuffled (within partition) before the split.

See also

dask.DataFrame.sample

Examples

50/50 split

>>> a, b = df.random_split([0.5, 0.5])  

80/10/10 split, consistent random_state

>>> a, b, c = df.random_split([0.8, 0.1, 0.1], random_state=123)