dask.dataframe.map_partitions

dask.dataframe.map_partitions#

dask.dataframe.map_partitions(func, *args, meta=<no_default>, enforce_metadata=True, transform_divisions=True, clear_divisions=False, align_dataframes=False, parent_meta=None, required_columns=None, **kwargs)[source]#

Apply Python function on each DataFrame partition.

Parameters:

funcfunction

Function applied to each partition.

args, kwargs

Arguments and keywords to pass to the function. At least one of the args should be a Dask.dataframe. Arguments and keywords may contain Scalar, Delayed or regular python objects. DataFrame-like args (both dask and pandas) will be repartitioned to align (if necessary) before applying the function (see align_dataframes to control).

enforce_metadatabool, default True

Whether to enforce at runtime that the structure of the DataFrame produced by func actually matches the structure of meta. This will rename and reorder columns for each partition, and will raise an error if this doesn’t work, but it won’t raise if dtypes don’t match.

transform_divisionsbool, default True

Whether to apply the function onto the divisions and apply those transformed divisions to the output.

align_dataframesbool, default True

Whether to repartition DataFrame- or Series-like args (both dask and pandas) so their divisions align before applying the function. This requires all inputs to have known divisions. Single-partition inputs will be split into multiple partitions.

If False, all inputs must have either the same number of partitions or a single partition. Single-partition inputs will be broadcast to every partition of multi-partition inputs.

required_columnslist or None, default None

List of columns that func requires for execution. These columns must belong to the first DataFrame argument (in args). If None is specified (the default), the query optimizer will assume that all input columns are required.

metapd.DataFrame, pd.Series, dict, iterable, tuple, optional

An empty pd.DataFrame or pd.Series that matches the dtypes and column names of the output. This metadata is necessary for many algorithms in dask dataframe to work. For ease of use, some alternative inputs are also available. Instead of a DataFrame, a dict of {name: dtype} or iterable of (name, dtype) can be provided (note that the order of the names should match the order of the columns). Instead of a series, a tuple of (name, dtype) can be used. If not provided, dask will try to infer the metadata. This may lead to unexpected results, so providing meta is recommended. For more information, see dask.dataframe.utils.make_meta.

dask.dataframe.map_partitions

Contents

dask.dataframe.map_partitions#