dask.dataframe.rolling.map_overlap

dask.dataframe.rolling.map_overlap

dask.dataframe.rolling.map_overlap(func, df, before, after, *args, meta='__no_default__', enforce_metadata=True, transform_divisions=True, align_dataframes=True, **kwargs)[source]

Apply a function to each partition, sharing rows with adjacent partitions.

Parameters
funcfunction

The function applied to each partition. If this function accepts the special partition_info keyword argument, it will recieve information on the partition’s relative location within the dataframe.

df: dd.DataFrame, dd.Series
args, kwargs

Positional and keyword arguments to pass to the function. Positional arguments are computed on a per-partition basis, while keyword arguments are shared across all partitions. The partition itself will be the first positional argument, with all other arguments passed after. Arguments can be Scalar, Delayed, or regular Python objects. DataFrame-like args (both dask and pandas) will be repartitioned to align (if necessary) before applying the function; see align_dataframes to control this behavior.

enforce_metadatabool, default True

Whether to enforce at runtime that the structure of the DataFrame produced by func actually matches the structure of meta. This will rename and reorder columns for each partition, and will raise an error if this doesn’t work, but it won’t raise if dtypes don’t match.

beforeint or timedelta

The rows to prepend to partition i from the end of partition i - 1.

afterint or timedelta

The rows to append to partition i from the beginning of partition i + 1.

transform_divisionsbool, default True

Whether to apply the function onto the divisions and apply those transformed divisions to the output.

align_dataframesbool, default True

Whether to repartition DataFrame- or Series-like args (both dask and pandas) so their divisions align before applying the function. This requires all inputs to have known divisions. Single-partition inputs will be split into multiple partitions.

If False, all inputs must have either the same number of partitions or a single partition. Single-partition inputs will be broadcast to every partition of multi-partition inputs.

$META

See also

dd.DataFrame.map_overlap