DataFrameGroupBy.aggregate(arg, split_every=None, split_out=1, shuffle=None)[source]

Aggregate using one or more specified operations

Based on pd.core.groupby.DataFrameGroupBy.aggregate

argcallable, str, list or dict

Aggregation spec. Accepted combinations are:

  • callable function

  • string function name

  • list of functions and/or function names, e.g. [np.sum, 'mean']

  • dict of column names -> function, function name or list of such.

split_everyint, optional

Number of intermediate partitions that may be aggregated at once. This defaults to 8. If your intermediate partitions are likely to be small (either due to a small number of groups or a small initial partition size), consider increasing this number for better performance.

split_outint, optional

Number of output partitions. Default is 1.

shufflebool or str, optional

Whether a shuffle-based algorithm should be used. A specific algorithm name may also be specified (e.g. "tasks" or "p2p"). The shuffle-based algorithm is likely to be more efficient than shuffle=False when split_out>1 and the number of unique groups is large (high cardinality). Default is False when split_out = 1. When split_out > 1, it chooses the algorithm set by the shuffle option in the dask config system, or "tasks" if nothing is set.