- DataFrameGroupBy.aggregate(arg=None, split_every=None, split_out=1, shuffle=None, **kwargs)
Aggregate using one or more specified operations
Based on pd.core.groupby.DataFrameGroupBy.aggregate
- arg: callable, str, list or dict, optional
Aggregation spec. Accepted combinations are:
string function name
list of functions and/or function names, e.g.
dict of column names -> function, function name or list of such.
None only if named aggregation syntax is used
- split_every: int, optional
Number of intermediate partitions that may be aggregated at once. This defaults to 8. If your intermediate partitions are likely to be small (either due to a small number of groups or a small initial partition size), consider increasing this number for better performance.
- split_out: int, optional
Number of output partitions. Default is 1.
- shuffle: bool or str, optional
Whether a shuffle-based algorithm should be used. A specific algorithm name may also be specified (e.g. "p2p"). The shuffle-based algorithm is likely to be more efficient than shuffle=False when split_out > 1 and the number of unique groups is large (high cardinality). Default is False when split_out = 1. When split_out > 1, it chooses the algorithm set by the shuffle option in the dask config system, or "tasks" if nothing is set.
- kwargs: tuple or pd.NamedAgg, optional
Used for named aggregations where the keywords are the output column names and the values are tuples where the first element is the input column name and the second element is the aggregation function.
pandas.NamedAgg can also be used as the value. To use the named aggregation syntax, arg must be set to None.