- DataFrameGroupBy.aggregate(arg, split_every=None, split_out=1, shuffle=None)
Aggregate using one or more specified operations.
Based on pd.core.groupby.DataFrameGroupBy.aggregate.
- arg : callable, str, list or dict
Aggregation spec. Accepted combinations are:
  - string function name
  - list of functions and/or function names, e.g.
  - dict of column names -> function, function name or list of such
- split_every : int, optional
Number of intermediate partitions that may be aggregated at once. This defaults to 8. If your intermediate partitions are likely to be small (either due to a small number of groups or a small initial partition size), consider increasing this number for better performance.
- split_out : int, optional
Number of output partitions. Default is 1.
- shuffle : bool or str, optional
Whether a shuffle-based algorithm should be used. A specific algorithm name may also be specified (e.g. "p2p"). The shuffle-based algorithm is likely to be more efficient when split_out > 1 and the number of unique groups is large (high cardinality). Default is False when split_out = 1. When split_out > 1, it chooses the algorithm set by the shuffle option in the dask config system, or "tasks" if nothing is set.