- class dask.dataframe.groupby.Aggregation(name, chunk, agg, finalize=None)
User defined groupby-aggregation.
This class allows users to define their own custom aggregation in terms of operations on Pandas dataframes in a map-reduce style. You need to specify what operation to do on each chunk of data, how to combine those chunks of data together, and then how to finalize the result.
See Aggregate for more.
name: The name of the aggregation. It should be unique, since intermediate results will be identified by this name.
chunk: A function that will be called with the grouped column of each partition. It can return either a single series or a tuple of series. The index has to be equal to the groups.
agg: A function that will be called to aggregate the results of each chunk. Again, the argument(s) will be grouped series. If chunk returned a tuple, agg will be called with all of them as individual positional arguments.
finalize: An optional finalizer that will be called with the results from the aggregation.
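To make the calling convention described above concrete, the following is a minimal sketch that emulates the three stages by hand using plain pandas. It is an illustration only, not how Dask is implemented internally. The sample data, the two hand-made "partitions", and the function names `chunk_fn`, `agg_fn`, and `finalize_fn` are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample data, split by hand into two "partitions".
pdf = pd.DataFrame({"g": ["a", "b", "a", "b", "a"],
                    "x": [1.0, 2.0, 3.0, 4.0, 5.0]})
partitions = [pdf.iloc[:3], pdf.iloc[3:]]

def chunk_fn(grouped):
    # Receives the grouped column of one partition; returns a tuple of series.
    return grouped.count(), grouped.sum()

def agg_fn(count, total):
    # Receives one grouped series per element of the tuple returned by chunk_fn.
    return count.sum(), total.sum()

def finalize_fn(count, total):
    # Receives the aggregated series and produces the final result.
    return total / count

# Stage 1: apply chunk_fn to the grouped column of every partition.
chunk_results = [chunk_fn(part.groupby("g")["x"]) for part in partitions]

# Stage 2: concatenate each intermediate, regroup by the group index, aggregate.
counts = pd.concat([c for c, _ in chunk_results])
totals = pd.concat([t for _, t in chunk_results])
agg_count, agg_total = agg_fn(counts.groupby(level=0), totals.groupby(level=0))

# Stage 3: finalize.
emulated_mean = finalize_fn(agg_count, agg_total)
print(emulated_mean)
```

The result should agree with `pdf.groupby("g")["x"].mean()`, which is a convenient way to check a custom aggregation on a small sample before running it on a large Dask dataframe.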
We could implement sum as follows:
>>> custom_sum = dd.Aggregation(
...     name='custom_sum',
...     chunk=lambda s: s.sum(),
...     agg=lambda s0: s0.sum()
... )
>>> df.groupby('g').agg(custom_sum)
We can implement mean as follows:
>>> custom_mean = dd.Aggregation(
...     name='custom_mean',
...     chunk=lambda s: (s.count(), s.sum()),
...     agg=lambda count, sum: (count.sum(), sum.sum()),
...     finalize=lambda count, sum: sum / count,
... )
>>> df.groupby('g').agg(custom_mean)
Though of course, both of these are built-in and so you don’t need to implement them yourself.
- __init__(name, chunk, agg, finalize=None)