dask.dataframe.DataFrame.apply
- DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, args=(), meta=_NoDefault.no_default, result_type=None, **kwds)
Parallel version of pandas.DataFrame.apply
This mimics the pandas version except for the following:
- Only axis=1 is supported (and must be specified explicitly).
- The user should provide output metadata via the meta keyword.
- Parameters
- func : function
Function to apply to each column/row
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
0 or ‘index’: apply function to each column (NOT SUPPORTED)
1 or ‘columns’: apply function to each row
- meta : pd.DataFrame, pd.Series, dict, iterable, tuple, optional
An empty pd.DataFrame or pd.Series that matches the dtypes and column names of the output. This metadata is necessary for many algorithms in dask dataframe to work. For ease of use, some alternative inputs are also available. Instead of a DataFrame, a dict of {name: dtype} or iterable of (name, dtype) can be provided (note that the order of the names should match the order of the columns). Instead of a series, a tuple of (name, dtype) can be used. If not provided, dask will try to infer the metadata. This may lead to unexpected results, so providing meta is recommended. For more information, see dask.dataframe.utils.make_meta.
- args : tuple
Positional arguments to pass to function in addition to the array/series
- **kwds
Additional keyword arguments will be passed as keywords to the function
- Returns
- applied : Series or DataFrame
See also
dask.dataframe.DataFrame.map_partitions
Examples
>>> import pandas as pd
>>> import dask.dataframe as dd
>>> df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
...                    'y': [1., 2., 3., 4., 5.]})
>>> ddf = dd.from_pandas(df, npartitions=2)
Apply a function row-wise, passing in extra arguments in args and kwargs:
>>> def myadd(row, a, b=1):
...     return row.sum() + a + b
>>> res = ddf.apply(myadd, axis=1, args=(2,), b=1.5)
By default, dask tries to infer the output metadata by running your provided function on some fake data. This works well in many cases, but can sometimes be expensive, or even fail. To avoid this, you can manually specify the output metadata with the meta keyword. This can be specified in many forms; for more information, see dask.dataframe.utils.make_meta.
Here we specify that the output is a Series with name 'x' and dtype float64:
>>> res = ddf.apply(myadd, axis=1, args=(2,), b=1.5, meta=('x', 'f8'))
In the case where the metadata doesn’t change, you can also pass in the object itself directly:
>>> res = ddf.apply(lambda row: row + 1, axis=1, meta=ddf)