dask.dataframe.reshape.get_dummies

dask.dataframe.reshape.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=bool, **kwargs)

Convert categorical variable into dummy/indicator variables.

The data must have category dtype so that the result’s columns can be inferred without computing the data.

Parameters

data (Series or DataFrame): For a Series, the dtype must be categorical. For a DataFrame, at least one column must be categorical.
prefix (string, list of strings, or dict of strings, default None): String to use as a prefix for the generated column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternatively, prefix can be a dictionary mapping column names to prefixes.
prefix_sep (string, default '_'): Separator/delimiter to place between the prefix and the category value. Or pass a list or dictionary as with prefix.
dummy_na (bool, default False): Add a column to indicate NaNs. If False, NaNs are ignored.
columns (list-like, default None): Column names in the DataFrame to be encoded. If columns is None, all columns with category dtype are converted.
sparse (bool, default False): Whether the dummy columns should be sparse or not. Returns SparseDataFrame if data is a Series or if all columns are included. Otherwise returns a DataFrame with some SparseBlocks.

New in version 0.18.2.
drop_first (bool, default False): Whether to get k-1 dummies out of k categorical levels by removing the first level.
dtype (dtype, default bool): Data type for new columns. Only a single dtype is allowed.

New in version 0.18.2.

Returns

dummies (DataFrame)

See also

pandas.get_dummies

Examples

Dask’s version only works with Categorical data, as this is the only way to know the output shape without computing all the data.

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> s = dd.from_pandas(pd.Series(list('abca')), npartitions=2)
>>> dd.get_dummies(s)
Traceback (most recent call last):
    ...
NotImplementedError: `get_dummies` with non-categorical dtypes is not supported...

With categorical data:

>>> s = dd.from_pandas(pd.Series(list('abca'), dtype='category'), npartitions=2)
>>> dd.get_dummies(s)  
Dask DataFrame Structure:
                  a     b     c
npartitions=2
0              bool  bool  bool
2               ...   ...   ...
3               ...   ...   ...
Dask Name: get_dummies, 2 graph layers
>>> dd.get_dummies(s).compute()  
       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True
3   True  False  False
