dask.bag.Bag.to_dataframe
Bag.to_dataframe(meta=None, columns=None, optimize_graph=True)
Create Dask Dataframe from a Dask Bag.
Bag should contain tuples, dict records, or scalars.
Index will not be particularly meaningful. Use reindex afterwards if necessary.
Parameters
- meta : pd.DataFrame, dict, iterable, optional
An empty pd.DataFrame that matches the dtypes and column names of the output. This metadata is necessary for many algorithms in dask dataframe to work. For ease of use, some alternative inputs are also available. Instead of a DataFrame, a dict of {name: dtype} or iterable of (name, dtype) can be provided. If not provided or a list, a single element from the first partition will be computed, triggering a potentially expensive call to compute. This may lead to unexpected results, so providing meta is recommended. For more information, see dask.dataframe.utils.make_meta.
- columns : sequence, optional
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns). Note that if meta is provided, column names will be taken from there and this parameter is invalid.
- optimize_graph : bool, optional
If True [default], the graph is optimized before converting into dask.dataframe.DataFrame.
Examples
>>> import dask.bag as db
>>> b = db.from_sequence([{'name': 'Alice', 'balance': 100},
...                       {'name': 'Bob', 'balance': 200},
...                       {'name': 'Charlie', 'balance': 300}],
...                      npartitions=2)
>>> df = b.to_dataframe()
>>> df.compute()
      name  balance
0    Alice      100
1      Bob      200
0  Charlie      300