dask.dataframe.from_delayed

dask.dataframe.from_delayed(dfs, meta=None, divisions=None, prefix='from-delayed', verify_meta=True)

Create a Dask DataFrame from many Dask Delayed objects.

Parameters
dfs : list of Delayed

An iterable of dask.delayed.Delayed objects, such as those returned by dask.delayed. These comprise the individual partitions of the resulting dataframe.

meta : pd.DataFrame, pd.Series, dict, iterable, tuple, optional

An empty pd.DataFrame or pd.Series that matches the dtypes and column names of the output. This metadata is necessary for many algorithms in dask dataframe to work. For ease of use, some alternative inputs are also available. Instead of a DataFrame, a dict of {name: dtype} or iterable of (name, dtype) can be provided (note that the order of the names should match the order of the columns). Instead of a series, a tuple of (name, dtype) can be used. If not provided, dask will try to infer the metadata. This may lead to unexpected results, so providing meta is recommended. For more information, see dask.dataframe.utils.make_meta.

divisions : tuple, str, optional

Partition boundaries along the index. For a tuple, see https://docs.dask.org/en/latest/dataframe-design.html#partitions. For the string ‘sorted’, the delayed values are computed to find the index values; this assumes that the indexes are mutually sorted. If None, index information is not used.

prefix : str, optional

Prefix to prepend to the keys.

verify_meta : bool, optional

If True, check that the partitions have consistent metadata. Defaults to True.