dask.dataframe.from_delayed
- dask.dataframe.from_delayed(dfs: Delayed | distributed.Future | Collection[Delayed | distributed.Future], meta=None, divisions: tuple | None = None, prefix: str | None = None, verify_meta: bool = True)
Create a Dask DataFrame from many Dask Delayed objects.
Warning
from_delayed should only be used if the objects that create the data are complex and cannot be easily represented as a single function in an embarrassingly parallel fashion.
from_map is recommended if the query can be expressed as a single function like:
    import pandas as pd
    import dask.dataframe as dd

    def read_xml(path):
        return pd.read_xml(path)

    # paths is an iterable of XML file paths, one per partition
    ddf = dd.from_map(read_xml, paths)
from_delayed might be deprecated in the future.
- Parameters:
- dfs
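A dask.delayed.Delayed, a distributed.Future, or an iterable of either of these objects, e.g. returned by client.submit. These comprise the individual partitions of the resulting dataframe. If a single object is provided (not an iterable), then the resulting dataframe will have only one partition.
- meta
An empty pandas DataFrame or Series that matches the column names and dtypes of the partitions. If None, it is inferred by computing the first object in dfs.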
- divisions
Partition boundaries along the index. If None, index information is not used. For the tuple format, see https://docs.dask.org/en/latest/dataframe-design.html#partitions
- prefix
Prefix to prepend to the keys.
- verify_meta
If True, check that the partitions have consistent metadata. Defaults to True.