dask.bag.Bag.persist

dask.bag.Bag.persist#

Bag.persist(**kwargs)#

Persist this dask collection into memory

This turns a lazy Dask collection into a Dask collection with the same metadata, but now with the results fully computed or actively computing in the background.

The action of function differs significantly depending on the active task scheduler. If the task scheduler supports asynchronous computing, such as is the case of the dask.distributed scheduler, then persist will return immediately and the return value’s task graph will contain Dask Future objects. However if the task scheduler only supports blocking computation then the call to persist will block and the return value’s task graph will contain concrete Python results.

This function is particularly useful when using distributed systems, because the results will be kept in distributed memory, rather than returned to the local process as with compute.

Parameters:

schedulerstring, optional: Which scheduler to use like “threads”, “synchronous” or “processes”. If not provided, the default is to check the global settings first, and then fall back to the collection defaults.
optimize_graphbool, optional: If True [default], the graph is optimized before computation. Otherwise the graph is run as is. This can be useful for debugging.
**kwargs: Extra keywords to forward to the scheduler function.

Returns:

New dask collections backed by in-memory data

dask.bag.Bag.persist

Contents

dask.bag.Bag.persist#