dask.delayed interface consists of one function,
Wraps functions. Can be used as a decorator, or around function calls directly (i.e.
delayed(foo)(a, b, c)). Outputs from functions wrapped in
delayedare proxy objects of type
Delayedthat contain a graph of all operations done to get to this result.
Wraps objects. Used to create
Delayed objects can be thought of as representing a key in the dask task
Delayed supports most python operations, each of which creates
Delayed representing the result:
- Most operators (
-, and so on)
- Item access and slicing (
- Attribute access (
- Method calls (
Operations that aren’t supported include:
- Mutating operators (
a += 1)
- Mutating magics such as
a = 1,
a.foo = 1)
- Iteration. (
for i in a: ...)
- Use as a predicate (
if a: ...)
The last two points in particular mean that
Delayed objects cannot be used for
control flow, meaning that no
Delayed can appear in a loop or if statement.
In other words you can’t iterate over a
Delayed object, or use it as part of
a condition in an if statement, but
Delayed object can be used in a body of a loop
or if statement (i.e. the example above is fine, but if
data was a
object it wouldn’t be).
Even with this limitation, many workflows can easily be parallelized.
||Wraps a function or object to produce a
Wraps a function or object to produce a
Delayedobjects act as proxies for the object they wrap, but all operations on them are done lazily by building up a dask graph internally.
- obj : object
The function or object to wrap
- name : string or hashable, optional
The key to use in the underlying graph for the wrapped object. Defaults to hashing content. Note that this only affects the name of the object wrapped by this call to delayed, and not the output of delayed function calls - for that use
dask_key_name=as described below.
nameis used as the key in task graphs, you should ensure that it uniquely identifies
obj. If you’d like to provide a descriptive name that is still unique, combine the descriptive name with
array_like. See Task Graphs for more.
- pure : bool, optional
Indicates whether calling the resulting
Delayedobject is a pure operation. If True, arguments to the call are hashed to produce deterministic keys. If not provided, the default is to check the global
delayed_puresetting, and fallback to
- nout : int, optional
The number of outputs returned from calling the resulting
Delayedobject. If provided, the
Delayedoutput of the call can be iterated into
noutobjects, allowing for unpacking of results. By default iteration over
Delayedobjects will error. Note, that
obj, to return a tuple of length 1, and consequently for
objshould return an empty tuple.
- traverse : bool, optional
By default dask traverses builtin python collections looking for dask objects passed to
delayed. For large collections this can be expensive. If
objdoesn’t contain any dask objects, set
traverse=Falseto avoid doing this traversal.
Apply to functions to delay execution:
>>> def inc(x): ... return x + 1
>>> inc(10) 11
>>> x = delayed(inc, pure=True)(10) >>> type(x) == Delayed True >>> x.compute() 11
Can be used as a decorator:
>>> @delayed(pure=True) ... def add(a, b): ... return a + b >>> add(1, 2).compute() 3
delayedalso accepts an optional keyword
pure. If False, then subsequent calls will always produce a different
Delayed. This is useful for non-pure functions (such as
>>> from random import random >>> out1 = delayed(random, pure=False)() >>> out2 = delayed(random, pure=False)() >>> out1.key == out2.key False
If you know a function is pure (output only depends on the input, with no global state), then you can set
pure=True. This will attempt to apply a consistent name to the output, but will fallback on the same behavior of
pure=Falseif this fails.
>>> @delayed(pure=True) ... def add(a, b): ... return a + b >>> out1 = add(1, 2) >>> out2 = add(1, 2) >>> out1.key == out2.key True
Instead of setting
pureas a property of the callable, you can also set it contextually using the
delayed_puresetting. Note that this influences the call and not the creation of the callable:
>>> import dask >>> @delayed ... def mul(a, b): ... return a * b >>> with dask.config.set(delayed_pure=True): ... print(mul(1, 2).key == mul(1, 2).key) True >>> with dask.config.set(delayed_pure=False): ... print(mul(1, 2).key == mul(1, 2).key) False
The key name of the result of calling a delayed object is determined by hashing the arguments by default. To explicitly set the name, you can use the
dask_key_namekeyword when calling the function:
>>> add(1, 2) # doctest: +SKIP Delayed('add-3dce7c56edd1ac2614add714086e950f') >>> add(1, 2, dask_key_name='three') Delayed('three')
Note that objects with the same key name are assumed to have the same result. If you set the names explicitly you should make sure your key names are different for different results.
>>> add(1, 2, dask_key_name='three') # doctest: +SKIP >>> add(2, 1, dask_key_name='three') # doctest: +SKIP >>> add(2, 2, dask_key_name='four') # doctest: +SKIP
delayedcan also be applied to objects to make operations on them lazy:
>>> a = delayed([1, 2, 3]) >>> isinstance(a, Delayed) True >>> a.compute() [1, 2, 3]
The key name of a delayed object is hashed by default if
pure=Trueor is generated randomly if
pure=False(default). To explicitly set the name, you can use the
namekeyword. To ensure that the key is unique you should include the tokenized value as well, or otherwise ensure that it’s unique:
>>> from dask.base import tokenize >>> data = [1, 2, 3] >>> a = delayed(data, name='mylist-' + tokenize(data)) >>> a # doctest: +SKIP Delayed('mylist-55af65871cb378a4fa6de1660c3e8fb7')
Delayed results act as a proxy to the underlying object. Many operators are supported:
>>> (a + [1, 2]).compute() [1, 2, 3, 1, 2] >>> a.compute() 2
Method and attribute access also works:
>>> a.count(2).compute() 1
Note that if a method doesn’t exist, no error will be thrown until runtime:
>>> res = a.not_a_real_method() >>> res.compute() # doctest: +SKIP AttributeError("'list' object has no attribute 'not_a_real_method'")
“Magic” methods (e.g. operators and attribute access) are assumed to be pure, meaning that subsequent calls must return the same results. This behavior is not overrideable through the
delayedcall, but can be modified using other ways as described below.
To invoke an impure attribute or operator, you’d need to use it in a delayed function with
>>> class Incrementer(object): ... def __init__(self): ... self._n = 0 ... @property ... def n(self): ... self._n += 1 ... return self._n ... >>> x = delayed(Incrementer()) >>> x.n.key == x.n.key True >>> get_n = delayed(lambda x: x.n, pure=False) >>> get_n(x).key == get_n(x).key False
In contrast, methods are assumed to be impure by default, meaning that subsequent calls may return different results. To assume purity, set pure=True. This allows sharing of any intermediate values.
>>> a.count(2, pure=True).key == a.count(2, pure=True).key True
As with function calls, method calls also respect the global
delayed_puresetting and support the
>>> a.count(2, dask_key_name="count_2") Delayed('count_2') >>> with dask.config.set(delayed_pure=True): ... print(a.count(2).key == a.count(2).key) True