Configuration Reference

Dask

dask.temporary-directory   None

Temporary directory for local disk storage /tmp, /scratch, or /local. This directory is used during dask spill-to-disk operations. When the value is "null" (default), dask will create a directory from where dask was launched: `cwd/dask-worker-space`

dask.dataframe.shuffle-compression   None

Compression algorithm used for on disk-shuffling. Partd, the library used for compression supports ZLib, BZ2, SNAPPY, and BLOSC

dask.array.svg.size   120

The size of pixels used when displaying a dask array as an SVG image. This is used, for example, for nice rendering in a Jupyter notebook

dask.optimization.fuse.active   True

Turn task fusion on/off

dask.optimization.fuse.ave-width   1

Upper limit for width, where width = num_nodes / height, a good measure of parallelizability

dask.optimization.fuse.max-width   None

Don't fuse if total width is greater than this. Set to null to dynamically adjust to 1.5 + ave_width * log(ave_width + 1)

dask.optimization.fuse.max-height   inf

Don't fuse more than this many levels

dask.optimization.fuse.max-depth-new-edges   None

Don't fuse if new dependencies are added after this many levels. Set to null to dynamically adjust to ave_width * 1.5.

dask.optimization.fuse.subgraphs   None

Set to True to fuse multiple tasks into SubgraphCallable objects. Set to None to let the default optimizer of individual dask collections decide. If no collection-specific default exists, None defaults to False.

dask.optimization.fuse.rename-keys   True

Set to true to rename the fused keys with `default_fused_keys_renamer`. Renaming fused keys can keep the graph more understandable and comprehensible, but it comes at the cost of additional processing. If False, then the top-most key will be used. For advanced usage, a function to create the new name is also accepted.

Distributed

Client

distributed.client.heartbeat   5s

This value is the time between heartbeats The client sends a periodic heartbeat message to the scheduler. If it misses enough of these then the scheduler assumes that it has gone.

distributed.client.scheduler-info-interval   2s

Interval between scheduler-info updates

Comm

distributed.comm.retry.count   0

The number of times to retry a connection

distributed.comm.retry.delay.min   1s

The first non-zero delay between retry attempts

distributed.comm.retry.delay.max   20s

The maximum delay between retries

distributed.comm.compression   auto

The compression algorithm to use This could be one of lz4, snappy, zstd, or blosc

distributed.comm.offload   10MiB

The size of message after which we choose to offload serialization to another thread In some cases, you may also choose to disable this altogether with the value false This is useful if you want to include serialization in profiling data, or if you have data types that are particularly sensitive to deserialization

distributed.comm.default-scheme   tcp

The default protocol to use, like tcp or tls

distributed.comm.socket-backlog   2048

When shuffling data between workers, there can really be O(cluster size) connection requests on a single worker socket, make sure the backlog is large enough not to lose any.

distributed.comm.recent-messages-log-length   0

number of messages to keep for debugging

distributed.comm.zstd.level   3

Compression level, between 1 and 22.

distributed.comm.zstd.threads   0

Number of threads to use. 0 for single-threaded, -1 to infer from cpu count.

distributed.comm.timeouts.connect   10s

No Comment

distributed.comm.timeouts.tcp   30s

No Comment

distributed.comm.require-encryption   False

Whether to require encryption on non-local comms

distributed.comm.tls.ciphers   None

No Comment

distributed.comm.tls.ca-file   None

Path to a CA file, in pem format

distributed.comm.tls.scheduler.cert   None

Path to certificate file

distributed.comm.tls.scheduler.key   None

Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank

distributed.comm.tls.worker.key   None

Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank

distributed.comm.tls.worker.cert   None

Path to certificate file

distributed.comm.tls.client.key   None

Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank

distributed.comm.tls.client.cert   None

Path to certificate file

Dashboard

The form for the dashboard links This is used wherever we print out the link for the dashboard It is filled in with relevant information like the schema, host, and port number

distributed.dashboard.export-tool   False

No Comment

distributed.dashboard.graph-max-items   5000

maximum number of tasks to try to plot in "graph" view

Deploy

distributed.deploy.lost-worker-timeout   15s

Interval after which to hard-close a lost worker job Otherwise we wait for a while to see if a worker will reappear

distributed.deploy.cluster-repr-interval   500ms

Interval between calls to update cluster-repr for the widget

Scheduler

distributed.scheduler.allowed-failures   3

The number of retries before a task is considered bad When a worker dies when a task is running that task is rerun elsewhere. If many workers die while running this same task then we call the task bad, and raise a KilledWorker exception. This is the number of workers that are allowed to die before this task is marked as bad.

distributed.scheduler.bandwidth   100000000

The expected bandwidth between any pair of workers This is used when making scheduling decisions. The scheduler will use this value as a baseline, but also learn it over time.

distributed.scheduler.blocked-handlers   []

A list of handlers to exclude The scheduler operates by receiving messages from various workers and clients and then performing operations based on those messages. Each message has an operation like "close-worker" or "task-finished". In some high security situations administrators may choose to block certain handlers from running. Those handlers can be listed here. For a list of handlers see the `dask.distributed.Scheduler.handlers` attribute.

distributed.scheduler.default-data-size   1kiB

The default size of a piece of data if we don't know anything about it. This is used by the scheduler in some scheduling decisions

distributed.scheduler.events-cleanup-delay   1h

The amount of time to wait until workers or clients are removed from the event log after they have been removed from the scheduler

distributed.scheduler.idle-timeout   None

Shut down the scheduler after this duration if no activity has occured This can be helpful to reduce costs and stop zombie processes from roaming the earth.

distributed.scheduler.transition-log-length   100000

How long should we keep the transition log Every time a task transitions states (like "waiting", "processing", "memory", "released") we record that transition in a log. To make sure that we don't run out of memory we will clear out old entries after a certain length. This is that length.

distributed.scheduler.work-stealing   True

Whether or not to balance work between workers dynamically Some times one worker has more work than we expected. The scheduler will move these tasks around as necessary by default. Set this to false to disable this behavior

distributed.scheduler.work-stealing-interval   100ms

How frequently to balance worker loads

distributed.scheduler.worker-ttl   None

Time to live for workers. If we don't receive a heartbeat faster than this then we assume that the worker has died.

distributed.scheduler.pickle   True

Is the scheduler allowed to deserialize arbitrary bytestrings? The scheduler almost never deserializes user data. However there are some cases where the user can submit functions to run directly on the scheduler. This can be convenient for debugging, but also introduces some security risk. By setting this to false we ensure that the user is unable to run arbitrary code on the scheduler.

distributed.scheduler.preload   []

Run custom modules during the lifetime of the scheduler You can run custom modules when the scheduler starts up and closes down. See https://docs.dask.org/en/latest/setup/custom-startup.html for more information

distributed.scheduler.preload-argv   []

Arguments to pass into the preload scripts described above See https://docs.dask.org/en/latest/setup/custom-startup.html for more information

distributed.scheduler.unknown-task-duration   500ms

Default duration for all tasks with unknown durations Over time the scheduler learns a duration for tasks. However when it sees a new type of task for the first time it has to make a guess as to how long it will take. This value is that guess.

distributed.scheduler.default-task-durations.rechunk-split   1us

No Comment

distributed.scheduler.default-task-durations.shuffle-split   1us

No Comment

distributed.scheduler.validate   False

Whether or not to run consistency checks during execution. This is typically only used for debugging.

distributed.scheduler.dashboard.status.task-stream-length   1000

The maximum number of tasks to include in the task stream plot

distributed.scheduler.dashboard.tasks.task-stream-length   100000

The maximum number of tasks to include in the task stream plot

distributed.scheduler.dashboard.tls.ca-file   None

No Comment

distributed.scheduler.dashboard.tls.key   None

No Comment

distributed.scheduler.dashboard.tls.cert   None

No Comment

distributed.scheduler.dashboard.bokeh-application.allow_websocket_origin   ['*']

No Comment

distributed.scheduler.dashboard.bokeh-application.keep_alive_milliseconds   500

No Comment

distributed.scheduler.dashboard.bokeh-application.check_unused_sessions_milliseconds   500

No Comment

distributed.scheduler.locks.lease-validation-interval   10s

The time to wait until an acquired semaphore is released if the Client goes out of scope

distributed.scheduler.locks.lease-timeout   30s

The timeout after which a lease will be released if not refreshed

distributed.scheduler.http.routes   ['distributed.http.scheduler.prometheus', 'distributed.http.scheduler.info', 'distributed.http.scheduler.json', 'distributed.http.health', 'distributed.http.proxy', 'distributed.http.statics']

A list of modules like "prometheus" and "health" that can be included or excluded as desired These modules will have a ``routes`` keyword that gets added to the main HTTP Server. This is also a list that can be extended with user defined modules.

Worker

distributed.worker.blocked-handlers   []

A list of handlers to exclude The scheduler operates by receiving messages from various workers and clients and then performing operations based on those messages. Each message has an operation like "close-worker" or "task-finished". In some high security situations administrators may choose to block certain handlers from running. Those handlers can be listed here. For a list of handlers see the `dask.distributed.Scheduler.handlers` attribute.

distributed.worker.multiprocessing-method   spawn

How we create new workers, one of "spawn", "forkserver", or "fork" This is passed to the ``multiprocessing.get_context`` function.

distributed.worker.use-file-locking   True

Whether or not to use lock files when creating workers Workers create a local directory in which to place temporary files. When many workers are created on the same process at once these workers can conflict with each other by trying to create this directory all at the same time. To avoid this, Dask usually used a file-based lock. However, on some systems file-based locks don't work. This is particularly common on HPC NFS systems, where users may want to set this to false.

distributed.worker.connections.outgoing   50

No Comment

distributed.worker.connections.incoming   10

No Comment

distributed.worker.preload   []

Run custom modules during the lifetime of the worker You can run custom modules when the worker starts up and closes down. See https://docs.dask.org/en/latest/setup/custom-startup.html for more information

distributed.worker.preload-argv   []

Arguments to pass into the preload scripts described above See https://docs.dask.org/en/latest/setup/custom-startup.html for more information

distributed.worker.daemon   True

Whether or not to run our process as a daemon process

distributed.worker.validate   False

Whether or not to run consistency checks during execution. This is typically only used for debugging.

distributed.worker.lifetime.duration   None

The time after creation to close the worker, like "1 hour"

distributed.worker.lifetime.stagger   0 seconds

Random amount by which to stagger lifetimes If you create many workers at the same time, you may want to avoid having them kill themselves all at the same time. To avoid this you might want to set a stagger time, so that they close themselves with some random variation, like "5 minutes" That way some workers can die, new ones can be brought up, and data can be transferred over smoothly.

distributed.worker.lifetime.restart   False

Do we try to resurrect the worker after the lifetime deadline?

distributed.worker.profile.interval   10ms

The time between polling the worker threads, typically short like 10ms

distributed.worker.profile.cycle   1000ms

The time between bundling together this data and sending it to the scheduler This controls the granularity at which people can query the profile information on the time axis.

distributed.worker.profile.low-level   False

Whether or not to use the libunwind and stacktrace libraries to gather profiling information at the lower level (beneath Python) To get this to work you will need to install the experimental stacktrace library at conda install -c numba stacktrace See https://github.com/numba/stacktrace

distributed.worker.memory.target   0.6

Target fraction below which to try to keep memory

distributed.worker.memory.spill   0.7

When the process memory (as observed by the operating system) gets above this amount we spill data to disk.

distributed.worker.memory.pause   0.8

When the process memory (as observed by the operating system) gets above this amount we no longer start new tasks on this worker.

distributed.worker.memory.terminate   0.95

When the process memory reaches this level the nanny process will kill the worker (if a nanny is present)

distributed.worker.http.routes   ['distributed.http.worker.prometheus', 'distributed.http.health', 'distributed.http.statics']

A list of modules like "prometheus" and "health" that can be included or excluded as desired These modules will have a ``routes`` keyword that gets added to the main HTTP Server. This is also a list that can be extended with user defined modules.

Admin

distributed.admin.tick.interval   20ms

The time between ticks, default 20ms

distributed.admin.tick.limit   3s

The time allowed before triggering a warning

distributed.admin.max-error-length   10000

Maximum length of traceback as text Some Python tracebacks can be very very long (particularly in stack overflow errors) If the traceback is larger than this size (in bytes) then we truncate it.

distributed.admin.log-length   10000

Default length of logs to keep in memory The scheduler and workers keep the last 10000 or so log entries in memory.

distributed.admin.log-format   %(name)s - %(levelname)s - %(message)s

The log format to emit. See https://docs.python.org/3/library/logging.html#logrecord-attributes

distributed.admin.pdb-on-err   False

Enter Python Debugger on scheduling error

UCX

ucx.tcp   None

No Comment

No Comment

ucx.infiniband   None

No Comment

ucx.rdmacm   None

No Comment

ucx.cuda_copy   None

No Comment

ucx.net-devices   None

Define which Infiniband device to use

RMM

rmm.pool-size   None

The size of the memory pool in bytes