Setup Prometheus monitoring

Prometheus is a widely popular tool for monitoring and alerting a wide variety of systems. Dask.distributed exposes scheduler and worker metrics in a prometheus text based format. Metrics are available at http://scheduler-address:8787/metrics when the prometheus_client package has been installed.

Available metrics are as following

Metric name

Description

Scheduler

Worker

python_gc_objects_collected_total

Objects collected during gc.

Yes

Yes

python_gc_objects_uncollectable_total

Uncollectable object found during GC.

Yes

Yes

python_gc_collections_total

Number of times this generation was collected.

Yes

Yes

python_info

Python platform information.

Yes

Yes

dask_scheduler_workers

Number of workers connected.

Yes

dask_scheduler_clients

Number of clients connected.

Yes

dask_scheduler_tasks

Number of tasks at scheduler.

Yes

dask_worker_tasks

Number of tasks at worker.

Yes

dask_worker_connections

Number of task connections to other workers.

Yes

dask_worker_threads

Number of worker threads.

Yes

dask_worker_latency_seconds

Latency of worker connection.

Yes

dask_worker_tick_duration_median_seconds

Median tick duration at worker.

Yes

dask_worker_task_duration_median_seconds

Median task runtime at worker.

Yes

dask_worker_transfer_bandwidth_median_bytes

Bandwidth for transfer at worker in Bytes.

Yes