GPUs

Dask works with GPUs in a few ways.

Custom Computations

Many people use Dask alongside GPU-accelerated libraries like PyTorch and TensorFlow to manage workloads across several machines. They typically use Dask’s custom APIs, notably Delayed and Futures.

Dask doesn’t need to know that these functions use GPUs. It just runs Python functions. Whether or not those Python functions use a GPU is orthogonal to Dask. It will work regardless.
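For example, a minimal sketch using Dask Delayed (CuPy is used here only as a stand-in for any GPU-accelerated library; the function name and sizes are illustrative):

import cupy
import dask

@dask.delayed
def gpu_sum_of_squares(n):
    x = cupy.arange(n)              # allocated and computed on the GPU
    return float((x ** 2).sum())    # return a plain Python number

tasks = [gpu_sum_of_squares(n) for n in (10, 100, 1000)]
totals = dask.compute(*tasks)       # Dask schedules these like any other Python tasks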

As a worked example, you may want to view this talk:

High Level Collections

Dask can also help to scale out large array and dataframe computations by combining the Dask Array and DataFrame collections with a GPU-accelerated array or dataframe library.

Recall that Dask Array creates a large array out of many NumPy arrays and Dask DataFrame creates a large dataframe out of many Pandas dataframes. We can use these same systems with GPUs if we swap out the NumPy/Pandas components for GPU-accelerated versions of those same libraries, as long as the GPU-accelerated version looks enough like NumPy/Pandas to interoperate with Dask.

Fortunately, libraries that mimic NumPy, Pandas, and Scikit-Learn on the GPU do exist.

DataFrames

The RAPIDS libraries provide a GPU-accelerated Pandas-like library, cuDF, which interoperates well with, and is tested against, Dask DataFrame.

If you have cuDF installed then you should be able to convert a Pandas-backed Dask DataFrame to a cuDF-backed Dask DataFrame as follows:

import cudf

df = df.map_partitions(cudf.from_pandas)  # convert pandas partitions into cudf partitions
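For context, a slightly fuller sketch (assuming cuDF and a CUDA-capable GPU are available; the column names and data are illustrative) showing where such a df might come from and one operation running on the resulting cuDF-backed collection:

import pandas as pd
import cudf
import dask.dataframe as dd

pdf = pd.DataFrame({"a": [1, 1, 2, 2], "b": [10.0, 20.0, 30.0, 40.0]})
df = dd.from_pandas(pdf, npartitions=2)       # pandas-backed Dask DataFrame
gdf = df.map_partitions(cudf.from_pandas)     # now cuDF-backed
print(gdf.groupby("a").b.mean().compute())    # the groupby runs on the GPU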

However, cuDF does not support the entire Pandas interface, and so a variety of Dask DataFrame operations will not function properly. Check the cuDF API Reference for the currently supported interface.

Arrays

Note

Dask’s integration with CuPy relies on features recently added to NumPy and CuPy, particularly in versions numpy>=1.17 and cupy>=6.

Chainer’s CuPy library provides a GPU-accelerated NumPy-like library that interoperates nicely with Dask Array.

If you have CuPy installed then you should be able to convert a NumPy-backed Dask Array into a CuPy-backed Dask Array as follows:

import cupy

x = x.map_blocks(cupy.asarray)  # convert numpy blocks into cupy blocks
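For context, a slightly fuller sketch (assuming CuPy and a CUDA-capable GPU are available; the array shape and chunking are illustrative):

import cupy
import dask.array as da

x = da.random.random((10000, 10000), chunks=(1000, 1000))  # NumPy-backed Dask Array
x = x.map_blocks(cupy.asarray)                              # now CuPy-backed
print(x.sum().compute())                                    # the reduction runs on the GPU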

CuPy is fairly mature and adheres closely to the NumPy API. However, small differences do exist and these can cause Dask Array operations to function improperly. Check the CuPy Reference Manual for API compatibility.

Scikit-Learn

There are a variety of GPU accelerated machine learning libraries that follow the Scikit-Learn Estimator API of fit, transform, and predict. These can generally be used within Dask-ML’s meta estimators, such as hyper parameter optimization.

Some of these include:
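Whichever specific library is used, the pattern within Dask-ML looks roughly like the following minimal sketch (a CPU SGDClassifier stands in for a GPU-backed estimator; any estimator exposing the same fit/predict interface could be substituted):

from sklearn.linear_model import SGDClassifier
from dask_ml.datasets import make_classification
from dask_ml.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, chunks=200)   # Dask-backed training data
est = SGDClassifier()                                     # substitute a GPU-backed estimator here
search = GridSearchCV(est, {"alpha": [1e-4, 1e-3, 1e-2]})
search.fit(X, y)                                          # hyper parameter search runs on the cluster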

Setup

From the examples above we can see that the user experience of using Dask with GPU-backed libraries isn’t very different from using it with CPU-backed libraries. However, there are some changes you might consider making when setting up your cluster.

Restricting Work

By default Dask allows as many tasks as you have CPU cores to run concurrently. However, if your tasks primarily use a GPU then you probably want far fewer tasks running at once. There are a few ways to limit parallelism here:

  • Limit the number of threads explicitly on your workers using the --nthreads keyword in the CLI or the ncores= keyword in the Cluster constructor.

  • Use worker resources and tag certain tasks as GPU tasks so that the scheduler will limit them, while leaving the rest of your CPU cores for other work (see the sketch after this list).
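As a minimal sketch of the worker-resources approach (the scheduler address, function name, and data below are hypothetical; the "GPU" label is arbitrary but must match between worker and task): start each worker with an annotated resource, e.g. dask-worker scheduler-address:8786 --nthreads 4 --resources "GPU=2", and then request that resource when submitting GPU tasks:

from dask.distributed import Client

client = Client("scheduler-address:8786")                          # hypothetical scheduler address
future = client.submit(train_on_gpu, data, resources={"GPU": 1})   # train_on_gpu, data: your own task
result = future.result()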

Specifying GPUs per Machine

Some configurations may have many GPU devices per node. Dask is often used to balance and coordinate work between these devices.

In these situations it is common to start one Dask worker per device, and use the CUDA environment variable CUDA_VISIBLE_DEVICES to pin each worker to prefer one device.

# If we have four GPUs on one machine
CUDA_VISIBLE_DEVICES=0 dask-worker ...
CUDA_VISIBLE_DEVICES=1 dask-worker ...
CUDA_VISIBLE_DEVICES=2 dask-worker ...
CUDA_VISIBLE_DEVICES=3 dask-worker ...

The Dask CUDA project contains some convenience CLI and Python utilities to automate this process.
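For example, a minimal sketch with Dask CUDA's LocalCUDACluster (assuming the dask-cuda package is installed), which starts one worker per available GPU and sets CUDA_VISIBLE_DEVICES for each:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()   # one worker per detected GPU
client = Client(cluster)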

Work in Progress

GPU computing is a rapidly moving field and, as a result, the information on this page is likely to go out of date quickly. We encourage interested readers to check out Dask’s Blog, which has more timely updates on ongoing work.