Dask works with GPUs in a few ways.
Dask doesn’t need to know that these functions use GPUs. It just runs Python functions. Whether or not those Python functions use a GPU is orthogonal to Dask. It will work regardless.
As a worked example, you may want to view this talk:
High Level Collections¶
Dask can also help to scale out large array and dataframe computations by combining the Dask Array and DataFrame collections with a GPU-accelerated array or dataframe library.
Recall that Dask Array creates a large array out of many NumPy arrays and Dask DataFrame creates a large dataframe out of many Pandas dataframes. We can use these same systems with GPUs if we swap out the NumPy/Pandas components with GPU-accelerated versions of those same libraries, as long as the GPU accelerated version looks enough like NumPy/Pandas in order to interoperate with Dask.
Fortunately, libraries that mimic NumPy, Pandas, and Scikit-Learn on the GPU do exist.
If you have cuDF installed then you should be able to convert a Pandas-backed Dask DataFrame to a cuDF-backed Dask DataFrame as follows:
df = df.map_partitions(cudf.from_pandas) # convert pandas partitions into cudf partitions
However, cuDF does not support the entire Pandas interface, and so a variety of Dask DataFrame operations will not function properly. Check the cuDF API Reference for currently supported interface.
Dask’s integration with CuPy relies on features recently added to
NumPy and CuPy, particularly in version
Chainer’s CuPy library provides a GPU accelerated NumPy-like library that interoperates nicely with Dask Array.
If you have CuPy installed then you should be able to convert a NumPy-backed Dask Array into a CuPy backed Dask Array as follows:
x = x.map_blocks(cupy.asarray)
CuPy is fairly mature and adheres closely to the NumPy API. However, small differences do exist and these can cause Dask Array operations to function improperly. Check the CuPy Reference Manual for API compatibility.
There are a variety of GPU accelerated machine learning libraries that follow the Scikit-Learn Estimator API of fit, transform, and predict. These can generally be used within Dask-ML’s meta estimators, such as hyper parameter optimization.
Some of these include:
From the examples above we can see that the user experience of using Dask with GPU-backed libraries isn’t very different from using it with CPU-backed libraries. However, there are some changes you might consider making when setting up your cluster.
By default Dask allows as many tasks as you have CPU cores to run concurrently. However if your tasks primarily use a GPU then you probably want far fewer tasks running at once. There are a few ways to limit parallelism here:
Limit the number of threads explicitly on your workers using the
--nthreadskeyword in the CLI or the
ncores=keyword the Cluster constructor.
Use worker resources and tag certain tasks as GPU tasks so that the scheduler will limit them, while leaving the rest of your CPU cores for other work
Specifying GPUs per Machine¶
Some configurations may have many GPU devices per node. Dask is often used to balance and coordinate work between these devices.
In these situations it is common to start one Dask worker per device, and use
the CUDA environment variable
CUDA_VISIBLE_DEVICES to pin each worker to
prefer one device.
# If we have four GPUs on one machine
CUDA_VISIBLE_DEVICES=0 dask-worker ...
CUDA_VISIBLE_DEVICES=1 dask-worker ...
CUDA_VISIBLE_DEVICES=2 dask-worker ...
CUDA_VISIBLE_DEVICES=3 dask-worker ...
The Dask CUDA project contains some convenience CLI and Python utilities to automate this process.