# Best Practices¶

It is easy to get started with Dask arrays, but using them well does require some experience. This page contains suggestions for best practices, and includes solutions to common problems.

## Use NumPy¶

If your data fits comfortably in RAM and you are not performance bound, then using NumPy might be the right choice. Dask adds another layer of complexity which may get in the way.

If you are just looking for speedups rather than scalability then you may want to consider a project like Numba

## Select a good chunk size¶

A common performance problem among Dask Array users is that they have chosen a chunk size that is either too small (leading to lots of overhead) or poorly aligned with their data (leading to inefficient reading).

While optimal sizes and shapes are highly problem specific, it is rare to see chunk sizes below 100 MB in size. If you are dealing with float64 data then this is around (4000, 4000) in size for a 2D array or (100, 400, 400) for a 3D array.

You want to choose a chunk size that is large in order to reduce the number of chunks that Dask has to think about (which affects overhead) but also small enough so that many of them can fit in memory at once. Dask will often have as many chunks in memory as twice the number of active threads.

When reading data you should align your chunks with your storage format. Most array storage formats store data in chunks themselves. If your Dask array chunks aren’t multiples of these chunk shapes then you will have to read the same data repeatedly, which can be expensive. Note though that often storage formats choose chunk sizes that are much smaller than is ideal for Dask, closer to 1MB than 100MB. In these cases you should choose a Dask chunk size that aligns with the storage chunk size and that every Dask chunk dimension is a multiple of the storage chunk dimension.

So for example if we have an HDF file that has chunks of size (128, 64), we might choose a chunk shape of (1280, 6400).

>>> import h5py
>>> storage = h5py.File('myfile.hdf5')['x']
>>> storage.chunks
(128, 64)

>>> x = da.from_array(storage, chunks=(1280, 6400))


Note that if you provide chunks='auto' then Dask Array will look for a .chunks attribute and use that to provide a good chunking.

By default Dask will run as many concurrent tasks as you have logical cores. It assumes that each task will consume about one core. However, many array-computing libraries are themselves multi-threaded, which can cause contention and low performance. In particular the BLAS/LAPACK libraries that back most of NumPy’s linear algebra routines are often multi-threaded, and need to be told to use only one thread explicitly. You can do this with the following environment variables (using bash export command below, but this may vary depending on your operating system).

export OMP_NUM_THREADS=1


You need to run this before you start your Python process for it to take effect.

## Consider Xarray¶

The Xarray package wraps around Dask Array, and so offers the same scalability, but also adds convenience when dealing with complex datasets. In particular Xarray can help with the following:

1. Manage multiple arrays together as a consistent dataset
2. Read from a stack of HDF or NetCDF files at once
3. Switch between Dask Array and NumPy with a consistent API

Xarray is used in wide range of fields, including physics, astronomy, geoscience, microscopy, bioinformatics, engineering, finance, and deep learning. Xarray also has a thriving user community that is good at providing support.

 blockwise(func, out_ind, *args, **kwargs) Tensor operation: Generalized inner and outer products map_blocks(func, *args, **kwargs) Map a function across all blocks of a dask array. map_overlap(x, func, depth[, boundary, trim]) Map a function over blocks of the array with some overlap reduction(x, chunk, aggregate[, axis, …]) General version of reductions