Random Number Generation

Dask’s random number routines produce pseudo-random numbers by combining a BitGenerator, which creates sequences of random bits, with a Generator, which uses those sequences to sample from different statistical distributions.

Since Dask version 2023.2.1, the Generator can be initialized with a number of different BitGenerator classes. It exposes many different probability distributions. The legacy RandomState random number routines are still available, but they are considered frozen and will not receive further updates.

Differences with NumPy

Dask follows the NumPy interface for random number generation with some differences:

  • Methods under dask.array.random take a chunks keyword.

  • Dask tries to be backend agnostic. In other words, you can mostly use CuPy and NumPy interchangeably as a backend for random number generation. Any library providing a similar interface should also work with some effort.

Notes

  • BitGenerators: Objects that generate random sequences. These are provided by a backend library such as NumPy or CuPy, and the sequences are typically unsigned integer words filled with either 32 or 64 random bits.

  • Generators: Objects that transform sequences of random bits from a BitGenerator into sequences of numbers that follow a specific probability distribution (such as uniform, Normal or Binomial) within a specified interval.

  • Dask does not guarantee that the same number generator is used across versions. This means that numbers generated by dask.array.random in a new version of Dask may differ from those generated by a previous version, even when the same seed and distribution are used. As better algorithms evolve, the bit stream may change.

  • Dask does not guarantee parity in the generated numbers with any third-party library. In particular, numbers generated by Dask will differ from those generated by NumPy or CuPy, even when given the same seed and BitGenerator. This is because Dask tends to spawn SeedSequence children to produce independent random number streams in parallel.

  • Many of the RandomState methods are exported as functions in dask.array.random. This usage is discouraged because these functions are implemented via a global RandomState instance, which is not advised on two counts:

    1. It uses global state, which means results will change as the code changes.

    2. It uses a RandomState rather than the more modern Generator.

    For backward compatibility, we cannot change this. Use dask.array.random.default_rng() to get a Generator and use its methods instead.

  • Generator.integers is now the canonical way to generate integer random numbers from a discrete uniform distribution. The endpoint keyword can be used to specify open or closed intervals. This replaces both randint and random_integers.

  • Generator.random is now the canonical way to generate floating-point random numbers, which replaces random_sample. The dask.array.random.random function still uses RandomState for backwards compatibility and should be avoided in new code. Please use Generator.random instead.

Quick Start

Call default_rng to get a new instance of a Generator, then call its methods to obtain samples from different distributions. By default, Generator uses bits provided by PCG64, which has better statistical properties than the legacy MT19937 used in RandomState.

# Do this (new version)
import dask.array as da
rng = da.random.default_rng()
vals = rng.standard_normal(10)
more_vals = rng.standard_normal(10)

# instead of this (legacy version)
import dask.array as da
vals = da.random.standard_normal(10)
more_vals = da.random.standard_normal(10)

For further information, please see the NumPy documentation.