Cloud Deployments

To get started running Dask on common cloud providers like Amazon, Google, or Microsoft, we currently recommend deploying Dask with Kubernetes and Helm.

All three major cloud vendors now provide managed Kubernetes services. This allows us to reliably provide the same experience across all clouds, and ensures that solutions for any one provider remain up-to-date.

Alternatively, if you are deploying on a cloud-hosted Hadoop cluster like Amazon EMR or Google Cloud Dataproc, you will want to use Dask-Yarn. Documentation on deploying on Amazon EMR specifically can be found here; the process is similar for Google Cloud Dataproc.

Data Access

You may want to install additional libraries in your Jupyter and worker images to access the object stores of each cloud:

  • s3fs for Amazon’s S3
  • gcsfs for Google’s GCS
  • adlfs for Microsoft’s ADL
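
Before launching a job, it can help to confirm that an image actually contains the library for each object store you plan to read from. The following is a minimal sketch, not part of Dask itself: the helper names required_package and missing_packages are hypothetical, the bucket names are placeholders, and the mapping assumes the s3://, gcs://, and adl:// URL schemes that these libraries are commonly used with.

```python
import importlib.util
from urllib.parse import urlparse

# Assumed mapping from URL scheme to the library listed above.
BACKENDS = {"s3": "s3fs", "gcs": "gcsfs", "adl": "adlfs"}


def required_package(url):
    """Return the package needed to open `url`, or None for unmapped schemes."""
    return BACKENDS.get(urlparse(url).scheme)


def missing_packages(urls):
    """Return a sorted list of required packages that cannot be imported here."""
    needed = {required_package(u) for u in urls} - {None}
    # find_spec returns None when a top-level package is not installed.
    return sorted(p for p in needed if importlib.util.find_spec(p) is None)


if __name__ == "__main__":
    # Placeholder URLs; substitute the paths your workload reads.
    urls = ["s3://my-bucket/data.csv", "gcs://my-bucket/data.parquet"]
    print("Missing packages:", missing_packages(urls))
```

Running this in both the Jupyter image and the worker image should report an empty list; anything it prints is a package to add to that image.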

Historical Libraries

Dask previously maintained libraries for deploying Dask on Amazon’s EC2. Due to sporadic interest, and churn both within the Dask library and EC2 itself, these were not well maintained. They have since been deprecated in favor of the Kubernetes and Helm solution.