dask.dataframe.to_csv


dask.dataframe.to_csv(df, filename, single_file=False, encoding='utf-8', mode='wt', name_function=None, compression=None, compute=True, scheduler=None, storage_options=None, header_first_partition_only=None, compute_kwargs=None, **kwargs)

Store a Dask DataFrame to CSV files.

By default, one file is written per partition. You can specify the filenames in several ways.

Use a globstring:

>>> df.to_csv('/path/to/data/export-*.csv')

The * will be replaced by the increasing sequence 0, 1, 2, …

/path/to/data/export-0.csv
/path/to/data/export-1.csv
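The substitution can be pictured with a small stand-alone sketch. `expand_globstring` is a hypothetical helper written only for illustration (it is not part of Dask), and it assumes the pattern contains exactly one `*`.

```python
from typing import List


def expand_globstring(pattern: str, npartitions: int) -> List[str]:
    """Hypothetical helper (not Dask API): one output path per partition."""
    # Assumption for this sketch: exactly one '*' placeholder in the pattern.
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    # The placeholder becomes the partition index: 0, 1, 2, ...
    return [pattern.replace("*", str(i)) for i in range(npartitions)]


print(expand_globstring("/path/to/data/export-*.csv", 2))
# ['/path/to/data/export-0.csv', '/path/to/data/export-1.csv']
```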

Use a globstring together with the name_function= keyword argument. name_function should accept an integer (the partition index) and return a string. The strings it produces must sort in the same order as their partition indices.

>>> from datetime import date, timedelta
>>> def name(i):
...     return str(date(2015, 1, 1) + i * timedelta(days=1))

>>> name(0)
'2015-01-01'
>>> name(15)
'2015-01-16'

>>> df.to_csv('/path/to/data/export-*.csv', name_function=name)

/path/to/data/export-2015-01-01.csv
/path/to/data/export-2015-01-02.csv
...
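The ordering requirement is easy to violate by accident. The stand-alone check below uses a hypothetical helper, `preserves_order` (not part of Dask), to show that plain `str` breaks lexicographic order once there are ten or more partitions, while zero-padded names and the ISO-date `name` function above keep it.

```python
from datetime import date, timedelta


def name(i):
    # Same date-based naming function as in the example above.
    return str(date(2015, 1, 1) + i * timedelta(days=1))


def preserves_order(name_function, npartitions):
    """Hypothetical check: do the generated names sort in partition order?"""
    names = [name_function(i) for i in range(npartitions)]
    return names == sorted(names)


print(preserves_order(str, 12))                   # False: '10' sorts before '2'
print(preserves_order(lambda i: f"{i:04d}", 12))  # True: '0000', '0001', ...
print(preserves_order(name, 12))                  # True: ISO dates sort chronologically
```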

You can also provide an explicit list of paths:

>>> paths = ['/path/to/data/alice.csv', '/path/to/data/bob.csv', ...]
>>> df.to_csv(paths)

You can also provide a directory name:

>>> df.to_csv('/path/to/data')

The files will be named 0.part, 1.part, 2.part, and so on:

/path/to/data/0.part
/path/to/data/1.part
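The list and directory forms can be sketched the same way. `resolve_paths` below is a hypothetical illustration, not Dask's internal code; it assumes that an explicit list must supply exactly one path per partition and that a string argument names a directory.

```python
import posixpath


def resolve_paths(filename, npartitions):
    """Hypothetical helper (not Dask API) for the list and directory forms."""
    if isinstance(filename, (list, tuple)):
        # Explicit list: assumed to need exactly one path per partition.
        if len(filename) != npartitions:
            raise ValueError(f"expected {npartitions} paths, got {len(filename)}")
        return list(filename)
    # Directory: files are named 0.part, 1.part, ... inside it.
    return [posixpath.join(filename, f"{i}.part") for i in range(npartitions)]


print(resolve_paths("/path/to/data", 2))
# ['/path/to/data/0.part', '/path/to/data/1.part']
```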

Parameters:

df (dask.DataFrame): Data to save.
filename (str or list): Absolute or relative filepath(s). Prefix with a protocol such as s3:// to save to remote filesystems.
single_file (bool, default False): Whether to save everything into a single CSV file. In single-file mode, each partition is appended to the end of the specified CSV file.
encoding (str, default ‘utf-8’): The encoding to use in the output file.
mode (str, default ‘wt’): Python file mode. The default ‘wt’ (equivalent to ‘w’) writes a new file, or overwrites an existing one, in text mode. ‘a’ (or ‘at’) appends to an existing file in text mode, or creates the file if it does not already exist. See open().
name_function (callable, default None): Function accepting an integer (the partition index) and returning a string that replaces the asterisk in the filename globstring. It should preserve the lexicographic order of partitions. Not supported when single_file is True.
compression (str, optional): The compression to use in the output file. Allowed values are ‘gzip’, ‘bz2’, and ‘xz’. Only used when the first argument is a filename.
compute (bool, default True): If True, execute immediately. If False, return a set of delayed objects that can be computed later.
storage_options (dict): Parameters passed on to the backend filesystem class.
header_first_partition_only (bool, default None): If True, write the header row only in the first output file. By default, headers are written to every partition in multiple-file mode (single_file is False) and only once in single-file mode (single_file is True). In single-file mode it must not be set to False.
compute_kwargs (dict, optional): Options passed to the compute method.
kwargs (dict, optional): Additional parameters passed to pandas.DataFrame.to_csv().
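The interaction between single_file and header_first_partition_only can be simulated with the standard `csv` module. In this sketch each partition is a plain list of row tuples, and `write_partitions` is a hypothetical helper, not Dask code. It models only the documented header defaults; the ValueError raised for invalid combinations is not modeled here.

```python
import csv
import io


def write_partitions(partitions, columns, single_file=False,
                     header_first_partition_only=None):
    """Hypothetical simulation: return the CSV text of each output file."""
    # Header once: always in single-file mode; in multi-file mode only if requested.
    header_once = single_file or bool(header_first_partition_only)
    shared = io.StringIO()  # the one output file used in single-file mode
    outputs = []
    for i, rows in enumerate(partitions):
        buf = shared if single_file else io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        if i == 0 or not header_once:
            writer.writerow(columns)
        writer.writerows(rows)  # partitions are appended in order
        if not single_file:
            outputs.append(buf.getvalue())
    return [shared.getvalue()] if single_file else outputs


parts = [[(1, "a"), (2, "b")], [(3, "c")]]
print(write_partitions(parts, ["x", "y"]))                    # header in both files
print(write_partitions(parts, ["x", "y"], single_file=True))  # one file, one header
```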

Returns:

The names of the files written, if they were computed right away.
If not, the delayed tasks associated with writing the files.
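The difference between the two return values can be illustrated with a rough standard-library analogy. Here plain zero-argument callables stand in for Dask's delayed tasks; `to_text_files` is hypothetical and does not use or reproduce the Dask API.

```python
import os
import tempfile


def to_text_files(partitions, paths, compute=True):
    """Hypothetical analogue of the compute flag using plain callables."""
    def make_task(text, path):
        def task():
            with open(path, "w", encoding="utf-8") as f:
                f.write(text)
            return path  # like the eager result, each task yields its file name
        return task

    tasks = [make_task(text, path) for text, path in zip(partitions, paths)]
    if compute:
        # Eager: run every write now and return the file names.
        return [task() for task in tasks]
    # Lazy: return the pending tasks; nothing has been written yet.
    return tasks


with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, f"{i}.part") for i in range(2)]
    lazy = to_text_files(["a\n", "b\n"], paths, compute=False)
    print(os.path.exists(paths[0]))            # False: nothing written yet
    print([task() for task in lazy] == paths)  # True: running the tasks writes the files
```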

Raises:

ValueError: If header_first_partition_only is set to False or name_function is specified when single_file is True.
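The documented error conditions can be expressed as a small validation function. `validate_to_csv_options` is a hypothetical sketch, not Dask's implementation; it raises in the two cases listed above and otherwise returns the effective header_first_partition_only value implied by the parameter description.

```python
def validate_to_csv_options(single_file=False, header_first_partition_only=None,
                            name_function=None):
    """Hypothetical check mirroring the documented ValueError conditions."""
    if single_file:
        if header_first_partition_only is False:
            raise ValueError("header_first_partition_only cannot be False when single_file=True")
        if name_function is not None:
            raise ValueError("name_function is not supported when single_file=True")
        # Single-file mode: the header is written only once.
        return True
    # Multi-file mode: None (the default) means every file gets a header.
    return bool(header_first_partition_only)


print(validate_to_csv_options())                  # False
print(validate_to_csv_options(single_file=True))  # True
```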
