- dask.dataframe.to_csv(df, filename, single_file=False, encoding='utf-8', mode='wt', name_function=None, compression=None, compute=True, scheduler=None, storage_options=None, header_first_partition_only=None, compute_kwargs=None, **kwargs)
Store Dask DataFrame to CSV files
One filename per partition will be created. You can specify the filenames in a variety of ways.
Use a globstring:
The * will be replaced by the increasing sequence 0, 1, 2, …
Use a globstring and a name_function= keyword argument. The name_function function should expect an integer and produce a string. Strings produced by name_function must preserve the order of their respective partition indices.
>>> from datetime import date, timedelta
>>> def name(i):
...     return str(date(2015, 1, 1) + i * timedelta(days=1))

>>> name(0)
'2015-01-01'
>>> name(15)
'2015-01-16'

>>> df.to_csv('/path/to/data/export-*.csv', name_function=name)
/path/to/data/export-2015-01-01.csv
/path/to/data/export-2015-01-02.csv
...
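The ordering requirement on name_function can be checked before writing anything. The following is a minimal sketch using only the standard library; `preserves_order`, `expand`, and `padded_name` are hypothetical helpers written for illustration and are not part of Dask, and `expand` only mimics the documented substitution of the asterisk rather than Dask's internal implementation.

```python
from datetime import date, timedelta


def date_name(i):
    # Same idea as the `name` function above: ISO dates sort lexicographically.
    return str(date(2015, 1, 1) + i * timedelta(days=1))


def padded_name(i, width=3):
    # Hypothetical alternative: a zero-padded partition index.
    return str(i).zfill(width)


def preserves_order(name_function, npartitions):
    """Return True if the produced names are unique and sort in partition order."""
    names = [name_function(i) for i in range(npartitions)]
    return names == sorted(names) and len(set(names)) == npartitions


def expand(globstring, name_function, npartitions):
    """Illustrative only: substitute each produced name for the asterisk."""
    return [globstring.replace("*", name_function(i)) for i in range(npartitions)]


if __name__ == "__main__":
    print(preserves_order(date_name, 400))   # dates keep their order
    print(preserves_order(padded_name, 12))  # zero-padded indices keep their order
    print(preserves_order(str, 12))          # plain str(i) does not: '10' sorts before '2'
    print(expand("/path/to/data/export-*.csv", date_name, 2))
```

A plain `str` works only while there are at most ten partitions; beyond that, a fixed-width or date-based name keeps the output files in partition order when listed lexicographically.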
You can also provide an explicit list of paths:
>>> paths = ['/path/to/data/alice.csv', '/path/to/data/bob.csv', ...]
>>> df.to_csv(paths)
- df : dask.DataFrame
Data to save
- filename : string or list
Path glob indicating the naming scheme for the output files
- single_file : bool, default False
Whether to save everything into a single CSV file. Under the single file mode, each partition is appended at the end of the specified CSV file. Note that not all filesystems support the append mode and thus the single file mode, especially on cloud storage systems such as S3 or GCS. A warning will be issued when writing to a file that is not backed by a local filesystem.
- encoding : string, optional
A string representing the encoding to use in the output file, defaults to ‘ascii’ on Python 2 and ‘utf-8’ on Python 3.
- mode : str
Python write mode, default ‘w’
- name_function : callable, default None
Function accepting an integer (partition index) and producing a string to replace the asterisk in the given filename globstring. Should preserve the lexicographic order of partitions. Not supported when single_file is True.
- compression : string, optional
A string representing the compression to use in the output file. Allowed values are ‘gzip’, ‘bz2’, and ‘xz’. Only used when the first argument is a filename.
- compute : bool
If True, immediately executes. If False, returns a set of delayed objects, which can be computed at a later time.
- storage_options : dict
Parameters passed on to the backend filesystem class.
- header_first_partition_only : boolean, default None
If set to True, only write the header row in the first output file. By default, headers are written to all partitions under the multiple file mode (single_file is False) and written only once under the single file mode (single_file is True). It must not be False under the single file mode.
- compute_kwargs : dict, optional
Options to be passed in to the compute method
- kwargs : dict, optional
Additional parameters to pass to pd.DataFrame.to_csv()
- The names of the files written, if they were computed right away
- If not, the delayed tasks associated with writing the files
- ValueError
If header_first_partition_only is set to False or name_function is specified when single_file is True.
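The interaction between single_file, header_first_partition_only, and name_function described above can be illustrated with a small toy model. This is only a sketch built on the standard library: `toy_to_csv`, the `out.csv` / `out-*.csv` names, and the error messages are hypothetical, and the function returns CSV text in a dict instead of touching a filesystem. It is not how Dask implements to_csv; it only mirrors the documented rules.

```python
import csv
import io


def toy_to_csv(partitions, columns, single_file=False,
               header_first_partition_only=None, name_function=None):
    """Toy model: map each (hypothetical) output filename to its CSV text."""
    # Documented error conditions for the single file mode.
    if single_file and header_first_partition_only is False:
        raise ValueError("header_first_partition_only must not be False "
                         "when single_file is True")
    if single_file and name_function is not None:
        raise ValueError("name_function is not supported when single_file is True")

    # Documented defaults: header once in single file mode, else in every file.
    if header_first_partition_only is None:
        header_first_partition_only = single_file
    name_function = name_function or str

    outputs = {}
    for i, rows in enumerate(partitions):
        target = "out.csv" if single_file else f"out-{name_function(i)}.csv"
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        if i == 0 or not header_first_partition_only:
            writer.writerow(columns)
        writer.writerows(rows)
        # In single file mode every partition is appended to the same target.
        outputs[target] = outputs.get(target, "") + buf.getvalue()
    return outputs


if __name__ == "__main__":
    parts = [[(1, "a")], [(2, "b")]]
    print(toy_to_csv(parts, ["x", "y"]))
    print(toy_to_csv(parts, ["x", "y"], single_file=True))
    print(toy_to_csv(parts, ["x", "y"], header_first_partition_only=True))
```

In the multiple file mode each output carries its own header unless header_first_partition_only=True, which matters when the files are later concatenated; in the single file mode the header appears once and the invalid combinations raise ValueError.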