dask.dataframe.Series.to_csv
- Series.to_csv(filename, **kwargs)
Store Dask DataFrame to CSV files
One file is written per partition. You can specify the filenames in a variety of ways.
Use a globstring:
>>> df.to_csv('/path/to/data/export-*.csv')
The * will be replaced by the increasing sequence 0, 1, 2, …
/path/to/data/export-0.csv /path/to/data/export-1.csv
Use a globstring and a name_function= keyword argument. The name_function function should expect an integer and produce a string. Strings produced by name_function must preserve the order of their respective partition indices.
>>> from datetime import date, timedelta
>>> def name(i):
...     return str(date(2015, 1, 1) + i * timedelta(days=1))
>>> name(0)
'2015-01-01'
>>> name(15)
'2015-01-16'
>>> df.to_csv('/path/to/data/export-*.csv', name_function=name)
/path/to/data/export-2015-01-01.csv /path/to/data/export-2015-01-02.csv ...
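The ordering requirement on name_function can be checked without Dask. The sketch below is only an illustration: preserves_order is a hypothetical helper, not part of the Dask API, and it simply tests whether the generated names sort in partition order.

```python
from datetime import date, timedelta


def name(i):
    # Same name_function as in the example above. ISO dates sort
    # lexicographically in chronological order.
    return str(date(2015, 1, 1) + i * timedelta(days=1))


def preserves_order(name_function, npartitions):
    # Hypothetical helper: True if the generated names sort in partition order.
    names = [name_function(i) for i in range(npartitions)]
    return names == sorted(names)


# A user-supplied str(i) stops sorting correctly once there are more than
# ten partitions, because '10' sorts before '2'. Zero-padding avoids that.
ok_dates = preserves_order(name, 40)
ok_plain = preserves_order(str, 12)
ok_padded = preserves_order(lambda i: f"{i:03d}", 12)
```

Running a check like this over the expected number of partitions is a cheap way to catch a name_function whose output would not sort in partition order.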
You can also provide an explicit list of paths:
>>> paths = ['/path/to/data/alice.csv', '/path/to/data/bob.csv', ...]
>>> df.to_csv(paths)
You can also provide a directory name:
>>> df.to_csv('/path/to/data')
The files will be numbered 0, 1, 2, and so on, with the suffix ‘.part’:
/path/to/data/0.part /path/to/data/1.part
- Parameters
- df : dask.DataFrame
Data to save
- filename : string or list
Absolute or relative filepath(s). Prefix with a protocol like s3:// to save to remote filesystems.
- single_file : bool, default False
Whether to save everything into a single CSV file. Under the single file mode, each partition is appended at the end of the specified CSV file.
- encoding : string, default ‘utf-8’
A string representing the encoding to use in the output file.
- mode : str, default ‘w’
Python file mode. The default is ‘w’ (or ‘wt’), for writing a new file or overwriting an existing file in text mode. ‘a’ (or ‘at’) will append to an existing file in text mode or create a new file if it does not already exist. See open().
- name_function : callable, default None
Function accepting an integer (partition index) and producing a string to replace the asterisk in the given filename globstring. Should preserve the lexicographic order of partitions. Not supported when single_file is True.
- compression : string, optional
A string representing the compression to use in the output file. Allowed values are ‘gzip’, ‘bz2’, and ‘xz’. Only used when the first argument is a filename.
- compute : bool, default True
If True, immediately executes. If False, returns a set of delayed objects, which can be computed at a later time.
- storage_options : dict
Parameters passed on to the backend filesystem class.
- header_first_partition_only : bool, default None
If set to True, only write the header row in the first output file. By default, headers are written to all partitions under the multiple file mode (single_file is False) and written only once under the single file mode (single_file is True). It must be True under the single file mode.
- compute_kwargs : dict, optional
Options to be passed in to the compute method
- kwargs : dict, optional
Additional parameters to pass to pandas.DataFrame.to_csv().
- Returns
- The names of the files written, if they were computed right away.
- If not, the delayed tasks associated with writing the files.
- Raises
- ValueError
If header_first_partition_only is set to False or name_function is specified when single_file is True.
See also