dask.bag.read_avro

dask.bag.read_avro(urlpath, blocksize=100000000, storage_options=None, compression=None)

Read a set of Avro files into a bag.

Use this with arbitrarily nested Avro schemas. Reading is done with fastavro; refer to its documentation for its capabilities: https://github.com/fastavro/fastavro

Parameters
urlpath: string or list

Absolute or relative filepath, URL (may include protocols like s3://), or globstring pointing to data.

blocksize: int or None

Size of chunks in bytes. If None, there will be no chunking and each file will become one partition.

storage_options: dict or None

Extra options passed to the backend file system.

compression: str or None

Compression format of the target(s), like ‘gzip’. Should only be used with blocksize=None.