dask.dataframe.read_sql_table

dask.dataframe.read_sql_table(table, uri, index_col, divisions=None, npartitions=None, limits=None, columns=None, bytes_per_chunk='256 MiB', head_rows=5, schema=None, meta=None, engine_kwargs=None, **kwargs)

Create dataframe from an SQL table.

If neither divisions nor npartitions is given, the memory footprint of the first few rows will be determined, and partitions of approximately 256 MiB will be used.

Parameters
table : string or sqlalchemy expression

Select columns from here.

uri : string

Full sqlalchemy URI for the database connection

index_col : string or sqlalchemy expression

Column which becomes the index, and defines the partitioning. Should be an indexed column in the SQL server, and of any orderable type. If the type is number or time, then partition boundaries can be inferred from npartitions or bytes_per_chunk; otherwise explicit divisions= must be supplied. index_col can also be a sqlalchemy expression that computes a value, e.g., index_col=sql.func.abs(sql.column("value")).label("abs(value)"), or index_col=cast(sql.column("id"), types.BigInteger).label("id") to convert the text field id to BigInteger (see Examples).

Note that sql, cast and types come from the sqlalchemy module.

Labeling columns created by functions or arithmetic operations is required.

divisions : sequence

Values of the index column to split the table by. If given, this will override npartitions and bytes_per_chunk. The divisions are the value boundaries of the index column used to define the partitions. For example, divisions=list('acegikmoqsuwz') could be used to partition a string column lexicographically into 12 partitions, with the implicit assumption that each partition contains a similar number of records (see Examples).

npartitions : int

Number of partitions, if divisions is not given. Will split the values of the index column linearly between limits, if given, or the column max/min otherwise. The index column must be numeric or a time type for this to work.

limits : 2-tuple or None

Manually give the lower and upper range of values for use with npartitions; if None, the min/max are first fetched from the DB. The upper limit, if given, is inclusive (see Examples).

columns : list of strings or None

Which columns to select; if None, gets all. Can include sqlalchemy functions, e.g., sql.func.abs(sql.column('value')).label('abs(value)') (see Examples). Labeling columns created by functions or arithmetic operations is recommended.

bytes_per_chunk : str or int

If both divisions and npartitions are None, this is the target size of each partition, in bytes.

head_rows : int

How many rows to load for inferring the data-types, unless passing meta

meta : empty DataFrame or None

If provided, do not attempt to infer dtypes, but use these, coercing all chunks on load (see Examples).

schema : str or None

If using a table name, pass this to sqlalchemy to select which DB schema to use within the URI connection

engine_kwargs : dict or None

Specific db engine parameters for sqlalchemy (see Examples).

kwargs : dict

Additional parameters to pass to pd.read_sql() (see Examples).

Returns
dask.dataframe

Examples

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        npartitions=10, index_col='id')
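
When index_col is built from a sqlalchemy expression, the expression must carry a label, as noted above. A minimal sketch, assuming a hypothetical table events whose text column id holds integer values (table, path and column names are illustrative):

>>> import dask.dataframe as dd
>>> from sqlalchemy import cast, sql, types
>>> index = cast(sql.column('id'), types.BigInteger).label('id')  # text -> int
>>> df = dd.read_sql_table('events', 'sqlite:///path/to/events.db',
...                        npartitions=4, index_col=index)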
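
Explicit divisions override npartitions and bytes_per_chunk; the sequence lists partition boundaries, so 13 boundary values yield 12 partitions. A sketch assuming the accounts table above has an indexed string column name:

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='name',
...                        divisions=list('acegikmoqsuwz'))  # 13 bounds -> 12 parts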
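
When the index range is already known, limits avoids the initial max/min query. A sketch assuming id values in accounts run from 0 to 100000:

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', npartitions=10,
...                        limits=(0, 100000))  # upper bound is inclusive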
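
columns may mix plain column names with labeled sqlalchemy expressions. A sketch assuming accounts has columns name and value:

>>> from sqlalchemy import sql
>>> cols = ['name', sql.func.abs(sql.column('value')).label('abs(value)')]
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', columns=cols)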
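
Passing meta skips the head_rows type-inference query. A sketch that hand-builds the expected empty frame, assuming accounts has a string column name and a float column value indexed by an integer id:

>>> import pandas as pd
>>> meta = pd.DataFrame({'name': pd.Series([], dtype='object'),
...                      'value': pd.Series([], dtype='float64')},
...                     index=pd.Index([], dtype='int64', name='id'))
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', npartitions=10, meta=meta)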
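
schema and engine_kwargs are both forwarded to sqlalchemy when the connection is created. A sketch assuming a PostgreSQL database with an analytics schema (the connection string is illustrative):

>>> df = dd.read_sql_table('accounts',
...                        'postgresql://user:pass@localhost/bank',
...                        index_col='id', npartitions=10,
...                        schema='analytics',
...                        engine_kwargs={'echo': False})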
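
Any remaining keyword arguments are handed through to pd.read_sql(); for example, its parse_dates parameter can coerce a text column to datetimes (the created column here is hypothetical):

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', npartitions=10,
...                        parse_dates=['created'])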