dask.dataframe.read_sql_table

dask.dataframe.read_sql_table(table, uri, index_col, divisions=None, npartitions=None, limits=None, columns=None, bytes_per_chunk='256 MiB', head_rows=5, schema=None, meta=None, engine_kwargs=None, **kwargs)

Create dataframe from an SQL table.

If neither divisions nor npartitions is given, the memory footprint of the first few rows will be determined, and partitions of approximately 256 MiB will be used.

Parameters
table : string or sqlalchemy expression

Select columns from here.

uri : string

Full sqlalchemy URI for the database connection

index_col : string or sqlalchemy expression

Column which becomes the index, and defines the partitioning. Should be an indexed column in the SQL server, and of any orderable type. If the type is number or time, then partition boundaries can be inferred from npartitions or bytes_per_chunk; otherwise explicit divisions= must be supplied. index_col can also be a sqlalchemy expression that computes a value, e.g., index_col=sql.func.abs(sql.column("value")).label("abs(value)"), or index_col=cast(sql.column("id"), types.BigInteger).label("id") to convert the text field id to BigInteger (see Examples).

Note that sql, cast and types come from the sqlalchemy module.

Labeling columns created by functions or arithmetic operations is required.

divisions : sequence

Values of the index column to split the table by. If given, this will override npartitions and bytes_per_chunk. The divisions are the value boundaries of the index column used to define the partitions. For example, divisions=list('acegikmoqsuwz') could be used to partition a string column lexicographically into 12 partitions, with the implicit assumption that each partition contains a similar number of records (see Examples).

npartitions : int

Number of partitions, if divisions is not given. Will split the values of the index column linearly between limits, if given, or the column max/min otherwise. The index column must be numeric or a time type for this to work.

limits : 2-tuple or None

Manually give the lower and upper range of values for use with npartitions; if None, the min/max are first fetched from the DB. The upper limit, if given, is inclusive (see Examples).

columns : list of strings or None

Which columns to select; if None, gets all. Can include sqlalchemy functions, e.g., sql.func.abs(sql.column('value')).label('abs(value)') (see Examples). Labeling columns created by functions or arithmetic operations is recommended.

bytes_per_chunk : str or int

If both divisions and npartitions are None, this is the target size of each partition, in bytes.

head_rows : int

How many rows to load for inferring the data-types, unless passing meta

meta : empty DataFrame or None

If provided, do not attempt to infer dtypes, but use these, coercing all chunks on load (see Examples).

schema : str or None

If using a table name, pass this to sqlalchemy to select which DB schema to use within the URI connection

engine_kwargs : dict or None

Specific db engine parameters for sqlalchemy (see Examples).

kwargs : dict

Additional parameters to pass to pd.read_sql() (see Examples).

Returns
dask.dataframe

Examples

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        npartitions=10, index_col='id')
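
When index_col is built from a sqlalchemy expression, the expression must carry a label, as noted above. A minimal sketch, assuming a hypothetical table events whose text column id holds integer values (table, path and column names are illustrative):

>>> import dask.dataframe as dd
>>> from sqlalchemy import cast, sql, types
>>> index = cast(sql.column('id'), types.BigInteger).label('id')  # text -> int
>>> df = dd.read_sql_table('events', 'sqlite:///path/to/events.db',
...                        npartitions=4, index_col=index)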
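
Explicit divisions override npartitions and bytes_per_chunk; the sequence lists partition boundaries, so 13 boundary values yield 12 partitions. A sketch assuming the accounts table above has an indexed string column name:

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='name',
...                        divisions=list('acegikmoqsuwz'))  # 13 bounds -> 12 parts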
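
When the index range is already known, limits avoids the initial max/min query. A sketch assuming id values in accounts run from 0 to 100000:

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', npartitions=10,
...                        limits=(0, 100000))  # upper bound is inclusive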
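
columns may mix plain column names with labeled sqlalchemy expressions. A sketch assuming accounts has columns name and value:

>>> from sqlalchemy import sql
>>> cols = ['name', sql.func.abs(sql.column('value')).label('abs(value)')]
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', columns=cols)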
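
Passing meta skips the head_rows type-inference query. A sketch that hand-builds the expected empty frame, assuming accounts has a string column name and a float column value indexed by an integer id:

>>> import pandas as pd
>>> meta = pd.DataFrame({'name': pd.Series([], dtype='object'),
...                      'value': pd.Series([], dtype='float64')},
...                     index=pd.Index([], dtype='int64', name='id'))
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', npartitions=10, meta=meta)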
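
schema and engine_kwargs are both forwarded to sqlalchemy when the connection is created. A sketch assuming a PostgreSQL database with an analytics schema (the connection string is illustrative):

>>> df = dd.read_sql_table('accounts',
...                        'postgresql://user:pass@localhost/bank',
...                        index_col='id', npartitions=10,
...                        schema='analytics',
...                        engine_kwargs={'echo': False})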
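
Any remaining keyword arguments are handed through to pd.read_sql(); for example, its parse_dates parameter can coerce a text column to datetimes (the created column here is hypothetical):

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='id', npartitions=10,
...                        parse_dates=['created'])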