API

DataFrame

DataFrame(dsk, name, meta, divisions)

Parallel Pandas DataFrame
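
For orientation, a minimal sketch of building a Dask DataFrame from an in-memory pandas object (via from_pandas, listed under Create DataFrames below) and materializing a result; the column names are illustrative:

    import pandas as pd
    import dask.dataframe as dd

    # Wrap an in-memory pandas frame in a lazy, two-partition
    # Dask DataFrame (column names are illustrative).
    pdf = pd.DataFrame({"x": range(10), "y": range(10, 20)})
    df = dd.from_pandas(pdf, npartitions=2)

    # Operations build a task graph; .compute() materializes it.
    print((df.x + df.y).sum().compute())  # 190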

DataFrame.abs()

Return a Series/DataFrame with absolute numeric value of each element.

DataFrame.add(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator add).

DataFrame.align(other[, join, axis, fill_value])

Align two objects on their axes with the specified join method.

DataFrame.all([axis, skipna, split_every, out])

Return whether all elements are True, potentially over an axis.

DataFrame.any([axis, skipna, split_every, out])

Return whether any element is True, potentially over an axis.

DataFrame.append(other[, interleave_partitions])

Append rows of other to the end of caller, returning a new object.

DataFrame.apply(func[, axis, broadcast, ...])

Parallel version of pandas.DataFrame.apply
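
Since apply runs opaque user code, Dask usually needs a meta hint describing the output's name and dtype; a small sketch with an illustrative row function:

    import pandas as pd
    import dask.dataframe as dd

    df = dd.from_pandas(pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}),
                        npartitions=2)

    # axis=1 applies row-wise; meta=(name, dtype) declares the output
    # so the graph can be built without running the lambda eagerly.
    s = df.apply(lambda row: row.x + row.y, axis=1, meta=("total", "int64"))
    print(s.compute())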

DataFrame.applymap(func[, meta])

Apply a function to a DataFrame elementwise.

DataFrame.assign(**kwargs)

Assign new columns to a DataFrame.

DataFrame.astype(dtype)

Cast a pandas object to a specified dtype dtype.

DataFrame.bfill([axis, limit])

Synonym for DataFrame.fillna() with method='bfill'.

DataFrame.categorize([columns, index, ...])

Convert columns of the DataFrame to category dtype.

DataFrame.columns

Return the column labels of the DataFrame.

DataFrame.compute(**kwargs)

Compute this dask collection

DataFrame.copy([deep])

Make a copy of the dataframe

DataFrame.corr([method, min_periods, ...])

Compute pairwise correlation of columns, excluding NA/null values.

DataFrame.count([axis, split_every, ...])

Count non-NA cells for each column or row.

DataFrame.cov([min_periods, split_every])

Compute pairwise covariance of columns, excluding NA/null values.

DataFrame.cummax([axis, skipna, out])

Return cumulative maximum over a DataFrame or Series axis.

DataFrame.cummin([axis, skipna, out])

Return cumulative minimum over a DataFrame or Series axis.

DataFrame.cumprod([axis, skipna, dtype, out])

Return cumulative product over a DataFrame or Series axis.

DataFrame.cumsum([axis, skipna, dtype, out])

Return cumulative sum over a DataFrame or Series axis.

DataFrame.describe([split_every, ...])

Generate descriptive statistics.

DataFrame.diff([periods, axis])

First discrete difference of element.

DataFrame.div(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

DataFrame.divide(other[, axis, level, ...])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

DataFrame.drop([labels, axis, columns, errors])

Drop specified labels from rows or columns.

DataFrame.drop_duplicates([subset, ...])

Return DataFrame with duplicate rows removed.

DataFrame.dropna([how, subset, thresh])

Remove missing values.

DataFrame.dtypes

Return data types

DataFrame.eq(other[, axis, level])

Get Equal to of dataframe and other, element-wise (binary operator eq).

DataFrame.eval(expr[, inplace])

Evaluate a string describing operations on DataFrame columns.

DataFrame.explode(column)

Transform each element of a list-like to a row, replicating index values.

DataFrame.ffill([axis, limit])

Synonym for DataFrame.fillna() with method='ffill'.

DataFrame.fillna([value, method, limit, axis])

Fill NA/NaN values using the specified method.

DataFrame.first(offset)

Select initial periods of time series data based on a date offset.

DataFrame.floordiv(other[, axis, level, ...])

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

DataFrame.ge(other[, axis, level])

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

DataFrame.get_partition(n)

Get a dask DataFrame/Series representing the nth partition.

DataFrame.groupby([by, group_keys, sort, ...])

Group DataFrame using a mapper or by a Series of columns.

DataFrame.gt(other[, axis, level])

Get Greater than of dataframe and other, element-wise (binary operator gt).

DataFrame.head([n, npartitions, compute])

First n rows of the dataset

DataFrame.idxmax([axis, skipna, split_every])

Return index of first occurrence of maximum over requested axis.

DataFrame.idxmin([axis, skipna, split_every])

Return index of first occurrence of minimum over requested axis.

DataFrame.iloc

Purely integer-location based indexing for selection by position.

DataFrame.index

Return dask Index instance

DataFrame.info([buf, verbose, memory_usage])

Concise summary of a Dask DataFrame.

DataFrame.isin(values)

Whether each element in the DataFrame is contained in values.

DataFrame.isna()

Detect missing values.

DataFrame.isnull()

Detect missing values.

DataFrame.items()

Iterate over (column name, Series) pairs.

DataFrame.iterrows()

Iterate over DataFrame rows as (index, Series) pairs.

DataFrame.itertuples([index, name])

Iterate over DataFrame rows as namedtuples.

DataFrame.join(other[, on, how, lsuffix, ...])

Join columns of another DataFrame.

DataFrame.known_divisions

Whether divisions are already known

DataFrame.last(offset)

Select final periods of time series data based on a date offset.

DataFrame.le(other[, axis, level])

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

DataFrame.loc

Purely label-location based indexer for selection by label.

DataFrame.lt(other[, axis, level])

Get Less than of dataframe and other, element-wise (binary operator lt).

DataFrame.map_partitions(func, *args, **kwargs)

Apply Python function on each DataFrame partition.
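
map_partitions hands each underlying pandas partition to an ordinary pandas function; a sketch (the helper below is illustrative, and meta= can be passed explicitly if Dask's inference on an empty frame fails):

    import pandas as pd
    import dask.dataframe as dd

    df = dd.from_pandas(pd.DataFrame({"x": range(6)}), npartitions=3)

    # func receives one pandas DataFrame per partition; Dask infers
    # the output schema by calling it on an empty frame.
    def add_double(part):
        return part.assign(x2=part.x * 2)

    print(df.map_partitions(add_double).compute())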

DataFrame.mask(cond[, other])

Replace values where the condition is True.

DataFrame.max([axis, skipna, split_every, ...])

Return the maximum of the values over the requested axis.

DataFrame.mean([axis, skipna, split_every, ...])

Return the mean of the values over the requested axis.

DataFrame.melt([id_vars, value_vars, ...])

Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.

DataFrame.memory_usage([index, deep])

Return the memory usage of each column in bytes.

DataFrame.memory_usage_per_partition([...])

Return the memory usage of each partition

DataFrame.merge(right[, how, on, left_on, ...])

Merge the DataFrame with another DataFrame

DataFrame.min([axis, skipna, split_every, ...])

Return the minimum of the values over the requested axis.

DataFrame.mod(other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator mod).

DataFrame.mode([dropna, split_every])

Get the mode(s) of each element along the selected axis.

DataFrame.mul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

DataFrame.ndim

Return dimensionality

DataFrame.ne(other[, axis, level])

Get Not equal to of dataframe and other, element-wise (binary operator ne).

DataFrame.nlargest([n, columns, split_every])

Return the first n rows ordered by columns in descending order.

DataFrame.npartitions

Return number of partitions

DataFrame.nsmallest([n, columns, split_every])

Return the first n rows ordered by columns in ascending order.

DataFrame.partitions

Slice dataframe by partitions

DataFrame.pivot_table([index, columns, ...])

Create a spreadsheet-style pivot table as a DataFrame.

DataFrame.pop(item)

Return item and drop from frame.

DataFrame.pow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator pow).

DataFrame.prod([axis, skipna, split_every, ...])

Return the product of the values over the requested axis.

DataFrame.quantile([q, axis, method])

Approximate row-wise and precise column-wise quantiles of DataFrame

DataFrame.query(expr, **kwargs)

Filter dataframe with complex expression

DataFrame.radd(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator radd).

DataFrame.random_split(frac[, random_state, ...])

Pseudorandomly split dataframe into different pieces row-wise

DataFrame.rdiv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

DataFrame.rename([index, columns])

Alter axes labels.

DataFrame.repartition([divisions, ...])

Repartition dataframe along new divisions
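
A sketch of coalescing into fewer partitions; explicit divisions= is the alternative when index boundaries matter:

    import pandas as pd
    import dask.dataframe as dd

    df = dd.from_pandas(pd.DataFrame({"x": range(100)}), npartitions=10)

    # Coalesce many small partitions into two larger ones.
    df2 = df.repartition(npartitions=2)
    print(df2.npartitions)  # 2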

DataFrame.replace([to_replace, value, regex])

Replace values given in to_replace with value.

DataFrame.resample(rule[, closed, label])

Resample time-series data.

DataFrame.reset_index([drop])

Reset the index to the default index.

DataFrame.rfloordiv(other[, axis, level, ...])

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

DataFrame.rmod(other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator rmod).

DataFrame.rmul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

DataFrame.round([decimals])

Round a DataFrame to a variable number of decimal places.

DataFrame.rpow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

DataFrame.rsub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

DataFrame.rtruediv(other[, axis, level, ...])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

DataFrame.sample([n, frac, replace, ...])

Random sample of items

DataFrame.select_dtypes([include, exclude])

Return a subset of the DataFrame's columns based on the column dtypes.

DataFrame.sem([axis, skipna, ddof, ...])

Return unbiased standard error of the mean over requested axis.

DataFrame.set_index(other[, drop, sorted, ...])

Set the DataFrame index (row labels) using an existing column.
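
Setting the index generally shuffles the data so each partition covers a contiguous, sorted range of index values; a sketch (pass sorted=True to skip the shuffle when the column is already ordered):

    import pandas as pd
    import dask.dataframe as dd

    df = dd.from_pandas(pd.DataFrame({"t": [3, 1, 4, 2, 5], "v": range(5)}),
                        npartitions=2)

    # set_index shuffles rows so each partition holds a contiguous,
    # sorted index range, enabling fast .loc and merges.
    indexed = df.set_index("t")
    print(indexed.known_divisions)  # True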

DataFrame.shape

Return a tuple representing the dimensionality of the DataFrame.

DataFrame.shuffle(on[, npartitions, ...])

Rearrange DataFrame into new partitions

DataFrame.size

Size of the Series or DataFrame as a Delayed object.

DataFrame.sort_values(by[, npartitions, ...])

Sort the dataset by a single column.

DataFrame.squeeze([axis])

Squeeze 1 dimensional axis objects into scalars.

DataFrame.std([axis, skipna, ddof, ...])

Return sample standard deviation over requested axis.

DataFrame.sub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator sub).

DataFrame.sum([axis, skipna, split_every, ...])

Return the sum of the values over the requested axis.

DataFrame.tail([n, compute])

Last n rows of the dataset

DataFrame.to_bag([index, format])

Create Dask Bag from a Dask DataFrame

DataFrame.to_csv(filename, **kwargs)

Store Dask DataFrame to CSV files

DataFrame.to_dask_array([lengths, meta])

Convert a dask DataFrame to a dask array.

DataFrame.to_delayed([optimize_graph])

Convert into a list of dask.delayed objects, one per partition.

DataFrame.to_hdf(path_or_buf, key[, mode, ...])

Store Dask DataFrame to Hierarchical Data Format (HDF) files

DataFrame.to_html([max_rows])

Render a DataFrame as an HTML table.

DataFrame.to_json(filename, *args, **kwargs)

See dd.to_json docstring for more information

DataFrame.to_parquet(path, *args, **kwargs)

Store Dask DataFrame to Parquet files

DataFrame.to_records([index, lengths])

Create Dask Array from a Dask DataFrame

DataFrame.to_string([max_rows])

Render a DataFrame to a console-friendly tabular output.

DataFrame.to_sql(name, uri[, schema, ...])

See dd.to_sql docstring for more information

DataFrame.to_timestamp([freq, how, axis])

Cast to DatetimeIndex of timestamps, at beginning of period.

DataFrame.truediv(other[, axis, level, ...])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

DataFrame.values

Return a dask.array of the values of this dataframe

DataFrame.var([axis, skipna, ddof, ...])

Return unbiased variance over requested axis.

DataFrame.visualize([filename, format, ...])

Render the computation of this object's task graph using graphviz.

DataFrame.where(cond[, other])

Replace values where the condition is False.

Series

Series(dsk, name, meta, divisions)

Parallel Pandas Series

Series.add(other[, level, fill_value, axis])

Return Addition of series and other, element-wise (binary operator add).

Series.align(other[, join, axis, fill_value])

Align two objects on their axes with the specified join method.

Series.all([axis, skipna, split_every, out])

Return whether all elements are True, potentially over an axis.

Series.any([axis, skipna, split_every, out])

Return whether any element is True, potentially over an axis.

Series.append(other[, interleave_partitions])

Concatenate two or more Series.

Series.apply(func[, convert_dtype, meta, args])

Parallel version of pandas.Series.apply

Series.astype(dtype)

Cast a pandas object to a specified dtype dtype.

Series.autocorr([lag, split_every])

Compute the lag-N autocorrelation.

Series.between(left, right[, inclusive])

Return boolean Series equivalent to left <= series <= right.

Series.bfill([axis, limit])

Synonym for Series.fillna() with method='bfill'.

Series.cat

Namespace for categorical methods

Series.clear_divisions()

Forget division information

Series.clip([lower, upper, out])

Trim values at input threshold(s).

Series.clip_lower(threshold)

Trim values below a given threshold.

Series.clip_upper(threshold)

Trim values above a given threshold.

Series.compute(**kwargs)

Compute this dask collection

Series.copy([deep])

Make a copy of the dataframe

Series.corr(other[, method, min_periods, ...])

Compute correlation with other Series, excluding missing values.

Series.count([split_every])

Return number of non-NA/null observations in the Series.

Series.cov(other[, min_periods, split_every])

Compute covariance with Series, excluding missing values.

Series.cummax([axis, skipna, out])

Return cumulative maximum over a DataFrame or Series axis.

Series.cummin([axis, skipna, out])

Return cumulative minimum over a DataFrame or Series axis.

Series.cumprod([axis, skipna, dtype, out])

Return cumulative product over a DataFrame or Series axis.

Series.cumsum([axis, skipna, dtype, out])

Return cumulative sum over a DataFrame or Series axis.

Series.describe([split_every, percentiles, ...])

Generate descriptive statistics.

Series.diff([periods, axis])

First discrete difference of element.

Series.div(other[, level, fill_value, axis])

Return Floating division of series and other, element-wise (binary operator truediv).

Series.drop_duplicates([subset, ...])

Return DataFrame with duplicate rows removed.

Series.dropna()

Return a new Series with missing values removed.

Series.dt

Namespace of datetime methods

Series.dtype

Return data type

Series.eq(other[, level, fill_value, axis])

Return Equal to of series and other, element-wise (binary operator eq).

Series.explode()

Transform each element of a list-like to a row.

Series.ffill([axis, limit])

Synonym for Series.fillna() with method='ffill'.

Series.fillna([value, method, limit, axis])

Fill NA/NaN values using the specified method.

Series.first(offset)

Select initial periods of time series data based on a date offset.

Series.floordiv(other[, level, fill_value, axis])

Return Integer division of series and other, element-wise (binary operator floordiv).

Series.ge(other[, level, fill_value, axis])

Return Greater than or equal to of series and other, element-wise (binary operator ge).

Series.get_partition(n)

Get a dask DataFrame/Series representing the nth partition.

Series.groupby([by, group_keys, sort, ...])

Group Series using a mapper or by a Series of columns.

Series.gt(other[, level, fill_value, axis])

Return Greater than of series and other, element-wise (binary operator gt).

Series.head([n, npartitions, compute])

First n rows of the dataset

Series.idxmax([axis, skipna, split_every])

Return index of first occurrence of maximum over requested axis.

Series.idxmin([axis, skipna, split_every])

Return index of first occurrence of minimum over requested axis.

Series.isin(values)

Whether elements in Series are contained in values.

Series.isna()

Detect missing values.

Series.isnull()

Detect missing values.

Series.iteritems()

Lazily iterate over (index, value) tuples.

Series.known_divisions

Whether divisions are already known

Series.last(offset)

Select final periods of time series data based on a date offset.

Series.le(other[, level, fill_value, axis])

Return Less than or equal to of series and other, element-wise (binary operator le).

Series.loc

Purely label-location based indexer for selection by label.

Series.lt(other[, level, fill_value, axis])

Return Less than of series and other, element-wise (binary operator lt).

Series.map(arg[, na_action, meta])

Map values of Series according to input correspondence.
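
A sketch of mapping with a lookup dict; as elsewhere, meta tells Dask the output dtype without eager execution:

    import pandas as pd
    import dask.dataframe as dd

    s = dd.from_pandas(pd.Series(["a", "b", "a"]), npartitions=2)

    # Element-wise lookup through a dict; meta declares the result
    # dtype (the name slot is None here).
    print(s.map({"a": 1, "b": 2}, meta=(None, "int64")).compute())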

Series.map_overlap(func, before, after, ...)

Apply a function to each partition, sharing rows with adjacent partitions.

Series.map_partitions(func, *args, **kwargs)

Apply Python function on each DataFrame partition.

Series.mask(cond[, other])

Replace values where the condition is True.

Series.max([axis, skipna, split_every, out, ...])

Return the maximum of the values over the requested axis.

Series.mean([axis, skipna, split_every, ...])

Return the mean of the values over the requested axis.

Series.memory_usage([index, deep])

Return the memory usage of the Series.

Series.memory_usage_per_partition([index, deep])

Return the memory usage of each partition

Series.min([axis, skipna, split_every, out, ...])

Return the minimum of the values over the requested axis.

Series.mod(other[, level, fill_value, axis])

Return Modulo of series and other, element-wise (binary operator mod).

Series.mul(other[, level, fill_value, axis])

Return Multiplication of series and other, element-wise (binary operator mul).

Series.nbytes

Number of bytes

Series.ndim

Return dimensionality

Series.ne(other[, level, fill_value, axis])

Return Not equal to of series and other, element-wise (binary operator ne).

Series.nlargest([n, split_every])

Return the largest n elements.

Series.notnull()

Detect existing (non-missing) values.

Series.nsmallest([n, split_every])

Return the smallest n elements.

Series.nunique([split_every])

Return number of unique elements in the object.

Series.nunique_approx([split_every])

Approximate number of unique rows.

Series.persist(**kwargs)

Persist this dask collection into memory

Series.pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

Series.pow(other[, level, fill_value, axis])

Return Exponential power of series and other, element-wise (binary operator pow).

Series.prod([axis, skipna, split_every, ...])

Return the product of the values over the requested axis.

Series.quantile([q, method])

Approximate quantiles of Series

Series.radd(other[, level, fill_value, axis])

Return Addition of series and other, element-wise (binary operator radd).

Series.random_split(frac[, random_state, ...])

Pseudorandomly split dataframe into different pieces row-wise

Series.rdiv(other[, level, fill_value, axis])

Return Floating division of series and other, element-wise (binary operator rtruediv).

Series.reduction(chunk[, aggregate, ...])

Generic row-wise reductions.

Series.repartition([divisions, npartitions, ...])

Repartition dataframe along new divisions

Series.replace([to_replace, value, regex])

Replace values given in to_replace with value.

Series.rename([index, inplace, sorted_index])

Alter Series index labels or name

Series.resample(rule[, closed, label])

Resample time-series data.

Series.reset_index([drop])

Reset the index to the default index.

Series.rolling(window[, min_periods, ...])

Provides rolling transformations.

Series.round([decimals])

Round each value in a Series to the given number of decimals.

Series.sample([n, frac, replace, random_state])

Random sample of items

Series.sem([axis, skipna, ddof, ...])

Return unbiased standard error of the mean over requested axis.

Series.shape

Return a tuple representing the dimensionality of a Series.

Series.shift([periods, freq, axis])

Shift index by desired number of periods with an optional time freq.

Series.size

Size of the Series or DataFrame as a Delayed object.

Series.std([axis, skipna, ddof, ...])

Return sample standard deviation over requested axis.

Series.str

Namespace for string methods

Series.sub(other[, level, fill_value, axis])

Return Subtraction of series and other, element-wise (binary operator sub).

Series.sum([axis, skipna, split_every, ...])

Return the sum of the values over the requested axis.

Series.to_bag([index, format])

Create a Dask Bag from a Series

Series.to_csv(filename, **kwargs)

Store Dask DataFrame to CSV files

Series.to_dask_array([lengths, meta])

Convert a dask DataFrame to a dask array.

Series.to_delayed([optimize_graph])

Convert into a list of dask.delayed objects, one per partition.

Series.to_frame([name])

Convert Series to DataFrame.

Series.to_hdf(path_or_buf, key[, mode, append])

Store Dask DataFrame to Hierarchical Data Format (HDF) files

Series.to_string([max_rows])

Render a string representation of the Series.

Series.to_timestamp([freq, how, axis])

Cast to DatetimeIndex of timestamps, at beginning of period.

Series.truediv(other[, level, fill_value, axis])

Return Floating division of series and other, element-wise (binary operator truediv).

Series.unique([split_every, split_out])

Return Series of unique values in the object.

Series.value_counts([sort, ascending, ...])

Return a Series containing counts of unique values.

Series.values

Return a dask.array of the values of this dataframe

Series.var([axis, skipna, ddof, ...])

Return unbiased variance over requested axis.

Series.visualize([filename, format, ...])

Render the computation of this object's task graph using graphviz.

Series.where(cond[, other])

Replace values where the condition is False.

Groupby Operations

DataFrame Groupby

DataFrameGroupBy.aggregate(arg[, ...])

Aggregate using one or more operations over the specified axis.
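
A split-apply-combine sketch with illustrative column names; the split_every and split_out parameters that recur in this section tune the reduction tree and the number of output partitions:

    import pandas as pd
    import dask.dataframe as dd

    df = dd.from_pandas(
        pd.DataFrame({"g": ["a", "b", "a", "b"], "v": [1.0, 2.0, 3.0, 4.0]}),
        npartitions=2,
    )

    # One lazy pass computes several aggregations per group.
    print(df.groupby("g").aggregate({"v": ["mean", "sum"]}).compute())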

DataFrameGroupBy.apply(func, *args, **kwargs)

Parallel version of pandas GroupBy.apply

DataFrameGroupBy.count([split_every, split_out])

Compute count of group, excluding missing values.

DataFrameGroupBy.cumcount([axis])

Number each item in each group from 0 to the length of that group - 1.

DataFrameGroupBy.cumprod([axis])

Cumulative product for each group.

DataFrameGroupBy.cumsum([axis])

Cumulative sum for each group.

DataFrameGroupBy.get_group(key)

Construct DataFrame from group with provided name.

DataFrameGroupBy.max([split_every, split_out])

Compute max of group values.

DataFrameGroupBy.mean([split_every, split_out])

Compute mean of groups, excluding missing values.

DataFrameGroupBy.min([split_every, split_out])

Compute min of group values.

DataFrameGroupBy.size([split_every, split_out])

Compute group sizes.

DataFrameGroupBy.std([ddof, split_every, ...])

Compute standard deviation of groups, excluding missing values.

DataFrameGroupBy.sum([split_every, ...])

Compute sum of group values.

DataFrameGroupBy.var([ddof, split_every, ...])

Compute variance of groups, excluding missing values.

DataFrameGroupBy.cov([ddof, split_every, ...])

Compute pairwise covariance of columns, excluding NA/null values.

DataFrameGroupBy.corr([ddof, split_every, ...])

Compute pairwise correlation of columns, excluding NA/null values.

DataFrameGroupBy.first([split_every, split_out])

Compute first of group values.

DataFrameGroupBy.last([split_every, split_out])

Compute last of group values.

DataFrameGroupBy.idxmin([split_every, ...])

Return index of first occurrence of minimum over requested axis.

DataFrameGroupBy.idxmax([split_every, ...])

Return index of first occurrence of maximum over requested axis.

DataFrameGroupBy.rolling(window[, ...])

Provides rolling transformations.

Series Groupby

SeriesGroupBy.aggregate(arg[, split_every, ...])

Aggregate using one or more operations over the specified axis.

SeriesGroupBy.apply(func, *args, **kwargs)

Parallel version of pandas GroupBy.apply

SeriesGroupBy.count([split_every, split_out])

Compute count of group, excluding missing values.

SeriesGroupBy.cumcount([axis])

Number each item in each group from 0 to the length of that group - 1.

SeriesGroupBy.cumprod([axis])

Cumulative product for each group.

SeriesGroupBy.cumsum([axis])

Cumulative sum for each group.

SeriesGroupBy.get_group(key)

Construct DataFrame from group with provided name.

SeriesGroupBy.max([split_every, split_out])

Compute max of group values.

SeriesGroupBy.mean([split_every, split_out])

Compute mean of groups, excluding missing values.

SeriesGroupBy.min([split_every, split_out])

Compute min of group values.

SeriesGroupBy.nunique([split_every, split_out])

Return number of unique elements in the group.

SeriesGroupBy.size([split_every, split_out])

Compute group sizes.

SeriesGroupBy.std([ddof, split_every, split_out])

Compute standard deviation of groups, excluding missing values.

SeriesGroupBy.sum([split_every, split_out, ...])

Compute sum of group values.

SeriesGroupBy.var([ddof, split_every, split_out])

Compute variance of groups, excluding missing values.

SeriesGroupBy.first([split_every, split_out])

Compute first of group values.

SeriesGroupBy.last([split_every, split_out])

Compute last of group values.

SeriesGroupBy.idxmin([split_every, ...])

Return index of first occurrence of minimum over requested axis.

SeriesGroupBy.idxmax([split_every, ...])

Return index of first occurrence of maximum over requested axis.

SeriesGroupBy.rolling(window[, min_periods, ...])

Provides rolling transformations.

Custom Aggregation

Aggregation(name, chunk, agg[, finalize])

User defined groupby-aggregation.
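
The three callables receive grouped pandas objects: chunk runs on each partition, agg combines the per-partition results, and the optional finalize produces the output values. A sketch of a mean assembled from count/sum pairs, following the pattern this class is designed for:

    import pandas as pd
    import dask.dataframe as dd

    # chunk: per-partition grouped state; agg: combine that state
    # across partitions; finalize: turn state into output values.
    custom_mean = dd.Aggregation(
        "custom_mean",
        chunk=lambda s: (s.count(), s.sum()),
        agg=lambda count, total: (count.sum(), total.sum()),
        finalize=lambda count, total: total / count,
    )

    df = dd.from_pandas(
        pd.DataFrame({"g": ["a", "b", "a"], "v": [1.0, 2.0, 4.0]}),
        npartitions=2,
    )
    print(df.groupby("g").agg(custom_mean).compute())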

Rolling Operations

rolling.map_overlap(func, df, before, after, ...)

Apply a function to each partition, sharing rows with adjacent partitions.

Series.rolling(window[, min_periods, ...])

Provides rolling transformations.

DataFrame.rolling(window[, min_periods, ...])

Provides rolling transformations.

Rolling.apply(func[, raw, engine, ...])

Calculate the rolling custom aggregation function.

Rolling.count()

Calculate the rolling count of non NaN observations.

Rolling.kurt()

Calculate the rolling Fisher's definition of kurtosis without bias.

Rolling.max()

Calculate the rolling maximum.

Rolling.mean()

Calculate the rolling mean.

Rolling.median()

Calculate the rolling median.

Rolling.min()

Calculate the rolling minimum.

Rolling.quantile(quantile)

Calculate the rolling quantile.

Rolling.skew()

Calculate the rolling unbiased skewness.

Rolling.std([ddof])

Calculate the rolling standard deviation.

Rolling.sum()

Calculate the rolling sum.

Rolling.var([ddof])

Calculate the rolling variance.
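
Rolling windows work by sharing boundary rows between neighbouring partitions (the map_overlap machinery above); a small sketch:

    import pandas as pd
    import dask.dataframe as dd

    s = dd.from_pandas(pd.Series(range(10)), npartitions=2)

    # A window of 3 needs 2 trailing rows from the previous
    # partition, which Dask exchanges automatically.
    print(s.rolling(3).mean().compute())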

Create DataFrames

read_csv(urlpath[, blocksize, ...])

Read CSV files into a Dask DataFrame
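
A hedged sketch; the glob path is a placeholder, and remaining keyword arguments are forwarded to pandas.read_csv:

    import dask.dataframe as dd

    # One logical DataFrame over many files; each file is further
    # split into roughly 64 MB partitions. The glob is a placeholder.
    df = dd.read_csv("data/2024-*.csv", blocksize="64MB")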

read_table(urlpath[, blocksize, ...])

Read delimited files into a Dask DataFrame

read_fwf(urlpath[, blocksize, ...])

Read fixed-width files into a Dask DataFrame

read_parquet(path[, columns, filters, ...])

Read a Parquet file into a Dask DataFrame

read_hdf(pattern, key[, start, stop, ...])

Read HDF files into a Dask DataFrame

read_json(url_path[, orient, lines, ...])

Create a dataframe from a set of JSON files

read_orc(path[, engine, columns, index, ...])

Read dataframe from ORC file(s)

read_sql_table(table, uri, index_col[, ...])

Create dataframe from an SQL table.

from_array(x[, chunksize, columns, meta])

Read any sliceable array into a Dask DataFrame

from_bcolz(x[, chunksize, categorize, ...])

Read BColz CTable into a Dask DataFrame

from_dask_array(x[, columns, index, meta])

Create a Dask DataFrame from a Dask Array.

from_delayed(dfs[, meta, divisions, prefix, ...])

Create Dask DataFrame from many Dask Delayed objects

from_pandas(data[, npartitions, chunksize, ...])

Construct a Dask DataFrame from a Pandas DataFrame

Bag.to_dataframe([meta, columns])

Create Dask DataFrame from a Dask Bag.

Store DataFrames

to_csv(df, filename[, single_file, ...])

Store Dask DataFrame to CSV files

to_parquet(df, path[, engine, compression, ...])

Store Dask DataFrame to Parquet files
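
A hedged round-trip sketch; the path is a placeholder, and a Parquet engine (pyarrow or fastparquet) must be installed:

    import pandas as pd
    import dask.dataframe as dd

    df = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)

    # Writes one Parquet file per partition under the directory.
    dd.to_parquet(df, "out.parquet")

    # Reading back restores a lazy, partitioned DataFrame.
    df2 = dd.read_parquet("out.parquet")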

to_hdf(df, path, key[, mode, append, ...])

Store Dask DataFrame to Hierarchical Data Format (HDF) files

to_records(df)

Create Dask Array from a Dask DataFrame

to_sql(df, name, uri[, schema, if_exists, ...])

Store Dask DataFrame to a SQL table

to_json(df, url_path[, orient, lines, ...])

Write dataframe into JSON text files

Convert DataFrames

DataFrame.to_bag([index, format])

Create Dask Bag from a Dask DataFrame

DataFrame.to_dask_array([lengths, meta])

Convert a dask DataFrame to a dask array.

DataFrame.to_delayed([optimize_graph])

Convert into a list of dask.delayed objects, one per partition.

Reshape DataFrames

get_dummies(data[, prefix, prefix_sep, ...])

Convert categorical variable into dummy/indicator variables.
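
Unlike pandas, Dask's get_dummies requires columns whose categories are already known, typically by calling categorize first; a sketch:

    import pandas as pd
    import dask.dataframe as dd

    df = dd.from_pandas(pd.DataFrame({"c": ["a", "b", "a"]}), npartitions=2)

    # Dask needs the full category set up front, so categorize()
    # (which scans the data) precedes the encoding.
    print(dd.get_dummies(df.categorize(columns=["c"])).compute())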

pivot_table(df[, index, columns, values, ...])

Create a spreadsheet-style pivot table as a DataFrame.

melt(frame[, id_vars, value_vars, var_name, ...])

Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.

Concatenate DataFrames

DataFrame.merge(right[, how, on, left_on, ...])

Merge the DataFrame with another DataFrame

concat(dfs[, axis, join, ...])

Concatenate DataFrames along rows.
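
A sketch of row-wise concatenation; divisions are preserved only when the inputs' index ranges are known and do not overlap:

    import pandas as pd
    import dask.dataframe as dd

    a = dd.from_pandas(pd.DataFrame({"x": [1, 2]}), npartitions=1)
    b = dd.from_pandas(pd.DataFrame({"x": [3, 4]}), npartitions=1)

    # Partitions are stacked; no data moves at graph-build time.
    c = dd.concat([a, b])
    print(c.npartitions)  # 2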

merge(left, right[, how, on, left_on, ...])

Merge DataFrame or named Series objects with a database-style join.

merge_asof(left, right[, on, left_on, ...])

Perform an asof merge.

Resampling

Resampler(obj, rule, **kwargs)

Class for resampling timeseries data.

Resampler.agg(agg_funcs, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Resampler.count()

Compute count of group, excluding missing values.

Resampler.first()

Compute first of group values.

Resampler.last()

Compute last of group values.

Resampler.max()

Compute max of group values.

Resampler.mean()

Compute mean of groups, excluding missing values.

Resampler.median()

Compute median of groups, excluding missing values.

Resampler.min()

Compute min of group values.

Resampler.nunique()

Return number of unique elements in the group.

Resampler.ohlc()

Compute open, high, low and close values of a group, excluding missing values.

Resampler.prod()

Compute prod of group values.

Resampler.quantile()

Return value at the given quantile.

Resampler.sem()

Compute standard error of the mean of groups, excluding missing values.

Resampler.size()

Compute group sizes.

Resampler.std()

Compute standard deviation of groups, excluding missing values.

Resampler.sum()

Compute sum of group values.

Resampler.var()

Compute variance of groups, excluding missing values.
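
Resampling needs a datetime-like index with known divisions; a sketch on a small hourly series:

    import pandas as pd
    import dask.dataframe as dd

    idx = pd.date_range("2024-01-01", periods=48, freq="h")
    ts = dd.from_pandas(pd.Series(range(48), index=idx), npartitions=4)

    # Downsample hourly observations to daily means.
    print(ts.resample("1D").mean().compute())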

Dask Metadata

make_meta(x[, index, parent_meta])

Create meta-data based on the type of x, and on parent_meta if supplied.

Other functions

compute(*args[, traverse, optimize_graph, ...])

Compute several dask collections at once.
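
Passing several collections to one compute call lets Dask share the overlapping parts of their graphs; a sketch:

    import pandas as pd
    import dask
    import dask.dataframe as dd

    df = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)

    # One call evaluates both results, sharing the common subgraph
    # instead of building df's tasks twice.
    total, avg = dask.compute(df.x.sum(), df.x.mean())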

map_partitions(func, *args[, meta, ...])

Apply Python function on each DataFrame partition.

to_datetime()

Convert argument to datetime.

to_numeric(arg[, errors, meta])

Convert argument to a numeric type.