dask_expr._collection.DataFrame

class dask_expr._collection.DataFrame(expr)

DataFrame-like Expr Collection

__init__(expr)

Methods

`__init__`(expr)
`abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`add`(other[, axis, level, fill_value])
`add_prefix`(prefix)	Prefix labels with string prefix.
`add_suffix`(suffix)	Suffix labels with string suffix.
`align`(other[, join, axis, fill_value])	Align two objects on their axes with the specified join method.
`all`([axis, skipna, split_every])	Return whether all elements are True, potentially over an axis.
`analyze`([filename, format])	Outputs statistics about every node in the expression.
`any`([axis, skipna, split_every])	Return whether any element is True, potentially over an axis.
`apply`(function, *args[, meta, axis])	Parallel version of pandas.DataFrame.apply
`assign`(**pairs)	Assign new columns to a DataFrame.
`astype`(dtypes)	Cast a pandas object to a specified dtype `dtype`.
`bfill`([axis, limit])	Fill NA/NaN values by using the next valid observation to fill the gap.
`categorize`([columns, index, split_every])	Convert columns of the DataFrame to category dtype.
`clear_divisions`()	Forget division information.
`clip`([lower, upper, axis])	Trim values at input threshold(s).
`combine`(other, func[, fill_value, overwrite])	Perform column-wise combine with another DataFrame.
`combine_first`(other)	Update null elements with value in the same location in other.
`compute`([fuse])	Compute this DataFrame.
`compute_current_divisions`([col, set_divisions])	Compute the current divisions of the DataFrame.
`copy`([deep])	Make a copy of the dataframe
`corr`([method, min_periods, numeric_only, ...])	Compute pairwise correlation of columns, excluding NA/null values.
`count`([axis, numeric_only, split_every])	Count non-NA cells for each column or row.
`cov`([min_periods, numeric_only, split_every])	Compute pairwise covariance of columns, excluding NA/null values.
`cummax`([axis, skipna])	Return cumulative maximum over a DataFrame or Series axis.
`cummin`([axis, skipna])	Return cumulative minimum over a DataFrame or Series axis.
`cumprod`([axis, skipna])	Return cumulative product over a DataFrame or Series axis.
`cumsum`([axis, skipna])	Return cumulative sum over a DataFrame or Series axis.
`describe`([split_every, percentiles, ...])	Generate descriptive statistics.
`diff`([periods, axis])	First discrete difference of element.
`div`(other[, axis, level, fill_value])
`divide`(other[, axis, level, fill_value])
`dot`(other[, meta])	Compute the dot product between the Series and the columns of other.
`drop`([labels, axis, columns, errors])	Drop specified labels from rows or columns.
`drop_duplicates`([subset, split_every, ...])	Return DataFrame with duplicate rows removed.
`dropna`([how, subset, thresh])	Remove missing values.
`enforce_runtime_divisions`()	Enforce the current divisions at runtime.
`eq`(other[, level, axis])
`eval`(expr, **kwargs)	Evaluate a string describing operations on DataFrame columns.
`explain`([stage, format])	Create a graph representation of the Expression.
`explode`(column)	Transform each element of a list-like to a row, replicating index values.
`ffill`([axis, limit])	Fill NA/NaN values by propagating the last valid observation to next valid.
`fillna`([value, axis])	Fill NA/NaN values using the specified method.
`floordiv`(other[, axis, level, fill_value])
`from_dict`(data, *[, npartitions, orient, ...])	Construct a Dask DataFrame from a Python Dictionary
`ge`(other[, level, axis])
`get_partition`(n)	Get a dask DataFrame/Series representing the nth partition.
`groupby`(by[, group_keys, sort, observed, dropna])	Group DataFrame using a mapper or by a Series of columns.
`gt`(other[, level, axis])
`head`([n, npartitions, compute])	First n rows of the dataset
`idxmax`([axis, skipna, numeric_only, split_every])	Return index of first occurrence of maximum over requested axis.
`idxmin`([axis, skipna, numeric_only, split_every])	Return index of first occurrence of minimum over requested axis.
`info`([buf, verbose, memory_usage])	Concise summary of a Dask DataFrame
`isin`(values)	Whether each element in the DataFrame is contained in values.
`isna`()	Detect missing values.
`isnull`()	DataFrame.isnull is an alias for DataFrame.isna.
`items`()	Iterate over (column name, Series) pairs.
`iterrows`()	Iterate over DataFrame rows as (index, Series) pairs.
`itertuples`([index, name])	Iterate over DataFrame rows as namedtuples.
`join`(other[, on, how, lsuffix, rsuffix, ...])	Join columns of another DataFrame.
`kurt`([axis, fisher, bias, nan_policy, ...])	Return unbiased kurtosis over requested axis.
`kurtosis`([axis, fisher, bias, nan_policy, ...])	Return unbiased kurtosis over requested axis.
`le`(other[, level, axis])
`lower_once`()
`lt`(other[, level, axis])
`map`(func[, na_action, meta])
`map_overlap`(func, before, after, *args[, ...])	Apply a function to each partition, sharing rows with adjacent partitions.
`map_partitions`(func, *args[, meta, ...])	Apply a Python function to each partition
`mask`(cond[, other])	Replace values where the condition is True.
`max`([axis, skipna, numeric_only, split_every])	Return the maximum of the values over the requested axis.
`mean`([axis, skipna, numeric_only, split_every])	Return the mean of the values over the requested axis.
`median`([axis, numeric_only])	Return the median of the values over the requested axis.
`median_approximate`([axis, method, numeric_only])	Return the approximate median of the values over the requested axis.
`memory_usage`([deep, index])	Return the memory usage of each column in bytes.
`memory_usage_per_partition`([index, deep])	Return the memory usage of each partition
`merge`(right[, how, on, left_on, right_on, ...])	Merge the DataFrame with another DataFrame
`min`([axis, skipna, numeric_only, split_every])	Return the minimum of the values over the requested axis.
`mod`(other[, axis, level, fill_value])
`mode`([dropna, split_every, numeric_only])	Get the mode(s) of each element along the selected axis.
`mul`(other[, axis, level, fill_value])
`ne`(other[, level, axis])
`nlargest`([n, columns, split_every])	Return the first n rows ordered by columns in descending order.
`notnull`()	DataFrame.notnull is an alias for DataFrame.notna.
`nsmallest`([n, columns, split_every])	Return the first n rows ordered by columns in ascending order.
`nunique`([axis, dropna, split_every])	Count number of distinct elements in specified axis.
`nunique_approx`([split_every])	Approximate number of unique rows.
`optimize`([fuse])	Optimizes the DataFrame.
`persist`([fuse])	Persist this dask collection into memory
`pipe`(func, *args, **kwargs)	Apply chainable functions that expect Series or DataFrames.
`pivot_table`(index, columns, values[, aggfunc])	Create a spreadsheet-style pivot table as a DataFrame.
`pop`(item)	Return item and drop from frame.
`pow`(other[, axis, level, fill_value])
`pprint`()	Outputs a string representation of the DataFrame.
`prod`([axis, skipna, numeric_only, ...])	Return the product of the values over the requested axis.
`product`([axis, skipna, numeric_only, ...])	Return the product of the values over the requested axis.
`quantile`([q, axis, numeric_only, method])	Approximate row-wise and precise column-wise quantiles of DataFrame
`query`(expr, **kwargs)	Filter dataframe with complex expression
`radd`(other[, axis, level, fill_value])
`random_split`(frac[, random_state, shuffle])	Pseudorandomly split dataframe into different pieces row-wise
`rdiv`(other[, axis, level, fill_value])
`reduction`(chunk[, aggregate, combine, meta, ...])	Generic row-wise reductions.
`rename`([index, columns])	Rename columns or index labels.
`rename_axis`([mapper, index, columns, axis])	Set the name of the axis for the index or columns.
`repartition`([divisions, npartitions, ...])	Repartition a collection
`replace`([to_replace, value, regex])	Replace values given in to_replace with value.
`resample`(rule[, closed, label])	Resample time-series data.
`reset_index`([drop])	Reset the index to the default index.
`rfloordiv`(other[, axis, level, fill_value])
`rmod`(other[, axis, level, fill_value])
`rmul`(other[, axis, level, fill_value])
`rolling`(window, **kwargs)	Provides rolling transformations.
`round`([decimals])	Round a DataFrame to a variable number of decimal places.
`rpow`(other[, axis, level, fill_value])
`rsub`(other[, axis, level, fill_value])
`rtruediv`(other[, axis, level, fill_value])
`sample`([n, frac, replace, random_state])	Random sample of items
`select_dtypes`([include, exclude])	Return a subset of the DataFrame's columns based on the column dtypes.
`sem`([axis, skipna, ddof, split_every, ...])	Return unbiased standard error of the mean over requested axis.
`set_index`(other[, drop, sorted, ...])	Set the DataFrame index (row labels) using an existing column.
`shift`([periods, freq, axis])	Shift index by desired number of periods with an optional time freq.
`shuffle`([on, ignore_index, npartitions, ...])	Rearrange DataFrame into new partitions
`simplify`()
`skew`([axis, bias, nan_policy, numeric_only])	Return unbiased skew over requested axis.
`sort_values`(by[, npartitions, ascending, ...])	Sort the dataset by a single column.
`squeeze`([axis])	Squeeze 1 dimensional axis objects into scalars.
`std`([axis, skipna, ddof, numeric_only, ...])	Return sample standard deviation over requested axis.
`sub`(other[, axis, level, fill_value])
`sum`([axis, skipna, numeric_only, min_count, ...])	Return the sum of the values over the requested axis.
`tail`([n, compute])	Last n rows of the dataset
`to_backend`([backend])	Move to a new DataFrame backend
`to_bag`([index, format])	Create a Dask Bag from a Series
`to_csv`(filename, **kwargs)	See dd.to_csv docstring for more information
`to_dask_array`([lengths, meta, optimize])	Convert a dask DataFrame to a dask array.
`to_dask_dataframe`(*args, **kwargs)	Convert to a legacy dask-dataframe collection
`to_delayed`([optimize_graph])	Convert into a list of `dask.delayed` objects, one per partition.
`to_hdf`(path_or_buf, key[, mode, append])	See dd.to_hdf docstring for more information
`to_html`([max_rows])	Render a DataFrame as an HTML table.
`to_json`(filename, *args, **kwargs)	See dd.to_json docstring for more information
`to_legacy_dataframe`([optimize])	Convert to a legacy dask-dataframe collection
`to_orc`(path, *args, **kwargs)	See dd.to_orc docstring for more information
`to_parquet`(path, **kwargs)
`to_records`([index, lengths])
`to_sql`(name, uri[, schema, if_exists, ...])
`to_string`([max_rows])	Render a DataFrame to a console-friendly tabular output.
`to_timestamp`([freq, how])	Cast to DatetimeIndex of timestamps, at beginning of period.
`truediv`(other[, axis, level, fill_value])
`var`([axis, skipna, ddof, numeric_only, ...])	Return unbiased variance over requested axis.
`visualize`([tasks])	Visualize the expression or task graph
`where`(cond[, other])	Replace values where the condition is False.

Attributes

`axes`
`columns`
`dask`
`divisions`	Tuple of `npartitions + 1` values, in ascending order, marking the lower/upper bounds of each partition's index.
`dtypes`	Return data types
`empty`
`expr`
`iloc`	Purely integer-location based indexing for selection by position.
`index`	Return dask Index instance
`known_divisions`	Whether the divisions are known.
`loc`	Purely label-location based indexer for selection by label.
`nbytes`
`ndim`	Return dimensionality
`npartitions`	Return number of partitions
`partitions`	Slice dataframe by partitions
`shape`
`size`	Size of the Series or DataFrame as a Delayed object.
`values`	Return a dask.array of the values of this dataframe
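
The `divisions` attribute above is easier to read with a concrete case. The following is a minimal sketch in plain Python, not Dask's implementation: `partition_for` and the sample tuple are hypothetical names introduced only for illustration. It assumes the usual convention that each partition covers a half-open range of index values, except the last partition, whose upper bound is inclusive.

```python
from bisect import bisect_right


def partition_for(divisions, key):
    """Return the number of the partition whose index range contains `key`.

    `divisions` is a tuple of npartitions + 1 ascending values.
    Hypothetical helper for illustration; not part of the Dask API.
    """
    npartitions = len(divisions) - 1
    if not divisions[0] <= key <= divisions[-1]:
        raise KeyError(key)
    # Partition i covers [divisions[i], divisions[i + 1]); the last
    # division is an inclusive upper bound, so clamp to the final partition.
    return min(bisect_right(divisions, key) - 1, npartitions - 1)


# Hypothetical divisions for a frame with 3 partitions.
divisions = (0, 10, 20, 30)
first = partition_for(divisions, 0)
boundary = partition_for(divisions, 10)
last = partition_for(divisions, 30)
```

With known divisions, a label lookup such as `loc` only needs to touch the partition that can contain the label, which is why `known_divisions` matters for index-based operations.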
