dask_expr._collection.Index

class dask_expr._collection.Index(expr)

Index-like Expr Collection

__init__(expr)

Methods

`__init__`(expr)
`abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`add`(other[, level, fill_value, axis])
`add_prefix`(prefix)	Prefix labels with string prefix.
`add_suffix`(suffix)	Suffix labels with string suffix.
`align`(other[, join, axis, fill_value])	Align two objects on their axes with the specified join method.
`all`([axis, skipna, split_every])	Return whether all elements are True, potentially over an axis.
`analyze`([filename, format])	Outputs statistics about every node in the expression.
`any`([axis, skipna, split_every])	Return whether any element is True, potentially over an axis.
`apply`(function, *args[, meta, axis])	Parallel version of pandas.Series.apply
`astype`(dtypes)	Cast a pandas object to a specified dtype `dtype`.
`autocorr`([lag, split_every])	Compute the lag-N autocorrelation.
`between`(left, right[, inclusive])	Return boolean Series equivalent to left <= series <= right.
`bfill`([axis, limit])	Fill NA/NaN values by using the next valid observation to fill the gap.
`case_when`(caselist)	Replace values where the conditions are True.
`clear_divisions`()	Forget division information.
`clip`([lower, upper, axis])	Trim values at input threshold(s).
`combine`(other, func[, fill_value])	Combine the Series with a Series or scalar according to func.
`combine_first`(other)	Update null elements with value in the same location in other.
`compute`([fuse])	Compute this DataFrame.
`compute_current_divisions`([col, set_divisions])	Compute the current divisions of the DataFrame.
`copy`([deep])	Make a copy of the dataframe
`corr`(other[, method, min_periods, split_every])	Compute correlation with other Series, excluding missing values.
`count`([split_every])	Count non-NA cells for each column or row.
`cov`(other[, min_periods, split_every])	Compute covariance with Series, excluding missing values.
`cummax`([axis, skipna])	Return cumulative maximum over a DataFrame or Series axis.
`cummin`([axis, skipna])	Return cumulative minimum over a DataFrame or Series axis.
`cumprod`([axis, skipna])	Return cumulative product over a DataFrame or Series axis.
`cumsum`([axis, skipna])	Return cumulative sum over a DataFrame or Series axis.
`describe`([split_every, percentiles, ...])	Generate descriptive statistics.
`diff`([periods, axis])	First discrete difference of element.
`div`(other[, level, fill_value, axis])
`divide`(other[, level, fill_value, axis])
`dot`(other[, meta])	Compute the dot product between the Series and the columns of other.
`drop_duplicates`([ignore_index, split_every, ...])
`dropna`()	Return a new Series with missing values removed.
`enforce_runtime_divisions`()	Enforce the current divisions at runtime.
`eq`(other[, level, fill_value, axis])
`explain`([stage, format])	Create a graph representation of the Expression.
`explode`()	Transform each element of a list-like to a row.
`ffill`([axis, limit])	Fill NA/NaN values by propagating the last valid observation to next valid.
`fillna`([value, axis])	Fill NA/NaN values using the specified method.
`floordiv`(other[, level, fill_value, axis])
`from_dict`(data, *[, npartitions, orient, ...])	Construct a Dask DataFrame from a Python Dictionary
`ge`(other[, level, fill_value, axis])
`get_partition`(n)	Get a dask DataFrame/Series representing the nth partition.
`groupby`(by, **kwargs)	Group Series using a mapper or by a Series of columns.
`gt`(other[, level, fill_value, axis])
`head`([n, npartitions, compute])	First n rows of the dataset
`idxmax`(*args, **kwargs)	Return index of first occurrence of maximum over requested axis.
`idxmin`(*args, **kwargs)	Return index of first occurrence of minimum over requested axis.
`isin`(values)	Whether each element in the DataFrame is contained in values.
`isna`()	Detect missing values.
`isnull`()	DataFrame.isnull is an alias for DataFrame.isna.
`kurt`([axis, fisher, bias, nan_policy, ...])	Return unbiased kurtosis over requested axis.
`kurtosis`([axis, fisher, bias, nan_policy, ...])	Return unbiased kurtosis over requested axis.
`le`(other[, level, fill_value, axis])
`lower_once`()
`lt`(other[, level, fill_value, axis])
`map`(arg[, na_action, meta, is_monotonic])	Map values using an input mapping or function.
`map_overlap`(func, before, after, *args[, ...])	Apply a function to each partition, sharing rows with adjacent partitions.
`map_partitions`(func, *args[, meta, ...])	Apply a Python function to each partition
`mask`(cond[, other])	Replace values where the condition is True.
`max`([axis, skipna, numeric_only, split_every])	Return the maximum of the values over the requested axis.
`mean`(*args, **kwargs)	Return the mean of the values over the requested axis.
`median`()	Return the median of the values over the requested axis.
`median_approximate`([method])	Return the approximate median of the values over the requested axis.
`memory_usage`([deep])	Memory usage of the values.
`memory_usage_per_partition`([index, deep])	Return the memory usage of each partition
`min`([axis, skipna, numeric_only, split_every])	Return the minimum of the values over the requested axis.
`mod`(other[, level, fill_value, axis])
`mode`([dropna, split_every])	Return the mode(s) of the Series.
`mul`(other[, level, fill_value, axis])
`ne`(other[, level, fill_value, axis])
`nlargest`([n, split_every])	Return the largest n elements.
`notnull`()	DataFrame.notnull is an alias for DataFrame.notna.
`nsmallest`([n, split_every])	Return the smallest n elements.
`nunique`([dropna, split_every, split_out])	Return number of unique elements in the object.
`nunique_approx`([split_every])	Approximate number of unique rows.
`optimize`([fuse])	Optimizes the DataFrame.
`persist`([fuse])	Persist this dask collection into memory
`pipe`(func, *args, **kwargs)	Apply chainable functions that expect Series or DataFrames.
`pow`(other[, level, fill_value, axis])
`pprint`()	Outputs a string representation of the DataFrame.
`prod`(*args, **kwargs)	Return the product of the values over the requested axis.
`product`([axis, skipna, numeric_only, ...])	Return the product of the values over the requested axis.
`quantile`([q, method])	Approximate quantiles of Series
`radd`(other[, level, fill_value, axis])
`random_split`(frac[, random_state, shuffle])	Pseudorandomly split dataframe into different pieces row-wise
`rdiv`(other[, level, fill_value, axis])
`reduction`(chunk[, aggregate, combine, meta, ...])	Generic row-wise reductions.
`rename`(index[, sorted_index])	Alter Series index labels or name
`rename_axis`([mapper, index, columns, axis])	Set the name of the axis for the index or columns.
`repartition`([divisions, npartitions, ...])	Repartition a collection
`replace`([to_replace, value, regex])	Replace values given in to_replace with value.
`resample`(rule[, closed, label])	Resample time-series data.
`reset_index`([drop])	Reset the index to the default index.
`rfloordiv`(other[, level, fill_value, axis])
`rmod`(other[, level, fill_value, axis])
`rmul`(other[, level, fill_value, axis])
`rolling`(window, **kwargs)	Provides rolling transformations.
`round`([decimals])	Round a DataFrame to a variable number of decimal places.
`rpow`(other[, level, fill_value, axis])
`rsub`(other[, level, fill_value, axis])
`rtruediv`(other[, level, fill_value, axis])
`sample`([n, frac, replace, random_state])	Random sample of items
`sem`([axis, skipna, ddof, split_every, ...])	Return unbiased standard error of the mean over requested axis.
`shift`([periods, freq])	Shift index by desired number of periods with an optional time freq.
`shuffle`([on, ignore_index, npartitions, ...])	Rearrange DataFrame into new partitions
`simplify`()
`skew`([axis, bias, nan_policy, numeric_only])	Return unbiased skew over requested axis.
`squeeze`()	Squeeze 1 dimensional axis objects into scalars.
`std`(*args, **kwargs)	Return sample standard deviation over requested axis.
`sub`(other[, level, fill_value, axis])
`sum`(*args, **kwargs)	Return the sum of the values over the requested axis.
`tail`([n, compute])	Last n rows of the dataset
`to_backend`([backend])	Move to a new DataFrame backend
`to_bag`([index, format])	Create a Dask Bag from a Series
`to_csv`(filename, **kwargs)	See dd.to_csv docstring for more information
`to_dask_array`([lengths, meta, optimize])	Convert a dask DataFrame to a dask array.
`to_dask_dataframe`(*args, **kwargs)	Convert to a legacy dask-dataframe collection
`to_delayed`([optimize_graph])	Convert into a list of `dask.delayed` objects, one per partition.
`to_frame`([index, name])	Create a DataFrame with a column containing the Index.
`to_hdf`(path_or_buf, key[, mode, append])	See dd.to_hdf docstring for more information
`to_json`(filename, *args, **kwargs)	See dd.to_json docstring for more information
`to_legacy_dataframe`([optimize])	Convert to a legacy dask-dataframe collection
`to_orc`(path, *args, **kwargs)	See dd.to_orc docstring for more information
`to_records`([index, lengths])
`to_series`([index, name])	Create a Series with both index and values equal to the index keys.
`to_sql`(name, uri[, schema, if_exists, ...])
`to_string`([max_rows])	Render a string representation of the Series.
`to_timestamp`([freq, how])	Cast to DatetimeIndex of timestamps, at beginning of period.
`truediv`(other[, level, fill_value, axis])
`unique`([split_every, split_out, shuffle_method])	Return Series of unique values in the object.
`value_counts`([sort, ascending, dropna, ...])	Return a Series containing counts of unique values.
`var`(*args, **kwargs)	Return unbiased variance over requested axis.
`visualize`([tasks])	Visualize the expression or task graph
`where`(cond[, other])	Replace values where the condition is False.

Attributes

`axes`
`columns`
`dask`
`divisions`	Tuple of `npartitions + 1` values, in ascending order, marking the lower/upper bounds of each partition's index.
`dtype`
`dtypes`	Return data types
`expr`
`index`	Return dask Index instance
`is_monotonic_decreasing`	Return boolean if values in the object are monotonically decreasing.
`is_monotonic_increasing`	Return boolean if values in the object are monotonically increasing.
`known_divisions`	Whether the divisions are known.
`loc`	Purely label-location based indexer for selection by label.
`name`
`nbytes`	Number of bytes
`ndim`	Return dimensionality
`npartitions`	Return number of partitions
`partitions`	Slice dataframe by partitions
`shape`	Return a tuple representing the dimensionality of the DataFrame.
`size`	Size of the Series or DataFrame as a Delayed object.
`values`	Return a dask.array of the values of this dataframe
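
The `divisions` attribute described above can be pictured with a small pure-Python sketch. It is not Dask's implementation. The helper name `partition_for_label` and the sample divisions are hypothetical. The sketch assumes the usual convention that each partition covers a half-open range of index values, except the last one, whose upper bound is inclusive.

```python
from bisect import bisect_right


def partition_for_label(divisions, label):
    """Return the position of the partition whose index range contains `label`.

    `divisions` is a sorted tuple of npartitions + 1 boundary values.
    """
    npartitions = len(divisions) - 1
    if not divisions[0] <= label <= divisions[-1]:
        raise KeyError(label)
    # The final boundary is an inclusive upper bound, so clamp to the last partition.
    return min(bisect_right(divisions, label) - 1, npartitions - 1)


# Hypothetical divisions for two partitions: labels 0..2 and labels 3..5.
divisions = (0, 3, 5)

first = partition_for_label(divisions, 2)
boundary = partition_for_label(divisions, 3)
last = partition_for_label(divisions, 5)
```

When divisions are known, a label lookup of this kind only needs to touch one partition, which is why `known_divisions` matters for label-based selection.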
