dask_expr._collection.DataFrame.memory_usage

dask_expr._collection.DataFrame.memory_usage¶

DataFrame.memory_usage(deep=False, index=True)[source]¶

Return the memory usage of each column in bytes.

This docstring was copied from pandas.core.frame.DataFrame.memory_usage.

Some inconsistencies with the Dask version may exist.

The memory usage can optionally include the contribution of the index and elements of object dtype.

This value is displayed in DataFrame.info by default. This can be suppressed by setting pandas.options.display.memory_usage to False.

Parameters

indexbool, default True: Specifies whether to include the memory usage of the DataFrame’s index in returned Series. If index=True, the memory usage of the index is the first item in the output.
deepbool, default False: If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned values.

Returns

Series: A Series whose index is the original column names and whose values is the memory usage of each column in bytes.

See also

numpy.ndarray.nbytes: Total bytes consumed by the elements of an ndarray.
Series.memory_usage: Bytes consumed by a Series.
Categorical: Memory-efficient array for string values with many repeated values.
DataFrame.info: Concise summary of a DataFrame.

Notes

See the Frequently Asked Questions for more details.

Examples

>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']  
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t))  
...              for t in dtypes])
>>> df = pd.DataFrame(data)  
>>> df.head()  
   int64  float64            complex128  object  bool
0      1      1.0              1.0+0.0j       1  True
1      1      1.0              1.0+0.0j       1  True
2      1      1.0              1.0+0.0j       1  True
3      1      1.0              1.0+0.0j       1  True
4      1      1.0              1.0+0.0j       1  True

>>> df.memory_usage()  
Index           128
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

>>> df.memory_usage(index=False)  
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

The memory footprint of object dtype columns is ignored by default:

>>> df.memory_usage(deep=True)  
Index            128
int64          40000
float64        40000
complex128     80000
object        180000
bool            5000
dtype: int64

Use a Categorical for efficient storage of an object-dtype column with many repeated values.

>>> df['object'].astype('category').memory_usage(deep=True)  
5244

dask_expr._collection.DataFrame.median_approximate

dask_expr._collection.DataFrame.memory_usage_per_partition