DataFrameGroupBy.corr(ddof=1, split_every=None, split_out=1)

Compute pairwise correlation of columns, excluding NA/null values.

This docstring was copied from pandas.core.frame.DataFrame.corr.

Some inconsistencies with the Dask version may exist.

Groupby correlation: corr(X, Y) = cov(X, Y) / (std_x * std_y)
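The formula above can be sketched in plain Python, as a minimal illustration rather than Dask's actual implementation. The helper names `pearson` and `groupby_corr` and the sample rows are hypothetical. Covariance and both standard deviations use the same `ddof`, so the `(n - ddof)` factors cancel in the ratio.

```python
import math
from collections import defaultdict


def pearson(xs, ys, ddof=1):
    """corr(X, Y) = cov(X, Y) / (std_x * std_y), all computed with the same ddof."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - ddof))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - ddof))
    return cov / (std_x * std_y)


def groupby_corr(rows, ddof=1):
    """Group (key, x, y) rows by key and correlate x with y inside each group."""
    groups = defaultdict(lambda: ([], []))
    for key, x, y in rows:
        groups[key][0].append(x)
        groups[key][1].append(y)
    return {key: pearson(xs, ys, ddof) for key, (xs, ys) in groups.items()}


# Hypothetical data: in group "a" the columns rise together,
# in group "b" they move in opposite directions.
rows = [
    ("a", 1.0, 2.0), ("a", 2.0, 4.0), ("a", 3.0, 6.0),
    ("b", 1.0, 3.0), ("b", 2.0, 2.0), ("b", 3.0, 1.0),
]
result = groupby_corr(rows)
print(result)
```

Each group gets its own coefficient, which is the per-group analogue of a single off-diagonal entry in the correlation matrix.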

method : {‘pearson’, ‘kendall’, ‘spearman’} or callable (Not supported in Dask)

Method of correlation:

  • pearson : standard correlation coefficient

  • kendall : Kendall Tau correlation coefficient

  • spearman : Spearman rank correlation

  • callable : a callable that takes two 1d ndarrays as input and returns a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
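One way to see why the result is symmetric with ones on the diagonal regardless of the callable is the sketch below. It is an assumption about how such a matrix could be assembled, not the pandas source: the hypothetical helper `corr_matrix` calls the function once per unordered pair of columns and mirrors the value.

```python
def corr_matrix(columns, func):
    """Build a matrix from `func`, evaluated once per unordered column pair.

    columns: dict mapping column name -> list of floats (hypothetical layout).
    func: callable taking two sequences and returning a float.
    """
    names = list(columns)
    # Start with 1.0 everywhere; off-diagonal entries are overwritten below.
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            value = func(columns[a], columns[b])
            matrix[a][b] = value  # upper triangle
            matrix[b][a] = value  # mirrored into the lower triangle
    return matrix


def histogram_intersection(a, b):
    # Plain-Python counterpart of the callable used in the examples section.
    return round(sum(min(x, y) for x, y in zip(a, b)), 1)


columns = {"dogs": [0.2, 0.0, 0.6, 0.2], "cats": [0.3, 0.6, 0.0, 0.1]}
matrix = corr_matrix(columns, histogram_intersection)
print(matrix)

# Even a callable that is not symmetric in its arguments yields a symmetric matrix.
asymmetric = corr_matrix(columns, lambda a, b: a[0] - b[0])
print(asymmetric)
```

Because the callable is never evaluated on the diagonal or on the reversed pair, its own symmetry properties do not affect the shape of the result.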

min_periods : int, optional (Not supported in Dask)

Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.

numeric_only : bool, default True (Not supported in Dask)

Include only float, int or boolean data.

New in version 1.5.0.

Deprecated since version 1.5.0: The default value of numeric_only will be False in a future version of pandas.


Correlation matrix.

See also


Compute pairwise correlation with another DataFrame or Series.


Compute the correlation between two Series.


Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
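"Pairwise complete observations" means that, for each pair of columns, only the rows where both values are present enter the calculation. The sketch below illustrates that idea for the Pearson case, together with a `min_periods` threshold. The helper names are hypothetical and this is not the pandas or Dask implementation.

```python
import math


def pairwise_complete(xs, ys):
    """Keep only the positions where neither value is NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def pearson_pairwise(xs, ys, min_periods=1):
    """Pearson correlation over pairwise complete observations.

    Returns NaN when fewer than `min_periods` complete pairs remain,
    or when the correlation is undefined (fewer than 2 pairs, zero variance).
    """
    xs, ys = pairwise_complete(xs, ys)
    n = len(xs)
    if n < min_periods or n < 2:
        return math.nan
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


nan = math.nan
dogs = [1.0, 2.0, nan, 4.0]
cats = [1.0, nan, 3.0, 4.0]

# Only rows 0 and 3 are complete for this pair, so two observations remain.
loose = pearson_pairwise(dogs, cats)
strict = pearson_pairwise(dogs, cats, min_periods=3)
print(loose, strict)
```

With the default threshold the two surviving rows are enough to produce a coefficient; requiring three complete pairs yields NaN instead.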


>>> import numpy as np
>>> import pandas as pd
>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],  
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)  
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
>>> df = pd.DataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],  
...                   columns=['dogs', 'cats'])
>>> df.corr(min_periods=3)  
      dogs  cats
dogs   1.0   NaN
cats   NaN   1.0