DataFrameGroupBy.corr(ddof=1, split_every=None, split_out=1)

Compute pairwise correlation of columns, excluding NA/null values.

This docstring was copied from pandas.core.frame.DataFrame.corr.

Some inconsistencies with the Dask version may exist.

Groupby correlation: corr(X, Y) = cov(X, Y) / (std_x * std_y)
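The formula above can be checked by hand on a small frame. The following is a minimal sketch in plain pandas and NumPy, not Dask's implementation; the toy data and the helper name `manual_corr` are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one grouping column "g" and two numeric columns.
df = pd.DataFrame({
    "g": [0, 0, 0, 0, 1, 1, 1, 1],
    "x": [1.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0, 8.0],
    "y": [2.0, 1.0, 4.0, 3.0, 3.0, 7.0, 5.0, 9.0],
})


def manual_corr(group, ddof=1):
    """corr(X, Y) = cov(X, Y) / (std_x * std_y), same ddof in every term."""
    cov_xy = np.cov(group["x"], group["y"], ddof=ddof)[0, 1]
    std_x = group["x"].std(ddof=ddof)
    std_y = group["y"].std(ddof=ddof)
    return cov_xy / (std_x * std_y)


# One coefficient per group, computed directly from the formula.
manual = {key: manual_corr(grp) for key, grp in df.groupby("g")}

# pandas' grouped correlation matrix, indexed by (group key, column name).
builtin = df.groupby("g").corr()

for key, value in manual.items():
    print(key, value, builtin.loc[(key, "x"), "y"])
```

Because the covariance and both standard deviations share the same normalisation, the ratio does not depend on `ddof` as long as one value is used throughout. This is a property of the formula itself, not a statement about how Dask treats its `ddof` argument.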

method : {‘pearson’, ‘kendall’, ‘spearman’} or callable (Not supported in Dask)

Method of correlation:

  • pearson : standard correlation coefficient

  • kendall : Kendall Tau correlation coefficient

  • spearman : Spearman rank correlation

  • callable : callable that takes two 1d ndarrays as input and returns a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.

min_periods : int, optional (Not supported in Dask)

Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.


Correlation matrix.

See also


Compute pairwise correlation with another DataFrame or Series.


Compute the correlation between two Series.


>>> import numpy as np
>>> import pandas as pd
>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0