dask.array.stats.f_oneway
- dask.array.stats.f_oneway(*args)
Perform one-way ANOVA.
This docstring was copied from scipy.stats.f_oneway.
Some inconsistencies with the Dask version may exist.
The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.
- Parameters
- sample1, sample2, … : array_like
The sample measurements for each group. There must be at least two arguments. If the arrays are multidimensional, then all the dimensions of the array must be the same except for axis.
- axis : int, optional (Not supported in Dask)
Axis of the input arrays along which the test is applied. Default is 0.
- Returns
- statistic : float
The computed F statistic of the test.
- pvalue : float
The associated p-value from the F distribution.
- Warns
- scipy.stats.ConstantInputWarning
Raised if all values within each of the input arrays are identical. In this case the F statistic is either infinite or isn't defined, so np.inf or np.nan is returned.
- scipy.stats.DegenerateDataWarning
Raised if the length of any input array is 0, or if all the input arrays have length 1. np.nan is returned for the F statistic and the p-value in these cases.
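The constant-input case described above can be observed by recording warnings around a call. This is a minimal sketch against scipy.stats.f_oneway, the function this docstring was copied from; the Dask version may not emit the same warnings. The sample values are made up for illustration, and because the warning class names depend on the installed SciPy version, the code only prints whatever categories were recorded.

```python
import warnings

import numpy as np
from scipy.stats import f_oneway

# Illustrative data: each group is constant, but the two groups differ.
constant_groups = ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeated ones
    result = f_oneway(*constant_groups)

# Within-group variance is zero while the group means differ,
# so the F statistic is infinite and the p-value is 0.
print(result.statistic, result.pvalue)

# Show which warning categories were recorded.
for w in caught:
    print(w.category.__name__)
```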
Notes
The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid:
- The samples are independent.
- Each sample is from a normally distributed population.
- The population standard deviations of the groups are all equal. This property is known as homoscedasticity.
If these assumptions do not hold for a given set of data, it may still be possible to use the Kruskal-Wallis H-test (scipy.stats.kruskal) or the Alexander-Govern test (scipy.stats.alexandergovern), although with some loss of power.
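As a rough sketch of how the alternatives named above might be used, the following runs all three tests on the same data. The samples are synthetic (drawn from exponential distributions with different scales, so neither normality nor equal variance holds). The use of scipy.stats.levene and scipy.stats.shapiro as quick assumption checks is an illustrative choice, not something this docstring prescribes. scipy.stats.alexandergovern requires a reasonably recent SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, skewed samples with unequal spread (illustration only).
g1 = rng.exponential(scale=1.0, size=30)
g2 = rng.exponential(scale=2.0, size=25)
g3 = rng.exponential(scale=4.0, size=35)

# Quick assumption checks: Levene's test for equal variances,
# Shapiro-Wilk for normality of a single sample.
levene_p = stats.levene(g1, g2, g3).pvalue
shapiro_p = stats.shapiro(g1).pvalue
print(f"Levene p={levene_p:.4f}, Shapiro-Wilk (g1) p={shapiro_p:.4f}")

# Classical one-way ANOVA next to the two alternatives.
anova = stats.f_oneway(g1, g2, g3)
kruskal = stats.kruskal(g1, g2, g3)        # rank-based, no normality assumption
ag = stats.alexandergovern(g1, g2, g3)     # relaxes the equal-variance assumption

print(f"ANOVA            p={anova.pvalue:.4f}")
print(f"Kruskal-Wallis   p={kruskal.pvalue:.4f}")
print(f"Alexander-Govern p={ag.pvalue:.4f}")
```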
The length of each group must be at least one, and there must be at least one group with length greater than one. If these conditions are not satisfied, a warning is generated and (np.nan, np.nan) is returned.
If all values in each group are identical, and there exist at least two groups with different values, the function generates a warning and returns (np.inf, 0).
If all values in all groups are the same, the function generates a warning and returns (np.nan, np.nan).
The algorithm is from Heiman [2], pp. 394-7.
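For orientation, the textbook sums-of-squares form of one-way ANOVA can be sketched in NumPy as follows. This is not necessarily the exact sequence of steps SciPy or Dask uses internally. The helper name anova_f and the sample values are hypothetical, and the sketch handles only 1-D groups and none of the degenerate cases described above.

```python
import numpy as np
from scipy import stats


def anova_f(*groups):
    """Hypothetical helper: textbook one-way ANOVA for 1-D groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_data = np.concatenate(groups)
    n_total = all_data.size
    k = len(groups)
    grand_mean = all_data.mean()

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = k - 1
    df_within = n_total - k

    # F is the ratio of the two mean squares.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of the F distribution gives the p-value.
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value


# Made-up sample values for illustration.
x = [6.9, 5.4, 5.8, 4.6, 4.0]
y = [8.3, 6.8, 7.8, 9.2, 6.5]
z = [8.0, 10.5, 8.1, 6.9, 9.3]

f_stat, p_value = anova_f(x, y, z)
reference = stats.f_oneway(x, y, z)
print(np.isclose(f_stat, reference.statistic), np.isclose(p_value, reference.pvalue))
```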
References
- [1] R. Lowry, “Concepts and Applications of Inferential Statistics”, Chapter 14, 2014, http://vassarstats.net/textbook/
- [2] G.W. Heiman, “Understanding research methods and statistics: An integrated introduction for psychology”, Houghton, Mifflin and Company, 2001.
- [3] G.H. McDonald, “Handbook of Biological Statistics”, One-way ANOVA. http://www.biostathandbook.com/onewayanova.html
Examples
>>> import numpy as np
>>> from scipy.stats import f_oneway
Here are some data [3] on a shell measurement (the length of the anterior adductor muscle scar, standardized by dividing by length) in the mussel Mytilus trossulus from five locations: Tillamook, Oregon; Newport, Oregon; Petersburg, Alaska; Magadan, Russia; and Tvarminne, Finland, taken from a much larger data set used in McDonald et al. (1991).
>>> tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735,
...              0.0659, 0.0923, 0.0836]
>>> newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835,
...            0.0725]
>>> petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
>>> magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764,
...            0.0689]
>>> tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]
>>> f_oneway(tillamook, newport, petersburg, magadan, tvarminne)
F_onewayResult(statistic=7.121019471642447, pvalue=0.0002812242314534544)
f_oneway accepts multidimensional input arrays. When the inputs are multidimensional and axis is not given, the test is performed along the first axis of the input arrays. For the following data, the test is performed three times, once for each column.
>>> a = np.array([[9.87, 9.03, 6.81],
...               [7.18, 8.35, 7.00],
...               [8.39, 7.58, 7.68],
...               [7.45, 6.33, 9.35],
...               [6.41, 7.10, 9.33],
...               [8.00, 8.24, 8.44]])
>>> b = np.array([[6.35, 7.30, 7.16],
...               [6.65, 6.68, 7.63],
...               [5.72, 7.73, 6.72],
...               [7.01, 9.19, 7.41],
...               [7.75, 7.87, 8.30],
...               [6.90, 7.97, 6.97]])
>>> c = np.array([[3.31, 8.77, 1.01],
...               [8.25, 3.24, 3.62],
...               [6.32, 8.81, 5.19],
...               [7.48, 8.83, 8.91],
...               [8.59, 6.01, 6.07],
...               [3.07, 9.72, 7.48]])
>>> F, p = f_oneway(a, b, c)
>>> F
array([1.75676344, 0.03701228, 3.76439349])
>>> p
array([0.20630784, 0.96375203, 0.04733157])
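To make "once for each column" concrete, the sketch below checks that a single call on 2-D inputs matches separate calls on each column. It uses scipy.stats.f_oneway with synthetic random data. Since axis is marked as not supported in Dask, this describes the SciPy behaviour only.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)

# Synthetic 6x3 arrays: 6 observations per group, 3 independent columns.
a = rng.normal(8.0, 1.0, size=(6, 3))
b = rng.normal(7.5, 1.0, size=(6, 3))
c = rng.normal(7.0, 2.0, size=(6, 3))

# One call on the 2-D inputs: the test runs along the first axis.
F, p = f_oneway(a, b, c)

# Equivalent column-by-column calls on 1-D slices.
per_column = [f_oneway(a[:, j], b[:, j], c[:, j]) for j in range(a.shape[1])]
F_cols = np.array([r.statistic for r in per_column])
p_cols = np.array([r.pvalue for r in per_column])

print(np.allclose(F, F_cols), np.allclose(p, p_cols))
```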