MATLAB Implementation of K-S Test for Nonparametric Testing
Resource Overview
Detailed Documentation
In statistics, nonparametric tests are methods that do not assume the data follow a particular parametric distribution. The two-sample Kolmogorov-Smirnov (K-S) test is a widely used nonparametric method for testing whether two samples come from the same continuous distribution, based on the largest difference between their empirical cumulative distribution functions. Below is a MATLAB implementation of the K-S test:
function [h,p,ks2stat] = kstest2(x1,x2,varargin)
%KSTEST2 Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
% [H,P,KSSTAT] = KSTEST2(X1,X2) performs a Kolmogorov-Smirnov (K-S)
% test to determine if independent random samples X1 and X2 are drawn
% from the same underlying continuous population. H indicates the
% result of the hypothesis test:
% H = 0 => Do not reject the null hypothesis at significance level ALPHA.
% H = 1 => Reject the null hypothesis at significance level ALPHA.
% P is the asymptotic P-value of the test and KSSTAT is the test
% statistic. By default the test is two-sided.
%
% The two-sample K-S test is a non-parametric test that compares the
% empirical cumulative distribution functions of two samples, and
% is used to test whether two samples are drawn from the same
% population. X1 and X2 are vectors that represent random samples
% from continuous distributions. The two samples do not need to
% contain the same number of observations.
%
% KSTEST2 treats NaNs as missing values, and ignores them.
%
% [H,P,KSSTAT] = KSTEST2(X1,X2,'PARAM1',val1,'PARAM2',val2,...) specifies
% one or more of the following name/value pairs:
%
% 'Alpha' - A value ALPHA between 0 and 1 specifying the
% significance level as (100*ALPHA)%. Default is
% 0.05 for 5% significance.
% 'tail' - A string indicating the type of test. Choices are:
% 'both' two-sided test (default)
% 'unequal' two-sided test, equivalent to 'both'
% 'right' one-sided test that X1 distribution is
% shifted to the right of X2 distribution
% 'left' one-sided test that X1 distribution is
% shifted to the left of X2 distribution
% tail must be 'both', 'right', 'left', or 'unequal'.
%
% Example:
% % Test whether two random samples come from the same distribution
% % using the K-S test at the 5% significance level.
% x1 = randn(100,1); x2 = randn(200,1);
% [h,p,ks2stat] = kstest2(x1,x2)
%
% See also KSTEST, ECDF, CDFPLOT.
% References:
% Massey, F.J., "The Kolmogorov-Smirnov Test for Goodness of Fit",
% Journal of the American Statistical Association, Vol. 46,
% pp. 68-78, 1951.
% Miller, L.H., "Table of Percentage Points of Kolmogorov
% Statistics", Journal of the American Statistical
% Association, Vol. 53, pp. 111-121, 1958.
% Conover, W.J., Practical Nonparametric Statistics, Wiley, 1971.
%
% Copyright 2002-2013 The MathWorks, Inc.
% $Revision: 1.1.10.4 $ $Date: 2013/11/23 22:45:10 $
% Both samples are required
if nargin < 2
    error(message('stats:kstest2:TooFewInputs'));
end
% Check the inputs
if ~isvector(x1) || ~isvector(x2)
    error(message('stats:kstest2:VectorRequired'));
end
% Remove missing observations indicated by NaNs, then check that data remain
x1 = x1(~isnan(x1));
x2 = x2(~isnan(x2));
if isempty(x1) || isempty(x2)
    error(message('stats:kstest2:NotEnoughData'));
end
x1 = x1(:);
x2 = x2(:);
% Parse the optional name/value pairs (defaults: Alpha = 0.05, tail = 'both')
alpha = 0.05;
tail = 'both';
if mod(numel(varargin),2) ~= 0
    error('kstest2:BadArgs','Optional arguments must be name/value pairs.');
end
for k = 1:2:numel(varargin)
    switch lower(varargin{k})
        case 'alpha'
            alpha = varargin{k+1};
        case 'tail'
            tail = lower(varargin{k+1});
        otherwise
            error('kstest2:BadParam','Unknown parameter name: %s',varargin{k});
    end
end
if ~isnumeric(alpha) || ~isscalar(alpha) || alpha <= 0 || alpha >= 1
    error('kstest2:BadAlpha','ALPHA must be a scalar between 0 and 1.');
end
% Compute the test statistic and asymptotic P-value for the requested tail
n1 = numel(x1);
n2 = numel(x2);
switch tail
    case {'both','unequal'}
        % Two-sided test: statistic is max|F1(x) - F2(x)|
        [ks2stat,p] = kstest2_2smp(x1,x2);
    case {'right','left'}
        % Evaluate both empirical CDFs at every pooled observation
        xs = sort([x1;x2]);
        F1 = sum(bsxfun(@le,x1,xs'),1)'/n1;
        F2 = sum(bsxfun(@le,x2,xs'),1)'/n2;
        if strcmp(tail,'right')
            % X1 shifted to the right of X2 means F1 lies below F2
            ks2stat = max(F2 - F1);
        else
            % X1 shifted to the left of X2 means F1 lies above F2
            ks2stat = max(F1 - F2);
        end
        % One-sided asymptotic P-value
        en = n1*n2/(n1+n2);
        lambda = (sqrt(en) + 0.12 + 0.11/sqrt(en))*ks2stat;
        p = exp(-2*lambda^2);
    otherwise
        error(message('stats:kstest2:BadTail'));
end
% Decision rule: reject the null hypothesis when the P-value does not exceed ALPHA
h = (p <= alpha);
end
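As a hand-checkable illustration of the statistic that the main function reports, the following minimal sketch evaluates both empirical CDFs on the pooled observations of two tiny samples and forms the two-sided and one-sided differences. It uses only base MATLAB; the variable names (pooled, F1, F2, D_two, D_left, D_right) and the sample values are illustrative assumptions, not part of the listing.

```matlab
% Two small samples chosen so the result can be verified by hand
x1 = [1 2 3 4]';
x2 = [3 4 5 6]';

% Pooled, sorted observations serve as the evaluation grid
pooled = sort([x1; x2]);

% Empirical CDFs: fraction of each sample that is <= each grid point
F1 = arrayfun(@(t) mean(x1 <= t), pooled);
F2 = arrayfun(@(t) mean(x2 <= t), pooled);

% Two-sided statistic and the two one-sided variants
D_two   = max(abs(F1 - F2));   % largest absolute gap between the CDFs
D_left  = max(F1 - F2);        % large when x1 tends to be smaller than x2
D_right = max(F2 - F1);        % large when x1 tends to be larger than x2

fprintf('D_two = %.2f, D_left = %.2f, D_right = %.2f\n', D_two, D_left, D_right);
```

Here x1 sits to the left of x2, so the gap appears in F1 - F2 while F2 - F1 never becomes positive.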
function [ks2stat,p] = kstest2_2smp(x1,x2)
%KSTEST2_2SMP Two-sided, two-sample Kolmogorov-Smirnov statistic and P-value
% Core algorithm implementation: evaluates both empirical CDFs at every
% pooled observation and takes the maximum absolute difference between
% them. The sample sizes do not need to be equal.
x1 = x1(:);
x2 = x2(:);
n1 = numel(x1);
n2 = numel(x2);
% Pool and sort the observations to form the evaluation grid
x = sort([x1;x2]);
% Empirical CDFs on the grid: fraction of each sample <= each grid point
f1 = sum(bsxfun(@le,x1,x'),1)'/n1;
f2 = sum(bsxfun(@le,x2,x'),1)'/n2;
% Calculate K-S statistic as maximum absolute difference
ks2stat = max(abs(f1-f2));
% Compute asymptotic P-value using the alternating Kolmogorov series
if nargout > 1
    en = n1*n2/(n1+n2);
    lambda = (sqrt(en) + 0.12 + 0.11/sqrt(en))*ks2stat;
    j = (1:101)';
    p = 2*sum((-1).^(j-1).*exp(-2*lambda^2*j.^2));
    % Clip rounding error so that P stays within [0,1]
    p = min(max(p,0),1);
end
end
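The asymptotic P-value and the critical value both follow from the same limiting distribution, which the sketch below makes concrete. It is a minimal illustration in base MATLAB under stated assumptions: the sample sizes and the statistic D are made-up values, the plain scaling sqrt(n1*n2/(n1+n2))*D is used with no small-sample correction, and the series is truncated at 100 terms.

```matlab
% Illustrative inputs (assumed values, not taken from the listing)
n1 = 100;
n2 = 200;
D = 0.15;        % two-sided K-S statistic
alpha = 0.05;

% Scale the statistic by the effective sample size
en = n1*n2/(n1 + n2);
lambda = sqrt(en)*D;

% Two-sided asymptotic P-value: 2*sum_{j>=1} (-1)^(j-1) * exp(-2*j^2*lambda^2)
j = (1:100)';
p = 2*sum((-1).^(j-1).*exp(-2*lambda^2*j.^2));
p = min(max(p, 0), 1);   % guard against rounding outside [0,1]

% Matching critical value: c(alpha) = sqrt(-0.5*log(alpha/2)),
% rescaled to the units of D
c = sqrt(-0.5*log(alpha/2));
Dcrit = c*sqrt((n1 + n2)/(n1*n2));

fprintf('p = %.4f, Dcrit = %.4f, reject = %d\n', p, Dcrit, D > Dcrit);
```

Comparing p with alpha and comparing D with Dcrit are two views of the same decision. The critical-value formula keeps only the first term of the series, so the two can differ slightly when p is very close to alpha.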
function [ks2stat,p] = kstest2_unequal_n(x1,x2)
%KSTEST2_UNEQUAL_N Two-sample K-S test implementation for unequal sample sizes
% Uses the same algorithm as kstest2_2smp: each empirical CDF is normalized
% by its own sample size, so different sample sizes are handled directly
x1 = x1(:);
x2 = x2(:);
n1 = numel(x1);
n2 = numel(x2);
% Pool and sort the observations to form the evaluation grid
x = sort([x1;x2]);
% Empirical CDFs of each sample on the pooled grid
f1 = sum(bsxfun(@le,x1,x'),1)'/n1;
f2 = sum(bsxfun(@le,x2,x'),1)'/n2;
% Calculate test statistic
ks2stat = max(abs(f1-f2));
% Compute P-value with different effective sample size calculation
if nargout > 1
en = sqrt(n1*n2/(n1+n2));
p = 1 - k