Two-sample Statistics and Distance Metrics Based on Anisotropic Kernels
Alex Cloninger
UCSD
Abstract:
This talk introduces a new kernel-based Maximum Mean Discrepancy (MMD)
statistic for measuring the distance between two distributions given
finitely many multivariate samples. When the distributions are locally
low-dimensional, the proposed test can be made more powerful against
certain alternatives by incorporating local covariance matrices and
constructing an anisotropic kernel. The kernel matrix is asymmetric: it
computes the affinity between n data points and a set of n_R reference
points, where n_R can be drastically smaller than n.
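The abstract does not spell out the kernel or the statistic, so the following NumPy sketch shows one plausible reading under stated assumptions; it is not the talk's implementation. Each reference point r_j gets a local covariance C_j estimated from its k nearest pooled samples. The number of neighbours k, the ridge term reg, and the bandwidth eps are hypothetical tuning parameters. The anisotropic affinity is a Gaussian in the Mahalanobis distance defined by C_j, and the statistic is the mean over reference points of the squared difference between the two samples' average affinities. Only n x n_R matrices are formed, never n x n. The function names are illustrative.

```python
import numpy as np

def local_covariances(refs, pooled, k=20, reg=1e-3):
    """Covariance of the k pooled samples nearest each reference point, plus reg * I.
    k and reg are hypothetical tuning parameters chosen for illustration."""
    d = refs.shape[1]
    covs = np.empty((len(refs), d, d))
    for j, r in enumerate(refs):
        dist2 = np.sum((pooled - r) ** 2, axis=1)
        nbrs = pooled[np.argsort(dist2)[:k]]
        covs[j] = np.cov(nbrs, rowvar=False) + reg * np.eye(d)
    return covs

def anisotropic_affinity(X, refs, covs, eps=1.0):
    """Asymmetric n x n_R matrix A[i, j] = exp(-(x_i - r_j)^T C_j^{-1} (x_i - r_j) / (2 eps)).
    eps is a hypothetical bandwidth parameter."""
    A = np.empty((len(X), len(refs)))
    for j, (r, C) in enumerate(zip(refs, covs)):
        diff = X - r                              # shape (n, d)
        sol = np.linalg.solve(C, diff.T).T        # row i is C_j^{-1} (x_i - r_j)
        A[:, j] = np.exp(-np.sum(diff * sol, axis=1) / (2.0 * eps))
    return A

def anisotropic_mmd(X, Y, refs, k=20, eps=1.0, reg=1e-3):
    """Mean over reference points of the squared difference in average affinity."""
    covs = local_covariances(refs, np.vstack([X, Y]), k, reg)
    hx = anisotropic_affinity(X, refs, covs, eps).mean(axis=0)   # shape (n_R,)
    hy = anisotropic_affinity(Y, refs, covs, eps).mean(axis=0)
    return np.mean((hx - hy) ** 2)

# Usage: 2-D samples X ~ p and Y ~ q, with 50 reference points drawn from the pooled data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y = rng.normal(size=(400, 2)) + np.array([1.0, 0.0])
pooled = np.vstack([X, Y])
refs = pooled[rng.choice(len(pooled), size=50, replace=False)]
print(anisotropic_mmd(X, Y, refs))
```

This form connects to the RKHS view. The value returned equals the biased (V-statistic) squared MMD for the kernel k(x, y) = (1/n_R) sum_j A(x, r_j) A(y, r_j). That kernel is positive semidefinite by construction, so an asymmetric n x n_R matrix can still define an RKHS MMD.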
The proposed statistic can be viewed as a special class of Reproducing
Kernel Hilbert Space (RKHS) MMD. Under mild assumptions on the kernel,
the test is proved consistent as long as ||p-q|| ~ O(n^{-1/2+\delta})
for any \delta > 0; the proof rests on a convergence-in-distribution
result for the test statistic. Applications to flow cytometry and
diffusion MRI data sets, which motivated this approach to comparing
distributions, are demonstrated.