A Semi-Supervised Kernel Two-Sample Test
In recent years, significant advances in statistics and machine learning have led to semi-supervised methodologies that leverage both labeled and unlabeled data. One prominent area of interest within this domain is statistical inference, including two-sample testing.
In this paper, we extend existing kernel two-sample testing methods based on sample-splitting and studentization to a semi-supervised setting that uses both labeled and unlabeled data. We prove that, under certain conditions, the corresponding statistic has a standard Gaussian asymptotic distribution under the null hypothesis. Additionally, we prove that under stricter conditions, the statistic has a Gaussian limiting distribution with or without unlabeled data. Furthermore, we show that the proposed test is consistent against fixed alternatives and obtain an explicit expression for its power when using a bilinear kernel. Numerical analysis and experimental results validate the effectiveness of our proposed method.
Korea Advanced Institute of Science and Technology (KAIST)
291, Daehak-ro, Yuseong-gu, Daejeon 34141