Matteo Sesia (USC Marshall School of Business)
11 May 2023 @ 12:00 - 13:00
Testing for outliers with conformal p-values
Abstract. This talk discusses the construction of provably valid frequentist p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We study a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid in finite samples but are mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Finally, we discuss how to further boost power by leveraging a separate data set of known outliers with an approach inspired by weighted hypothesis testing. The practical relevance of our results is demonstrated by numerical experiments on real and simulated data.
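For readers unfamiliar with the construction, the following is a minimal sketch of marginal (split-)conformal p-values followed by the Benjamini-Hochberg procedure for false discovery rate control. It is not the speaker's code. The function names `conformal_pvalues` and `benjamini_hochberg` are hypothetical, and the sketch assumes that nonconformity scores have already been computed by some model fitted on separate training data, with larger scores indicating more outlying points. It covers only the marginally valid p-values mentioned in the abstract, not the conditionally calibrated or weighted variants.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Marginal conformal p-values from precomputed nonconformity scores.

    Assumption: larger score = more outlying. `cal_scores` come from a
    held-out calibration set of inliers, separate from the data used to
    fit the scoring model.
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # Count calibration scores that are >= each test score.
    n_ge = n - np.searchsorted(cal, test, side="left")
    # The +1 terms account for the test point itself and keep p > 0.
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Return a boolean mask of hypotheses rejected by BH at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest index meeting its threshold.
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative score only: distance from the origin for standard
    # normal inliers; the last 10 test points are shifted outliers.
    cal = np.linalg.norm(rng.normal(size=(500, 2)), axis=1)
    test_points = rng.normal(size=(50, 2))
    test_points[-10:] += 5.0
    test = np.linalg.norm(test_points, axis=1)

    pvals = conformal_pvalues(cal, test)
    flagged = benjamini_hochberg(pvals, alpha=0.1)
    print("flagged as outliers:", np.nonzero(flagged)[0])
```

Because every test point is compared against the same calibration scores, the resulting p-values are mutually dependent, which is the dependence structure the abstract refers to.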