Department of Statistical Science
Center for High-Dimensional Statistics
Adjusting the Benjamini-Hochberg Method for Controlling the False Discovery Rate in Knockoff Assisted Variable Selection
The false discovery rate (FDR) is a powerful measure of errors in the variable selection problem arising in multiple regression. However, the method of Benjamini and Hochberg (1995, Journal of the Royal Statistical Society, Ser. B), one of the most popular multiple testing tools for controlling the FDR, is not applicable to variable selection, because it requires a certain positive dependence condition, expressed in terms of correlations, that the explanatory variables do not generally meet. Barber and Candès (2015, Annals of Statistics) introduced a novel knockoff-based multiple testing framework for variable selection in multiple linear regression where the sample size is more than twice the number of explanatory variables, and put forward some distribution-free procedures for controlling the FDR under this framework. The research presented in this talk revisits the knockoff-based multiple testing setup of Barber and Candès (2015) and adjusts the Benjamini-Hochberg method, based on the ordinary least squares (OLS) estimates of the regression coefficients, to this setup. The result is a valid p-value-based FDR-controlling method that does not rely on the dependence structure of the explanatory variables. Simulations and real data applications demonstrate that both the proposed method and its data-adaptive version, which incorporates an estimate of the proportion of truly unimportant explanatory variables, are powerful alternatives to the FDR-controlling methods proposed in Barber and Candès (2015).
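
For readers unfamiliar with the baseline procedure, the following is a minimal sketch of the standard Benjamini-Hochberg step-up rule applied to a list of p-values. It is not the adjusted, knockoff-based method presented in the talk. The function name `benjamini_hochberg`, the parameter `pi0` (a user-supplied estimate of the proportion of true nulls, giving a simple adaptive variant), and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.1, pi0=1.0):
    """Return the sorted indices rejected by the BH step-up rule.

    pi0 is a hypothetical, user-supplied estimate of the proportion of
    true null hypotheses: pi0=1.0 gives the standard procedure, and
    pi0<1 gives a simple adaptive variant (an assumption for illustration).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / (m * pi0).
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * pi0):
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Illustrative p-values (made up for this sketch, not from any data set).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
print(rejected)  # [0, 1]
```

With `pi0` below 1 the rejection thresholds grow, so the adaptive variant rejects at least as many hypotheses as the standard rule on the same p-values.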