Inference on Treatment Effects after Model Selection
Inferring cause-effect relationships between variables is of primary importance in many sciences. In this talk, I will discuss two approaches for making valid inference on treatment effects when a large number of covariates are present. The first approach is to perform model selection and then to deliver inference based on the selected model. If the inference is made ignoring the randomness of the model selection process, then there could be severe biases in estimating the parameters of interest. While the estimation bias in an under-fitted model is well understood, I will address a lesser known bias that arises from an over-fitted model. The over-fitting bias can be eliminated through data splitting at the cost of statistical efficiency, and I will propose a repeated data splitting approach to mitigate the efficiency loss. The second approach concerns the existing methods for debiased inference. I will show that the debiasing approach is an extension of OLS to high dimensions, and that a careful bias analysis leads to an improvement to further control the bias. The comparison between these two approaches provides insights into their intrinsic bias-variance trade-off, and I will show that the debiasing approach may lose efficiency in observational studies.