Celebrating the 45th Anniversary of the University of Macau:

FBA-APAEM JOINT SEMINAR SERIES

Deep Learning with Missing Data

Prof. Richard SAMWORTH
Professor
Department of Pure Mathematics and Mathematical Statistics
University of Cambridge
United Kingdom

Date: 27 March 2026 (Friday)
Time: 10:00-11:30
Venue: E22-G004
Host: Prof. Wenyang ZHANG, Chair Professor of Business Intelligence and Analytics

Abstract

In the context of multivariate nonparametric regression with missing covariates, we propose Pattern Embedded Neural Networks (PENNs), which can be applied in conjunction with any existing imputation technique. In addition to a neural network trained on the imputed data, PENNs pass the vectors of observation indicators through a second neural network to provide a compact representation. The outputs are then combined in a third neural network to produce final predictions. Our main theoretical result exploits an assumption that the observation patterns can be partitioned into cells on which the Bayes regression function behaves similarly, and belongs to a compositional Hölder class. It provides a finite-sample excess risk bound that holds for an arbitrary missingness mechanism, and in combination with a complementary minimax lower bound, demonstrates that our PENN estimator attains in typical cases the minimax rate of convergence as if the cells of the partition were known in advance, up to a poly-logarithmic factor in the sample size. Numerical experiments on simulated, semi-synthetic and real data confirm that the PENN estimator consistently improves, often dramatically, on standard neural networks without pattern embedding. Code to reproduce our experiments, as well as a tutorial on how to apply our method, is publicly available.
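
As a rough illustration of the three-network structure described in the abstract, the following NumPy sketch shows a single untrained forward pass. It is not the speaker's implementation. The function names (mlp_init, mlp_forward, mean_impute, penn_forward), the layer sizes, the choice of mean imputation, and the use of concatenation to combine the two outputs are all assumptions made for illustration.

```python
import numpy as np

def mlp_init(rng, sizes):
    """Random weights for a small fully connected network (illustrative only)."""
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Apply ReLU on hidden layers and a linear map on the output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def mean_impute(x):
    """Replace NaNs by the column means of observed entries (one possible imputation)."""
    col_means = np.nanmean(x, axis=0)
    return np.where(np.isnan(x), col_means, x)

def penn_forward(x, f_params, g_params, h_params):
    """Hypothetical PENN-style forward pass on covariates x with NaN for missing."""
    mask = (~np.isnan(x)).astype(float)       # observation indicators
    features = mlp_forward(f_params, mean_impute(x))  # network on imputed data
    embedding = mlp_forward(g_params, mask)   # compact representation of the pattern
    combined = np.concatenate([features, embedding], axis=1)  # assumed combination rule
    return mlp_forward(h_params, combined).ravel()    # third network gives predictions

rng = np.random.default_rng(0)
d = 5
x = rng.normal(size=(8, d))
miss = rng.random(size=x.shape) < 0.3
miss[0] = False                               # keep one fully observed row
x[miss] = np.nan

f_params = mlp_init(rng, [d, 16, 8])
g_params = mlp_init(rng, [d, 8, 3])
h_params = mlp_init(rng, [8 + 3, 16, 1])
preds = penn_forward(x, f_params, g_params, h_params)
```

In practice the three networks would be trained jointly on the regression loss, and any imputation method could replace mean_impute; the sketch only shows how the imputed covariates and the observation indicators might flow through the architecture.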

Speaker

Prof. Richard SAMWORTH obtained his PhD in Statistics from the University of Cambridge in 2004, and has remained in Cambridge since, becoming a full professor in 2013 and the Professor of Statistical Science in 2017. His main research interests are in nonparametric and high-dimensional statistics, as well as the statistical foundations of AI; he has developed methods and theory for shape-constrained inference, missing data, subgroup selection, deep learning, data perturbation techniques, changepoint estimation, variable selection and independence testing. Richard received the COPSS Presidents’ Award in 2018, was elected as a Fellow of the Royal Society in 2021 and was awarded the David Cox Medal for Statistics in 2025. He served as co-editor of the Annals of Statistics (2019-2021) and is currently IMS President-Elect.

All are welcome!