Conventional regularization techniques for neural networks, such as L2 or L1 regularization, explicitly penalize deviations of the model parameters from specific values. However, in most neural network models, specific parameter configurations bear little to no physical meaning, and it is difficult to incorporate domain knowledge or other relevant information into neural network training using such techniques. In this talk, I will show that we can address this shortcoming by using Bayesian principles to effectively incorporate domain knowledge or beliefs about desirable model properties into neural network training. To do so, I will approach regularization in neural networks from a probabilistic perspective and define a family of data-driven prior distributions that allows us to encode useful auxiliary information into the model. I will then show how to perform approximate inference in neural networks with such priors and derive a simple variational optimization objective with a regularizer that reflects the constraints implicitly encoded in the prior. This regularizer is mathematically simple and easy to implement, and it can be used as a drop-in replacement for existing regularizers when performing supervised learning in neural networks of any size. I will conclude the talk with an overview of applications of data-driven priors, including distribution shift detection and medical diagnosis. This is joint work with Sanyam Kapoor, Shikai Qiu, Xiang Pan, Lily Yucen Li, Ya Shi Zhang, Ravid Shwartz-Ziv, Julia Kempe, and Andrew Gordon Wilson.