Roger Grosse - Studying Neural Net Generalization through Influence Functions


How can we trace surprising behaviors of machine learning models back to their training data? Influence functions aim to predict how the trained model would change if a specific training example were added to the training set. I’ll address two issues that have blocked their application to large-scale neural nets: the apparent inaccuracy of their results and the difficulty of computing inverse-Hessian-vector products. On the first issue, I’ll reformulate the goals of influence estimation in a way that applies to overparameterized, incompletely trained models, and argue that the apparent inaccuracy was largely illusory. I’ll then discuss an approach to scaling influence estimation to large language models and show some resulting insights into their patterns of generalization.

SEC 1.413