Standard machine learning produces models that are accurate on average but degrade dramatically when the test distribution of interest deviates from the training distribution. We consider three settings where this happens: when test inputs are subject to adversarial attacks, when we are concerned with performance on minority subpopulations, and when the world simply changes (classic domain shift). Our aim is to produce methods that are provably robust to such deviations. In this talk, I will provide an overview of the work my group has done on this topic over the last three years. We have found many surprises in our quest for robustness: for example, the ‘more data’ and ‘bigger models’ strategy that works so well for average accuracy sometimes fails out-of-domain. On the other hand, certain tools, such as the analysis of linear regression and the use of unlabeled data (e.g., robust self-training), have reliably delivered promising results across a number of different settings.