This talk will survey the role played by margins in the optimization, generalization, and representational properties of neural networks. A specific highlight will be that applying gradient descent, with reasonable step sizes and initialization, to a network whose width is polynomial in the inverse margin but merely polylogarithmic in the other problem parameters (e.g., the training set size) suffices to achieve arbitrarily small test error.
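
To make the highlighted claim concrete, here is a hedged sketch of its shape (the precise notion of margin, the exact exponents, and the full list of problem parameters belong to the underlying results and are not reproduced here): writing $\gamma > 0$ for the margin, $n$ for the training set size, and $\epsilon$ for the target test error, the guarantee is of the form

\[
m \;\ge\; \mathrm{poly}\!\left(\tfrac{1}{\gamma}\right) \cdot \mathrm{polylog}\!\left(n, \tfrac{1}{\epsilon}\right)
\quad\Longrightarrow\quad
\text{gradient descent attains test error at most } \epsilon,
\]

where $m$ is the network width and the step sizes and initialization are chosen appropriately. For intuition only, the classical margin of a linearly separable dataset $\{(x_i, y_i)\}_{i=1}^n$ with $\|x_i\| \le 1$ and $y_i \in \{\pm 1\}$ is $\gamma = \max_{\|w\| \le 1} \min_i y_i \langle w, x_i \rangle$; the notion used in the talk is an analogue adapted to neural networks.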