Deep learning has achieved phenomenal empirical success in many domains, including computer vision, natural language processing, and speech recognition. To consolidate and extend this success, we need to develop a more systematic and deeper understanding of the elusive principles of deep learning. In this talk, I will provide an analysis of several elements of deep learning, including non-convex optimization, overparametrization, and generalization error. Recent theoretical work has established connections between neural networks and linearized models governed by Neural Tangent Kernels (NTK). Such theory leads to concrete convergence and generalization results, yet the empirical performance of neural networks is observed to exceed that of their linearized models. Towards closing this gap, we investigate the training of overparametrized neural networks in regimes that go beyond the local NTK regime. We show that by utilizing more of the parameter space, SGD can learn with lower sample complexity than NTK-based methods under mild distributional assumptions.
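
As background for the reference to linearized models, the NTK regime studies the first-order Taylor expansion of the network output around its initialization. The following is a minimal sketch of that linearization; the symbols f(x; \theta) for the network output and \theta_0 for the initialization are notational assumptions, not taken from the talk:

f_{\mathrm{lin}}(x; \theta) = f(x; \theta_0) + \nabla_\theta f(x; \theta_0)^{\top} (\theta - \theta_0),
\qquad
K_{\mathrm{NTK}}(x, x') = \nabla_\theta f(x; \theta_0)^{\top} \nabla_\theta f(x'; \theta_0).

Training the linearized model f_{\mathrm{lin}} amounts to a kernel method with kernel K_{\mathrm{NTK}}, which is why NTK-regime analyses yield concrete convergence and generalization guarantees; the results discussed here concern training that is not captured by this local approximation.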