Inductive biases from specific training algorithms, such as stochastic gradient descent, play a crucial role in learning overparameterized models, especially neural networks. In this talk, I will use a simple linear regression setting to derive results that demonstrate (a) how gradient descent can implicitly introduce interesting inductive biases into the learning problem and (b) how different parameterizations of the same optimization objective can lead to very different learned models in overparameterized regimes, including models that cannot be learned by kernel methods. In the second part of the talk, I will discuss how even hyperparameters of the training algorithm, such as initialization and step size, are crucial for determining the right inductive bias. In particular, I will use the same linear regression problem to show how the scale of initialization dictates a transition from kernel-like behavior to richer models that are very different from minimum-RKHS-norm models.
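As a rough illustration of the phenomena described above, the following minimal sketch runs gradient descent on the smallest underdetermined regression problem: one equation, two unknowns. The toy data (x = (1, 2), y = 1), the squared parameterization w = u*u - v*v, and all function names are assumptions chosen for illustration; the abstract does not say which parameterization or data the talk uses. Every interpolating solution satisfies w1 + 2*w2 = 1, so the only thing that selects among them is the training procedure.

```python
# Toy underdetermined regression: one equation, two unknowns (hypothetical data).
X_ROW = [1.0, 2.0]
Y = 1.0


def residual(w, x=X_ROW, y=Y):
    """Prediction error x.w - y for a single data point."""
    return sum(xi * wi for xi, wi in zip(x, w)) - y


def gd_direct(x=X_ROW, y=Y, lr=0.1, steps=200):
    """Gradient descent on L(w) = 0.5 * (x.w - y)^2, started at w = 0."""
    w = [0.0] * len(x)
    for _ in range(steps):
        r = residual(w, x, y)
        w = [wi - lr * r * xi for wi, xi in zip(w, x)]
    return w


def gd_squared(alpha, x=X_ROW, y=Y, lr=1e-4, steps=100_000):
    """Gradient descent on the same loss with w = u*u - v*v (elementwise).

    u and v both start at alpha, so w starts at 0 for every alpha
    (an assumed parameterization, used here only as an example).
    """
    u = [alpha] * len(x)
    v = [alpha] * len(x)
    for _ in range(steps):
        w = [ui * ui - vi * vi for ui, vi in zip(u, v)]
        r = residual(w, x, y)
        # Chain rule: dL/du_i = 2 * r * x_i * u_i, dL/dv_i = -2 * r * x_i * v_i.
        u, v = (
            [ui - lr * 2.0 * r * xi * ui for ui, xi in zip(u, x)],
            [vi + lr * 2.0 * r * xi * vi for vi, xi in zip(v, x)],
        )
    return [ui * ui - vi * vi for ui, vi in zip(u, v)]


if __name__ == "__main__":
    print("direct GD from zero :", gd_direct())
    print("squared, alpha = 10 :", gd_squared(10.0))
    print("squared, alpha = 0.01:", gd_squared(0.01))
```

In this sketch, direct gradient descent from zero stays in the span of x and so converges to the minimum Euclidean-norm interpolant, approximately (0.2, 0.4). With the squared parameterization, a large initialization scale lands at essentially the same point, while a small scale lands near the sparse interpolant, close to (0, 0.5), even though the loss and the starting value w = 0 are identical in all three runs.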