… and yet they do.
Deep learning relies on highly complex loss functions. Training algorithms are supposed to optimize them, but we know they can't. So why are the results still so impressive?
Our new paper offers a mathematical explanation. We rigorously prove that deep-learning algorithms don't actually need to find the true optimum. Being close to a local optimum is already enough. In nerdier terms: we show that every reasonable stationary point of certain neural networks, and all points nearby, generalizes essentially as well as the global optimum.
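To give a rough sense of the flavor of such a statement, here is an illustrative paraphrase, not the paper's exact theorem; the symbols, the radius, and the error term below are assumptions made purely for exposition.

```latex
% Illustrative paraphrase only -- not the exact theorem from the paper.
% Assumed notation: \theta^* is a reasonable stationary point of the
% empirical risk, \theta_{\mathrm{opt}} is a global optimum of the
% population risk R, n is the sample size, and \delta > 0 is some
% neighborhood radius.
\[
  R(\theta) \;\le\; R(\theta_{\mathrm{opt}}) + \varepsilon_n
  \qquad \text{for all } \theta \text{ with } \|\theta - \theta^*\| \le \delta,
\]
\[
  \text{where the excess term } \varepsilon_n \to 0 \text{ as } n \to \infty.
\]
```

In words: the generalization performance at the stationary point, and at every point in a small neighborhood around it, is essentially as good as at the global optimum, up to a term that vanishes with more data.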
The paper has been accepted at TMLR! Find it here.
Huge congratulations to Mahsa and Fang, two rising stars in machine learning.