Questions about gradient descent

Short answers, pulled from the story.

When did Augustin-Louis Cauchy first suggest gradient descent?

Augustin-Louis Cauchy first suggested the method in 1847. He proposed taking repeated steps opposite to the gradient of a function at its current point.
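Cauchy's rule can be sketched in a few lines of Python; the quadratic objective, step size, and iteration count below are illustrative assumptions, not details from the text.

```python
# Minimal sketch of Cauchy's idea: repeatedly step opposite to the
# gradient at the current point. Step size and iteration count are
# illustrative choices.

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Take `steps` steps of size `lr` opposite to the gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Each step moves downhill along the direction of steepest descent, so the iterate settles near the minimizer x = 3.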

What is stochastic gradient descent used for today?

Stochastic gradient descent serves as the most basic algorithm used for training most deep networks today. It makes the weight updates of backpropagation stochastic by estimating the gradient from a randomly sampled minibatch of the training data rather than from the full dataset.
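A minimal sketch of the stochastic update, assuming a toy 1-D linear model y ≈ w·x with a made-up dataset; the minibatch size, learning rate, and iteration count are arbitrary illustrative choices.

```python
import random

def sgd_step(w, batch, lr):
    """One SGD update: squared-error gradient on a random minibatch."""
    g = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * g

random.seed(0)
data = [(x, 2.0 * x) for x in range(1, 11)]  # true weight is 2.0
w = 0.0
for _ in range(500):
    batch = random.sample(data, 3)  # the stochastic part: a random minibatch
    w = sgd_step(w, batch, lr=0.01)
```

Because each minibatch gives only a noisy estimate of the full gradient, individual updates jitter, but on average they still move w toward the true weight.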

Why do high condition numbers cause slow convergence rates?

High condition numbers cause slow convergence on elongated level sets because, with exact line search, successive steepest-descent directions (the residual vectors, when solving linear systems) are orthogonal to one another. The iterates therefore zigzag across the narrow valley, and each new direction must partly undo the overshoot from the previous step.

Who proposed fast gradient methods for convex problems?

Yurii Nesterov proposed a simple modification enabling faster convergence for convex problems: take the gradient step at a point extrapolated along the previous direction of motion. For smooth convex objectives, his fast gradient method improves the worst-case error bound after k iterations from O(1/k) for plain gradient descent to O(1/k²).
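The accelerated update can be sketched as a gradient step taken at a look-ahead point; the k/(k+3) momentum schedule below is one standard choice, assumed here rather than taken from the text, and the test objective and learning rate are illustrative.

```python
# Sketch of Nesterov's fast gradient method: extrapolate along the
# previous direction of motion, then take the gradient step there.

def nesterov(grad, x0, lr, steps):
    x_prev = x = x0
    for k in range(steps):
        momentum = k / (k + 3)               # standard growing schedule
        y = x + momentum * (x - x_prev)      # look-ahead point
        x_prev, x = x, y - lr * grad(y)      # gradient step at y
    return x

# Example: minimize f(x) = x^2, whose gradient is 2x.
x_star = nesterov(lambda x: 2 * x, x0=1.0, lr=0.1, steps=300)
```

The only change from plain gradient descent is evaluating the gradient at the extrapolated point y instead of at x, which is what yields the improved convergence rate on convex problems.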