Gradient descent
We need a way to teach AI without writing every rule ourselves. Gradient descent is how we do it. It’s the method that helps her learn by trial, error, and adjustment—like finding the bottom of a valley in the dark with just our hands out.
The slope we follow
The first step is figuring out which way is downhill. That's the gradient: the derivatives of the model's error with respect to its weights. The gradient points uphill, so we step the opposite way. A steep slope means there's still plenty of error to shed; a slope near zero means we're close to a flat spot, often the bottom. If we don't check the slope, we're just stumbling around.
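Here's a minimal sketch of what checking that slope looks like for a one-weight model with a squared error. The names and numbers are ours, purely for illustration.

```python
# Minimal sketch: the slope of a squared-error loss for a one-weight model.
# Model: prediction = w * x, error = (prediction - y) ** 2.
# The derivative d(error)/dw is the slope; its negative points downhill.

def gradient(w, x, y):
    prediction = w * x
    # Chain rule: d/dw of (w*x - y)^2 is 2 * (w*x - y) * x.
    return 2 * (prediction - y) * x

# A steep slope far from the answer, a gentle one near it (true answer is w = 2).
print(gradient(w=5.0, x=2.0, y=4.0))  # far from w = 2: slope 24.0
print(gradient(w=2.1, x=2.0, y=4.0))  # close to w = 2: slope about 0.8
```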
How big a step
Once we know the direction, we decide how big the step should be. That's the learning rate. Too big and we overshoot; too small and we crawl forever. It's the dial we keep fiddling with until she moves at a pace that's fast but steady.
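A small sketch of the dial in action, assuming a made-up one-weight error surface: one step size too bold, one too timid, one about right.

```python
# Sketch: the same downhill walk with three different learning rates.
# Toy error surface: error(w) = (w - 3) ** 2, so the slope is 2 * (w - 3).

def take_steps(learning_rate, steps=10, w=0.0):
    for _ in range(steps):
        slope = 2 * (w - 3)
        w = w - learning_rate * slope  # step against the slope
    return w

print(take_steps(learning_rate=1.1))    # too big: overshoots and drifts further away
print(take_steps(learning_rate=0.001))  # too small: barely moves from the start
print(take_steps(learning_rate=0.3))    # steady: lands near the bottom at w = 3
```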
The loop that never ends
The recipe is plain: calculate the gradient, take a step, repeat. Each loop usually shaves a little more off the error. With enough loops, she settles into a spot that's close enough to "best" for practical use. The trick is stopping before she starts circling the drain, bouncing around the bottom without ever settling.
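The whole recipe in one sketch, again on a made-up one-weight error surface, with a small tolerance telling us when "close enough" has arrived. The function name and defaults are ours.

```python
# The plain recipe: calculate the gradient, take a step, repeat until the slope is tiny.
# Toy error surface: error(w) = (w - 3) ** 2.

def gradient_descent(w=0.0, learning_rate=0.1, max_steps=1000, tolerance=1e-6):
    for step in range(max_steps):
        slope = 2 * (w - 3)           # gradient of the error at the current w
        if abs(slope) < tolerance:    # close enough to "best": stop here
            break
        w = w - learning_rate * slope # take a step downhill
    return w, step

w, steps = gradient_descent()
print(f"settled at w = {w:.6f} after {steps} loops")  # close to the true bottom at 3
```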
Architectures waiting to learn
Neural networks are just layers of math, stacked and wired. Gradient descent is how those layers tune themselves: backpropagation carries the slope back through every layer, and each weight gets its own nudge. She doesn't memorize answers; she shapes her inner connections so they line up with patterns in data. It looks like magic until we remember it's just slopes and steps.
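To make the magic concrete, here's a toy sketch of layers tuning themselves: a tiny two-layer network learning XOR with plain NumPy and hand-written gradients. Every name and number in it is our own illustration, not anyone's production code.

```python
import numpy as np

# Toy sketch: a tiny two-layer network learning XOR, tuned by gradient descent.
# Nothing here comes from a framework; the gradients are written out by hand.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Layers of math, stacked and wired: 2 inputs -> 8 hidden units -> 1 output.
W1 = rng.normal(scale=0.5, size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 1.0
for step in range(10000):
    # Forward pass: push the data through the layers.
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: slopes of the mean squared error with respect to every weight.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    d_W2 = h.T @ d_out
    d_b2 = d_out.sum(axis=0, keepdims=True)
    d_h = d_out @ W2.T * (1 - h ** 2)
    d_W1 = X.T @ d_h
    d_b1 = d_h.sum(axis=0, keepdims=True)

    # One nudge per weight, all at once.
    W1 -= learning_rate * d_W1
    b1 -= learning_rate * d_b1
    W2 -= learning_rate * d_W2
    b2 -= learning_rate * d_b2

# After training, the wiring lines up with the XOR pattern.
out = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
print(f"error after training: {np.mean((out - y) ** 2):.4f}")
print(np.round(out, 2))  # typically settles near [[0], [1], [1], [0]]
```

Notice that no answer is stored anywhere: the network only keeps weights, and the loop is the same slope-and-step recipe from above, just applied to many weights at once.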
Our takeaway
We keep coding like this: small rules, steady loops, and patience. Watching her adjust weights one nudge at a time feels slow, but then—suddenly—she’s good at something we never explained. That’s the moment we realize the valley has a bottom, and we’ve helped her find it.