Quaintitative

Gradient Descent


Six months ago I scribbled on a post-it - “Don’t force it. Let things flow.” I was three months away from leaving MAS. I think anxiety had started creeping in.

I wasn’t sure I believed it then. I’m not sure I fully believe it now. And somehow I realized it links to the gradient descent method in deep learning.

Week 13 of life post-MAS. (Links to past weeks in my newsletter.)

The familiar. Recording a course introduction for the module I did for the Cambridge Centre for Alternative Finance. A veteran model risk head I’ve crossed paths with before. Someone at a ratings agency I’ve been chatting with - introduced her to the CEO of an AI testing startup. A founder building an agentic AI risk and compliance startup. An AI development head I worked with before.

New connections. An insurance association. C-suites from wealth management and family offices, gathered around a table to talk about AI. Folks convened by a training institution, trying to design something for the next generation.

A relatively light week. But somehow that phrase - “Don’t force it. Let things flow.” - came to mind. To explain why, I need to go back. And sideways. Into one of the most important ideas in the history of AI.

The algorithm that made everything possible, in my view

Gradient descent is not new. The mathematical foundations go way back. But in the 1980s, Rumelhart, Hinton, and Williams showed how gradient descent could train multi-layer neural networks through backpropagation.

The idea. Think of a hilly landscape, with many peaks and valleys. Your aim is the bottom of the deepest valley - the global minimum. You can’t see the whole landscape. You can only feel the slope beneath your feet. So you take a small step downhill. Then another. And another. Each step guided by the gradient - the direction in which the ground is falling most steeply. Slowly, you descend.

But here’s the catch. Sometimes you wander into a small dip that feels like the bottom. Every direction around you slopes upward. So you stop. But you’re not at the deepest point. You’re just at a local minimum - the lowest point nearby, but not the lowest point overall. The algorithm can’t tell the difference. Neither can you, standing inside it.

Forcing things - taking larger steps, breaking into a run - does not help. Too large a step and you overshoot the valley floor, or bounce past it entirely. Instead you want to go with the natural flow of the landscape.
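The descent described above can be sketched in a few lines of Python. The landscape here is an illustrative one-dimensional function I’ve chosen for the purpose - two valleys of different depths - not something from the post: the deeper (global) minimum sits near x ≈ -1.47, a shallower local minimum near x ≈ 1.35.

```python
# A minimal sketch of gradient descent on a 1-D landscape with two
# valleys: f(x) = x^4 - 4x^2 + x. The function and starting points
# are illustrative assumptions, chosen to show the local-minimum trap.

def f(x):
    return x**4 - 4*x**2 + x

def grad(x):
    # Derivative of f: the "slope beneath your feet".
    return 4*x**3 - 8*x + 1

def descend(x, lr=0.01, steps=2000):
    # Repeatedly take a small step in the downhill direction.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Where you end up depends entirely on where you start.
print(descend(-2.0))  # settles near the global minimum, x ≈ -1.47
print(descend(2.0))   # stuck in the local minimum, x ≈ 1.35
```

From either starting point the algorithm stops where every direction slopes upward - it has no way of knowing whether a deeper valley lies elsewhere.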

I still remember learning it, a decade or more ago. And being happy that there was a way to minimise a function numerically, without finding a closed-form solution. But I digress.

The misses

Thirteen weeks out. And there have been misses.

Directions tried and abandoned. Something agreed to right at the start that felt wrong the moment it became concrete. It would compromise my freedom. I walked away from it. Rooms entered with carefully prepared lines. The rooms didn’t want them. Conversations that started with promise and ended quietly. Proposals that sat unanswered. Sitting in a room and realizing that I had no interest in doing something that seemed polished, but which I knew was trivial.

Each one felt like a small setback in the moment.

But I now think that’s not what they were.

In gradient descent, a wrong step isn’t failure. It’s information. It tells you the slope. It shifts the gradient. Without the wrong steps, you cannot know which direction is right. The misses weren’t dead ends. They were the point.

The wins

And things that now seem to be pointing me in the right direction.

Something built earlier generated an inbound. Folks I met earlier pulling me into pieces of work that are substantive and make sense. Things confirmed without pitching. Rooms that worked not because the lines were perfect but because I stopped bringing my preconceptions. Contracts finalizing. Interesting collaborations forming.

More arrived this week than I pursued. Perhaps the earlier misses had helped me flow towards somewhere with less resistance.

That’s what flow is, I think. Not passivity. Not luck. It’s just gradient descent. The steps make more sense because of both the explorations and the misses.

But here’s what gradient descent cannot guarantee

A global minimum. Something that’s optimal.

In gradient descent, you follow the slope downward. You correct. You adjust. You flow. And you might arrive somewhere that looks exactly like the optimal place to be. But it’s only a local minimum. And you stop, without knowing that a few steps more might have taken you to a better place - the global minimum.

But I guess that’s life.

#GradientDescent #AIRiskManagement #Transitions #Reflections #Flow