Tool-using agents beyond APIs

We talk about APIs a lot because they’re clean. A nice menu of options. Call one, get what you need. Done. But when an AI learns to click around in a browser or poke at desktop software, the world gets messier—and more interesting.

Browser automation

Think of the agent driving a browser the way we do: tabs, forms, pop-ups, “where did that modal come from?” She can log in, scrape data, even buy tickets. The trick is she doesn’t need a hidden back door; she just follows the interface the way any of us would. That makes her powerful where APIs don’t exist, or worse, exist but don’t match what the site actually does.

It’s not foolproof. Layouts change. Buttons move. A “skip intro” link throws her off. Still, if she can see and act on the same screen we see, she can reach corners that used to be locked away.
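
One way to soften the "buttons move" problem is to give her several ways to find the same control and try them in order. Here is a minimal sketch in Python, assuming a toy page made of plain objects. The Element shape, by_id, by_text, and find_element are hypothetical names made up for illustration, not calls from any real automation library.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Element:
    """Toy stand-in for an on-screen control (hypothetical shape)."""
    tag: str
    element_id: str = ""
    text: str = ""


# A strategy takes the page and returns the first match, or None.
Strategy = Callable[[Sequence[Element]], Optional[Element]]


def by_id(element_id: str) -> Strategy:
    """Match on an exact id. Precise, but breaks when the id is renamed."""
    def find(page: Sequence[Element]) -> Optional[Element]:
        return next((e for e in page if e.element_id == element_id), None)
    return find


def by_text(text: str) -> Strategy:
    """Match on visible text, ignoring case and surrounding whitespace."""
    wanted = text.strip().lower()

    def find(page: Sequence[Element]) -> Optional[Element]:
        return next((e for e in page if e.text.strip().lower() == wanted), None)
    return find


def find_element(page: Sequence[Element],
                 strategies: Sequence[Strategy]) -> Optional[Element]:
    """Try each strategy in order and return the first element found."""
    for strategy in strategies:
        match = strategy(page)
        if match is not None:
            return match
    return None


# Two versions of the same (made-up) login page.
old_layout = [Element("button", element_id="submit-btn", text="Log in")]
new_layout = [
    Element("a", element_id="skip", text="Skip intro"),
    Element("button", element_id="login-v2", text="Log in"),
]

# Prefer the id, fall back to the visible label.
login_strategies = [by_id("submit-btn"), by_text("log in")]

print(find_element(old_layout, login_strategies))
print(find_element(new_layout, login_strategies))
```

On the old layout the id matches right away. On the new one the id is gone, so she falls back to the label and still finds the button. It won't save her from a full redesign, but it buys some slack.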

Desktop software automation

Desktop software looks similar. She can launch apps, fill out forms, copy files. If it runs on clicks and keystrokes, she can usually work out a script to repeat it. That’s boring for us, but heaven for her.
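
The "script to repeat it" idea can be sketched as a list of recorded steps replayed against an app. This is a toy in Python, not a real driver: FakeApp, Step, and replay are hypothetical names, and the fake app only remembers what was done to it instead of touching an actual window.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """One recorded action (hypothetical format)."""
    action: str      # "click" or "type"
    target: str      # label of the control to act on
    value: str = ""  # text to enter, used only by "type"


@dataclass
class FakeApp:
    """Stand-in for a desktop app; it just logs what happens to it."""
    fields: Dict[str, str] = field(default_factory=dict)
    clicked: List[str] = field(default_factory=list)

    def click(self, target: str) -> None:
        self.clicked.append(target)

    def type_text(self, target: str, value: str) -> None:
        self.fields[target] = value


def replay(app: FakeApp, steps: List[Step]) -> None:
    """Run each step in order; refuse actions the script format doesn't know."""
    for step in steps:
        if step.action == "click":
            app.click(step.target)
        elif step.action == "type":
            app.type_text(step.target, step.value)
        else:
            raise ValueError(f"unknown action: {step.action!r}")


# A made-up routine: file one expense in an old accounting tool.
expense_script = [
    Step("click", "New expense"),
    Step("type", "Amount", "42.50"),
    Step("type", "Memo", "Team lunch"),
    Step("click", "Save"),
]

app = FakeApp()
replay(app, expense_script)
print(app.clicked)
print(app.fields)
```

Swap FakeApp for something that really sends clicks and keystrokes and the shape stays the same: the script is just data, so she can run it a thousand times without getting bored.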

The catch is brittleness. A single version update can break the whole routine. Yet the payoff is that she works where vendors never bothered to expose an API: old accounting tools, proprietary dashboards, or that one HR system everybody hates.
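
A cheap guard against that brittleness is to check, before running anything, that every control the routine depends on is still on screen. Here is a rough sketch in Python, assuming controls can be identified by their labels. The function names and the sample labels are invented for illustration.

```python
from typing import Iterable, List


def missing_controls(required: Iterable[str], visible: Iterable[str]) -> List[str]:
    """Return the required control labels that are not currently visible."""
    visible_set = set(visible)
    return [name for name in required if name not in visible_set]


def safe_to_run(required: Iterable[str], visible: Iterable[str]) -> bool:
    """True only when every required control is present."""
    return not missing_controls(required, visible)


# Labels the (hypothetical) expense routine clicks or types into.
required = ["New expense", "Amount", "Memo", "Save"]

# What the screen offers before and after a made-up version update
# that renamed "Save" to "Submit".
screen_v1 = ["New expense", "Amount", "Memo", "Save", "Cancel"]
screen_v2 = ["New expense", "Amount", "Memo", "Submit", "Cancel"]

print(safe_to_run(required, screen_v1))       # True
print(missing_controls(required, screen_v2))  # ['Save']
```

It doesn't fix the broken routine, but it turns a silent misclick into a loud "something changed", which is usually the better failure.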

Why it matters

These agents widen the playground. We no longer have to wait for companies to publish clean APIs. She can navigate the mess directly, same as we do. That’s liberating, and a little unsettling.

A coder’s note

As coders, we usually chase elegance. Watching her stumble through a login screen reminds us: sometimes brute force is enough. We’ll take messy progress over perfect architecture, at least for now.