Image classification
We want computers to see pictures and know what they are. That’s the whole idea of image classification. It sounds obvious, but it’s not.
A pile of numbers
A digital image is just numbers in a grid. Each number is a pixel: a brightness for grayscale, or one of a few channel values for color. Nothing magical. We hand this grid to artificial intelligence (AI), and she has to figure out whether it’s a cat, a stop sign, or the number 7.
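To make that concrete, here is a tiny grid of numbers pretending to be an image. This is a made-up example using NumPy, not any particular dataset:

```python
import numpy as np

# A tiny 5x5 grayscale "image": each number is a pixel's
# brightness, 0 = black, 255 = white. Squint and it's a rough "1".
image = np.array([
    [  0,   0, 255,   0,   0],
    [  0, 255, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0, 255, 255, 255,   0],
], dtype=np.uint8)

print(image.shape)  # a grid: 5 rows, 5 columns
print(image.max())  # the brightest pixel value
```

That grid is the entire input. Everything she "sees" has to be recovered from those raw numbers.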
The training playground
There’s a classic dataset called MNIST. It’s 70,000 pictures of handwritten digits, all cleaned up to the same size: 28 by 28 pixels. Think of it as a playground where she learns to recognize numbers. We like it because it’s small, quick, and shows us if our code works.
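In code, the dataset is just a big stack of those pixel grids plus a label for each one. A sketch of its shape, using random arrays as stand-ins for the real images (real loaders such as `torchvision.datasets.MNIST` download them for you):

```python
import numpy as np

# Stand-in for MNIST: 70,000 grayscale images, each 28x28 pixels,
# and a digit label (0-9) for every image.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(70_000, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=70_000)

# The conventional MNIST split: 60,000 for training, 10,000 for testing.
train_images, test_images = images[:60_000], images[60_000:]
train_labels, test_labels = labels[:60_000], labels[60_000:]

print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
```

The 60,000/10,000 split matters: she learns on one pile and is graded on the other, so we know she recognizes digits rather than memorizing pictures.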
Why simple rules fail
We can’t just say “if there’s a loop on top, it’s an 8.” People write messily. Angles tilt. Lines break. Our early rule-based tricks fail fast. We need her to generalize—to spot the “2-ness” even when the curve is sloppy.
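A toy demonstration of why hand-written rules crack. The rule below (which we invented for this sketch) guesses a digit from how much "ink" it uses. It works on a neat example and breaks the moment the strokes get thick:

```python
import numpy as np

# A toy rule: "a 1 uses little ink, an 8 uses a lot."
def classify_by_ink(image, threshold=6):
    ink = np.count_nonzero(image)      # count the "inked" pixels
    return 1 if ink < threshold else 8

neat_one = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
])
messy_one = np.array([   # the same digit, written thick and slanted
    [0, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
])

print(classify_by_ink(neat_one))   # correctly says 1
print(classify_by_ink(messy_one))  # says 8 -- the thick strokes fooled the rule
```

Every rule we write has a counterexample one sloppy pen stroke away. That’s the gap generalization has to close.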
Convolutional networks
Enter convolutional neural networks (CNNs). They’re built to look at little patches of an image and stitch meaning together. One layer notices edges. Another layer notices corners. Later layers get ambitious: eyes, wheels, digits. She works through the mess in stages until she says, “this is a 5.”
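What "one layer notices edges" means mechanically: a small grid of weights (a kernel) slides across the image, and the output lights up wherever the pattern it encodes appears. A minimal pure-NumPy sketch of one convolution with a hand-picked vertical-edge kernel (a real CNN learns its kernels instead of being given them):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (valid mode, no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: responds where brightness jumps left-to-right.
kernel = np.array([
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
])

# Dark left half, bright right half: one strong vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = convolve2d(image, kernel)
print(edges)  # large values only in the columns where dark meets bright
```

A CNN stacks many of these, feeds each layer’s output into the next, and lets training pick the kernels, so edges compose into corners, and corners into digits.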
A coder’s musing
We train her on MNIST, then we point her at real-world photos, and suddenly the job feels huge. Dogs and cats look alike when blurred. Humans barely manage. So the wonder is not that she makes mistakes, but that she gets so much right at all.