Text classification

We want computers to tell one kind of text from another. Spam or not. Sports news or politics. That’s text classification. It’s the “sorting hat” of artificial intelligence (AI).

Naive Bayes

The old workhorse is Naive Bayes. She treats words as if they appear independently of one another once the class is fixed (naive, yes, but fast). If “lottery” shows up, she bumps up the odds of spam. Add “prize” and the score jumps higher. It works well because junk mail repeats itself. Simple math gets us a decent filter.
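
Here is a minimal sketch with scikit-learn (assuming it is installed); the handful of training messages is made up purely to show the moving parts.

```python
# A tiny Naive Bayes spam filter with scikit-learn.
# The training messages below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "win the lottery now", "claim your free prize",   # spam
    "meeting moved to 3pm", "lunch tomorrow?",        # not spam
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts + multinomial Naive Bayes: each word votes on its own.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free lottery prize"]))        # likely ['spam']
print(model.predict_proba(["free lottery prize"]))  # class probabilities
```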

Logistic regression

Next step up is logistic regression. Instead of leaning on a naive independence rule, she learns weights for each word. “Free” might carry a heavy weight, “meeting” a lighter one. Each word nudges the needle until she spits out a probability. Clean, predictable, and still efficient. Think of it as Naive Bayes with better tailoring.
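
The same toy setup works for logistic regression, and now we can peek at the weight each word earned. Again a sketch on invented messages, assuming scikit-learn.

```python
# Logistic regression learns a weight per word instead of assuming independence.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

messages = [
    "win the lottery now", "claim your free prize",   # spam
    "meeting moved to 3pm", "free to meet for lunch?" # not spam
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

clf = LogisticRegression()
clf.fit(X, labels)

# Positive weights push toward spam, negative weights push away.
for word, weight in zip(vectorizer.get_feature_names_out(), clf.coef_[0]):
    print(f"{word:10s} {weight:+.2f}")

# Probability of spam for a new message.
print(clf.predict_proba(vectorizer.transform(["free prize"]))[:, 1])
```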

Transformers

Then transformers arrived. She doesn’t just see words; she sees how they lean on each other in context. “Free” in “feel free to call” is harmless. In “free money,” not so much. With attention, she learns the difference. That makes her strong not just for spam but for broader tasks: classifying news topics, tagging support tickets, sorting reviews.
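
A quick sketch with the Hugging Face transformers library (assuming it is installed and can download a model). The zero-shot classification pipeline and its default model are one convenient way to get a context-aware classifier without training anything; the two “free” sentences below illustrate the idea, and the exact scores depend on the model.

```python
# Zero-shot classification: one pretrained transformer scores arbitrary labels.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")  # downloads a default model

labels = ["spam", "not spam"]
print(classifier("Free money!!! Call now to claim your prize.",
                 candidate_labels=labels))
print(classifier("Feel free to call me about tomorrow's meeting.",
                 candidate_labels=labels))
# The same word "free" ends up scored differently because the model
# reads it in context rather than in isolation.
```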

What we learn

The pattern is clear: start simple, then add nuance. Naive Bayes for speed. Logistic regression when we want balance. Transformers when context matters. Each step moves closer to how we read.

A coder’s thought

We may never need more than a spam filter for our side project. But it’s nice to know the ladder goes higher. One day she may read everything better than we do. Until then, we’ll settle for less junk in our inbox.