Named entity recognition
We need a way to teach AI what matters in a wall of text. Named Entity Recognition (NER) is how we do it. Think of it as underlining the names, places, and dates so she knows where to look. Without this step, she’s just guessing.
Finding the entities
AI starts by scanning text word by word. Then she tags chunks as entities: people, locations, organizations, dates, numbers. Each chunk gets a label. “Alice went to Paris in 2024” turns into Alice = Person, Paris = Location, 2024 = Date. Not magic—just pattern spotting, trained on lots of examples.
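The spotting step can be sketched in a few lines. This is a toy lookup, not a trained model (real NER learns these patterns from labeled examples); the hypothetical `KNOWN_ENTITIES` table and `tag_entities` function only show the input-to-labels shape of the task.

```python
import re

# Hypothetical mini-gazetteer; a trained model generalizes far beyond this.
KNOWN_ENTITIES = {
    "Alice": "Person",
    "Paris": "Location",
}

def tag_entities(text):
    """Return (token, label) pairs for tokens we can label."""
    entities = []
    for token in re.findall(r"\w+", text):
        if token in KNOWN_ENTITIES:
            entities.append((token, KNOWN_ENTITIES[token]))
        elif re.fullmatch(r"\d{4}", token):  # four digits: treat as a year
            entities.append((token, "Date"))
    return entities

print(tag_entities("Alice went to Paris in 2024"))
# [('Alice', 'Person'), ('Paris', 'Location'), ('2024', 'Date')]
```

Everything not in the lookup or the year pattern simply gets no label, which is exactly what a model's "outside" decision looks like.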
Why categories matter
If everything were just “stuff,” we’d never get useful answers. Categories tell her which bits belong together. A bank name shouldn’t be confused with a river. A date isn’t just another number. Simple rule: define clear buckets so the model sorts reliably.
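Defining the buckets up front can be as simple as a fixed label set. The label names below follow common NER conventions, but the exact set is our choice; `bucket` is a hypothetical helper that shows why a closed set keeps outputs sortable.

```python
from enum import Enum

# A fixed, closed set of labels: every entity must land in one bucket.
class EntityLabel(Enum):
    PERSON = "PER"
    LOCATION = "LOC"
    ORGANIZATION = "ORG"
    DATE = "DATE"

def bucket(entities):
    """Group (text, label) pairs under their buckets."""
    buckets = {label: [] for label in EntityLabel}
    for text, label in entities:
        buckets[label].append(text)
    return buckets

grouped = bucket([
    ("Alice", EntityLabel.PERSON),
    ("Paris", EntityLabel.LOCATION),
    ("2024", EntityLabel.DATE),
])
print(grouped[EntityLabel.LOCATION])
# ['Paris']
```

Because the set is closed, nothing can quietly land in an undefined category: an unknown label raises an error instead of corrupting the results.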
Sequence labeling
NER isn’t just plucking words out. It’s sequence labeling—deciding for every token whether it starts an entity, continues one, or is outside. That’s why “New York City” comes back whole, not chopped into “New,” “York,” and “City.” We label the sequence so context survives.
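The usual encoding for this is BIO tagging: each token gets B- (begins an entity), I- (inside one), or O (outside). A sketch of the decoding step, assuming the model has already emitted the tags, shows how B/I runs merge back into whole entities:

```python
def decode_bio(tokens, tags):
    """Merge BIO-tagged tokens into (entity_text, label) spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins: flush the old one
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)         # continue the open entity
        else:                             # "O", or a stray I- with no open span
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:                           # flush an entity that ends the sentence
        spans.append((" ".join(current), label))
    return spans

tokens = ["She", "flew", "to", "New", "York", "City", "yesterday"]
tags   = ["O",   "O",    "O",  "B-LOC", "I-LOC", "I-LOC", "O"]
print(decode_bio(tokens, tags))
# [('New York City', 'LOC')]
```

The B/I distinction is what keeps two adjacent entities of the same type apart: a fresh B- tag closes the previous span even with no O token between them.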
Small wins add up
Do this right, and the downstream tasks suddenly click. Search engines highlight the right snippets. Chatbots understand who we mean. Analytics stop looking like scrambled notes. It’s the boring-sounding layer that makes the clever layers possible.
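The search-highlighting win is almost free once entities exist. A hypothetical snippet layer might just wrap each known mention in markers; `highlight` below is a sketch of that idea, not any particular engine's API.

```python
def highlight(text, mentions):
    """Wrap each entity mention in **...** markers, longest mentions first."""
    # Longest-first avoids a short mention splitting a longer one it sits inside.
    for mention in sorted(mentions, key=len, reverse=True):
        text = text.replace(mention, f"**{mention}**")
    return text

snippet = "Alice met the Acme team in Paris."
print(highlight(snippet, ["Alice", "Acme", "Paris"]))
# **Alice** met the **Acme** team in **Paris**.
```

Without the entity list from the layer above, this function has nothing to mark: the boring layer feeds the clever one.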
Our coder’s musing
We keep waiting for AI to act like a mind reader. Instead, she’s more like a meticulous note-taker. If we train her to underline the right names and dates, we’ll spend less time cleaning up her homework.