Classic word embeddings
Let’s be honest: text is messy. Computers don’t understand words the way we do—they just see strings of characters. If we want them to do something useful with language (like find similar words or spot meaning), we need a trick: turning words into numbers.
That’s where word embeddings come in.
Think of embeddings as a way to put words on a map. Instead of just knowing that “dog” and “cat” are different spellings, embeddings help a computer notice that they’re closer to each other than they are to, say, “banana.”
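To make "closer" concrete: each word becomes a list of numbers (a vector), and we measure how alike two words are by the angle between their vectors (cosine similarity). Here's a minimal sketch with made-up three-dimensional vectors; real embeddings have hundreds of dimensions learned from data, so treat the numbers below as illustration only.

```python
import numpy as np

def cosine_similarity(a, b):
    """How closely two word vectors point in the same direction (1 = identical, 0 = unrelated)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional vectors, hand-picked for illustration only --
# real embeddings are learned from large amounts of text.
dog    = np.array([0.8, 0.6, 0.1])
cat    = np.array([0.7, 0.7, 0.2])
banana = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(dog, cat))     # high: "dog" and "cat" sit near each other on the map
print(cosine_similarity(dog, banana))  # low: "banana" lives somewhere else entirely
```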
Word2Vec: The Big Idea
Back in 2013, Google researchers gave us Word2Vec, which quickly became a breakthrough. The trick was simple but clever: instead of memorizing word definitions, Word2Vec taught computers to guess the neighbors of a word.
Here’s the secret:
- Words that show up in similar contexts probably mean similar things.
- If “coffee” often appears near “cup” or “mug,” those words probably belong together.
By learning these patterns, Word2Vec places each word in a dense vector space. Suddenly, “king - man + woman ≈ queen” wasn’t just a party trick—it showed embeddings could actually capture meaning.
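If you want to poke at this yourself, libraries like Gensim wrap the whole training loop. The sketch below only shows the API shape: the corpus is a toy I made up, and the famous analogy only actually emerges after training on a large corpus (or loading pretrained vectors), so don't expect "queen" to pop out here.

```python
from gensim.models import Word2Vec

# Toy corpus just to show the API shape; the "king - man + woman" analogy
# needs millions of sentences (or pretrained vectors) to show up for real.
sentences = [
    ["the", "king", "ruled", "the", "kingdom"],
    ["the", "queen", "ruled", "the", "kingdom"],
    ["the", "man", "drank", "coffee", "from", "a", "mug"],
    ["the", "woman", "drank", "coffee", "from", "a", "cup"],
]

# sg=1 selects the Skip-gram variant (sg=0 would be CBOW, both covered below)
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Vector arithmetic: start at "king", subtract "man", add "woman"
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```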
Skip-gram: Predicting the Surroundings
Word2Vec comes in two flavors; the first is the Skip-gram model.
- You start with a single word (say, “apple”).
- The model’s job is to predict the words that might appear around it (“pie,” “tree,” “fruit”).
It’s like tossing a word into a pond and watching the ripples—Skip-gram looks outward.
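Concretely, Skip-gram's training data is just (center word, neighbor) pairs sliced out of running text. Here's a rough sketch of that pairing step; the sentence and window size are my own toy example, and real implementations add tricks like subsampling frequent words and negative sampling.

```python
# A minimal sketch of how Skip-gram training pairs are built (illustrative only).
sentence = ["she", "baked", "an", "apple", "pie", "for", "dessert"]
window = 2  # how many neighbors on each side count as "context"

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))  # (input word, word to predict)

# For "apple" the model must predict its neighbors: "baked", "an", "pie", "for"
print([p for p in pairs if p[0] == "apple"])
```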
CBOW: Guessing the Missing Word
The opposite idea is Continuous Bag of Words (CBOW).
- Here, you feed the model the context words (“she ate an ___ for breakfast”).

- The goal is to predict the missing middle word (“apple”).
Instead of ripples moving outward, it’s like filling in the blank.
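The training examples are the mirror image of Skip-gram's: the surrounding words go in, the middle word comes out. Here's a toy sketch of how those examples are built; again, the sentence and window are my own example, and real CBOW also averages the context vectors before making the prediction.

```python
# A minimal sketch of CBOW training examples (illustrative only): the context
# words are the input and the middle word is what the model must predict.
sentence = ["she", "ate", "an", "apple", "for", "breakfast"]
window = 2

examples = []
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    examples.append((context, target))  # (surrounding words, blank to fill in)

# The example for "apple": (['ate', 'an', 'for', 'breakfast'], 'apple')
print(examples[3])
```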
Why It Matters
These classic methods might feel old-school now, especially with modern transformers running the show. But Word2Vec and its Skip-gram and CBOW variants were a huge leap. They made text searchable, comparable, and computable in ways that laid the foundation for today's AI models.
Without them, we wouldn’t have the contextual embeddings (like those produced by BERT or GPT) that make chatbots and semantic search work today.
In short: Classic word embeddings are the “aha!” moment where words became numbers—and numbers started to mean something.