GPT-3 is what non-technical people think of as the future of AI, but the truth is that the future is already here.
GPT-3 is a natural language processing (NLP) model created by OpenAI that’s made waves in the last couple of years due to its ability to read and produce natural language.
Simply put, Generative Pre-trained Transformer 3 (GPT-3) enables computers to perform several tasks involving natural language, such as summarizing text, answering questions, and translating.
Natural language is simply the language we humans use, such as what you are reading now. The quality of text produced by GPT-3 was shocking even to experts when it was first unveiled in 2020.
The cool thing is that GPT-3 and similar models are powering the next generation of apps. OpenAI exposed the model via an API that you can sign up for, and developers can use it to customize the model for their specific applications.
Today, GPT-3 is often used for chat, content generation, music lyrics, and more. If you’re interested in learning more about its applications, I think you’ll like this OpenAI article about specific GPT-3 use cases people have discovered. I also recommend their recent article about the newest GPT-3 features.
The reason I’m covering the GPT-3 model is because it represents how far we’ve come in the NLP domain. For now, let’s dive into what you need to know.
At their core, language models are advanced versions of your phone’s auto-complete functionality. They are trained on large amounts of text and asked to predict the next word in a sequence.
For example, take this clip from a news article:
Confidence is growing that a winter storm with the intensity of a hurricane, snow measured in feet and blizzard-like conditions will impact major Northeast cities this weekend.
"The models continue to show a nor'easter with blockbuster potential for the weekend, mainly late Friday through Saturday," CNN meteorologist Brandon Miller says.
We would train the model by breaking the words up like this, with the last column being the correct word to predict:

Confidence is growing that a winter storm → with
is growing that a winter storm with → the
growing that a winter storm with the → intensity
that a winter storm with the intensity → of
a winter storm with the intensity of → a
winter storm with the intensity of a → hurricane
Repeat this over billions of words (Wikipedia, web scrapes, Twitter, public domain books, etc.), and you have a language model.
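The sliding-window idea can be sketched in a few lines of Python. This is only a toy illustration of the training-data setup, not how GPT-3's actual pipeline works (real models operate on tokens, not whitespace-split words):

```python
# Toy illustration: turn a sentence into (context, next-word) training
# pairs using a sliding window of seven words.
def make_training_pairs(text, context_size=7):
    words = text.split()
    pairs = []
    for i in range(len(words) - context_size):
        context = words[i : i + context_size]   # seven consecutive words
        target = words[i + context_size]        # the word to predict
        pairs.append((context, target))
    return pairs

sentence = ("Confidence is growing that a winter storm "
            "with the intensity of a hurricane")
pairs = make_training_pairs(sentence)

# The last pair asks the model to predict "hurricane" from the seven
# words before it.
print(pairs[-1])
```

Scale that loop up to billions of words and you have the raw training signal a language model learns from.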
Obviously, there are some limitations to this approach. In the last row, if the model were to guess “typhoon” instead of “hurricane,” it would be penalized. It would also be penalized the same amount if it guessed something nonsensical like “donut,” even though “typhoon” is a much closer guess than “donut.”
That’s because the simple model doesn’t understand the meaning of words (semantics).
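The "typhoon versus donut" point can be made concrete with cross-entropy, the loss most language models train on. The loss depends only on the probability the model assigned to the correct word, so two wrong guesses are penalized identically if the correct word got the same probability. Here is a toy sketch with made-up numbers:

```python
import math

# Cross-entropy loss for next-word prediction: -log(probability the
# model assigned to the CORRECT word). Probabilities assigned to
# wrong words never enter the formula.
def cross_entropy(predicted_probs, correct_word):
    return -math.log(predicted_probs[correct_word])

# Two hypothetical models, both putting 0.1 on the correct word
# "hurricane" but spending the rest of their probability differently.
guesses_typhoon = {"hurricane": 0.1, "typhoon": 0.8, "donut": 0.1}
guesses_donut   = {"hurricane": 0.1, "typhoon": 0.1, "donut": 0.8}

loss_a = cross_entropy(guesses_typhoon, "hurricane")
loss_b = cross_entropy(guesses_donut, "hurricane")

# Both losses equal -log(0.1): the sensible "typhoon" guess is
# penalized exactly as much as the nonsensical "donut" one.
print(loss_a, loss_b)
```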
To get around that, we use word embeddings. Let’s say we put a bunch of words on a chart. We could group words like hurricane and typhoon together because they are similar.
The words donuts and bakery would be close together, but far away from hurricane and typhoon. We would convert words to their coordinates.
Hurricane, typhoon, donut, and bakery would become (12,13), (12,14), (45,60), and (46,61). It then becomes simple math to measure the distances between the words.
These coordinates (embeddings) are a lot more meaningful to the model and help it understand the relationship between words. You can get a more in-depth explanation here.
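With the toy coordinates above, measuring similarity really is simple math. Python's built-in `math.dist` computes the straight-line (Euclidean) distance between two points:

```python
import math

# The toy 2-D embeddings from the text. Real embeddings have hundreds
# of dimensions, but the distance idea is the same.
embeddings = {
    "hurricane": (12, 13),
    "typhoon":   (12, 14),
    "donut":     (45, 60),
    "bakery":    (46, 61),
}

# Similar words sit close together...
print(math.dist(embeddings["hurricane"], embeddings["typhoon"]))  # 1.0
print(math.dist(embeddings["donut"], embeddings["bakery"]))       # ~1.41

# ...while unrelated words are far apart.
print(math.dist(embeddings["hurricane"], embeddings["donut"]))    # ~57.4
```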
Another limitation is context. Suppose you saw a short fragment from the middle of the article and were asked to predict the word that came next. Looking at the fragment on its own, you would be hard pressed to guess it. But if I told you the title of the article was, “A bomb cyclone with the power of a hurricane will unleash snow and blizzard-like conditions this weekend,” that would make guessing the word a lot easier.
The naive model doesn’t understand context. It only understands the seven words leading up to it. This is a really tricky problem, and most of the complexity in language models comes from trying to address it.
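Here is a toy sketch of what "only understands the seven words leading up to it" means: the model's input is truncated to a fixed window, so anything earlier, like the article's title, is invisible to it. (The article text below is made up for illustration; real models use much larger windows measured in tokens, not words.)

```python
# A naive model with a fixed context window only ever "sees" the last
# n words; everything before that (like the article title) is dropped.
def visible_context(words, window=7):
    return words[-window:]

article = ("A bomb cyclone will unleash snow and blizzard-like "
           "conditions this weekend Forecasters warn that travel "
           "will be difficult").split()

print(visible_context(article))
# Only the final seven words survive. The opening words, which carry
# the crucial context about the storm, never reach the model.
```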
The thing that sets GPT-3 apart isn’t necessarily the code or architecture. You could look up Python code on the internet and build your own version in a day or two.
What sets it apart is its size. The GPT-3 model alone is estimated to be around 350 GB in size. That makes it one of the largest machine learning models ever produced.
It pushes the limits of what our current hardware is capable of. It is estimated that OpenAI spent between $3 and $7 million on cloud compute alone to train it.
As a rule of thumb, larger models require more data to train on. GPT-3 trained on text from millions of webpages, all of Wikipedia, and thousands of books.
One important thing to remember is that, despite its power, GPT-3 does not truly “understand” text. Reading through some of its output is like reading an essay by a college student who didn’t read the book.
At first glance, the essay looks good, with proper grammar and smooth sentence flow. But on closer inspection, you see the inconsistencies, mistakes, and contradictions, because the student never really understood the topic.
For example, take this text produced by GPT-3. The input asked the model to write about unicorns. This was the output:
“The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. . .
Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.
[Examples from this OpenAI blog post.]
The text flows well, but we get some weird statements, like the one about “four-horned unicorns” (an animal with four horns, by definition, can’t be a unicorn). And how can you reach the peak of a mountain while descending into a valley?
When you think about the future of natural language processing, GPT-3 is really fun to follow, and I recommend this article, which shows many examples to help you imagine the possibilities.
I’ll be watching to see what other advancements there will be in this exciting and innovative space, and I’ll keep you posted.