OpenAI GPT System: What does it do?

Two contrasting machine learning approaches to NLG: OpenAI GPTs and Arria NLG

OpenAI has released several Generative Pretrained Transformer (GPT) systems (GPT, GPT-2, GPT-3), which have received a lot of media attention and are often described as Natural Language Generation (NLG) systems. However, GPT systems are very different from the kind of NLG done at Arria.

While Arria systems analyze data and generate narratives based on this analysis, GPT systems (at least in their basic form) completely ignore numeric data. Instead, they use technology similar to autocomplete systems to expand an initial text fragment (which can be just a few words) into a complete narrative.

Because GPT does not look at data about what is actually happening in the world, the narratives it generates are often pieces of fiction that bear little resemblance to reality. This means that GPT is not well suited to generating reports in areas such as finance and medicine, where accuracy is paramount.

We can see this by looking at an example. I typed the sentence below as an initial text fragment into an online demo of GPT-2 (https://talktotransformer.com/):

“COVID-19 deaths have been falling for the past 2 months.”

GPT-2 expanded my initial sentence into the following narrative:

“COVID-19 deaths have been falling for the past 2 months. Still, the number is still unacceptably high when contrasted to the 100 deaths reported for December. Only 21 of the reported deaths (7.75%) were found to have been cancer.”

This result is typical of GPT output:

It reads reasonably well.
Everything it says (except for the first sentence, which I provided) is factually wrong. For example, there were no COVID-19 deaths in December.

GPT generates narratives using a “language model”, the same kind of model commonly used in autocomplete systems. The language model looks at the text so far and computes which words are most likely to come next, based on an analysis of word patterns in large amounts of English text.

In conventional autocomplete, this is used to predict only a few words. For example, if I type “I will call you” into Gmail, its autocomplete suggests that the next word will be “tomorrow”, because “I will call you tomorrow” is a very common phrase in emails. Of course, I don’t have to accept this suggestion; I can reject it if it is not what I intended to type.
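To make next-word prediction concrete, here is a minimal sketch assuming a simple trigram model (it looks only at the last two words) and a tiny invented email corpus. It is not how Gmail or GPT actually work internally; real systems are trained on vastly more text with far more sophisticated models. It only shows the principle: count which words tend to follow a context, then suggest the most frequent one.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration; a real autocomplete
# model is trained on an enormous amount of text.
CORPUS = [
    "i will call you tomorrow",
    "i will call you tomorrow morning",
    "i will call you later",
    "i will call you tomorrow after lunch",
    "i will email you tomorrow",
]

def train_trigram_counts(sentences):
    """Count how often each word follows each two-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for w1, w2, nxt in zip(words, words[1:], words[2:]):
            counts[(w1, w2)][nxt] += 1
    return counts

def next_word_probabilities(counts, text):
    """Turn the counts for the last two words of `text` into probabilities."""
    context = tuple(text.lower().split()[-2:])
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()} if total else {}

def suggest(counts, text):
    """Autocomplete: return the single most likely next word, or None."""
    probs = next_word_probabilities(counts, text)
    return max(probs, key=probs.get) if probs else None

counts = train_trigram_counts(CORPUS)
print(suggest(counts, "I will call you"))                  # tomorrow
print(next_word_probabilities(counts, "I will call you"))  # {'tomorrow': 0.75, 'later': 0.25}
```

As with Gmail, the suggestion is just the statistically most common continuation; the user is free to ignore it.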

In GPT, the language model generates several sentences, not just a few words. In this case, it has learned (using “deep learning” neural networks trained on Internet text) that, when an initial sentence in a narrative talks about a falling death rate, the most common second sentence says that the death rate is still too high. But at no point does GPT-2 look at actual data about COVID-19 death rates. Therefore, the content it generates (e.g., “the 100 deaths reported for December”) is of the correct type but bears no resemblance to what actually happened.
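Generating several sentences amounts to running next-word prediction in a loop: pick a likely next word, append it, repeat. The sketch below illustrates this under loud assumptions: the probability table `NEXT_WORD` is invented, it is keyed on the previous word only, and it works on whole words, whereas GPT-2 uses a Transformer network over the entire preceding text and operates on subword tokens. What it does share with GPT-2 is the point made above: the generator's only input is text, so any number it produces is simply a plausible-looking word.

```python
import random

# Hypothetical next-word probabilities, invented for illustration only.
NEXT_WORD = {
    "months.": {"Still,": 0.7, "However,": 0.3},
    "Still,": {"deaths": 1.0},
    "However,": {"deaths": 1.0},
    "deaths": {"remain": 1.0},
    "remain": {"high": 0.6, "above": 0.4},
    "above": {"100": 0.5, "200": 0.5},
    "100": {"<end>": 1.0},
    "200": {"<end>": 1.0},
    "high": {"<end>": 1.0},
}

def generate(prompt, rng, max_words=20):
    """Extend `prompt` one word at a time by sampling from NEXT_WORD.

    No real-world data is consulted anywhere: "100" or "200" is chosen
    only because it is a likely word in this position.
    """
    words = prompt.split()
    for _ in range(max_words):
        options = NEXT_WORD.get(words[-1])
        if not options:  # no known continuation: stop
            break
        choices, weights = zip(*options.items())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == "<end>":
            break
        words.append(nxt)
    return " ".join(words)

rng = random.Random(0)  # fixed seed so runs are repeatable
print(generate("COVID-19 deaths have been falling for the past 2 months.", rng))
```

Different seeds yield different but equally fluent continuations, none of which is checked against what actually happened.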

Arria’s systems, in contrast, are used to communicate insights about real-life data. For example, Arria’s COVID-19 Interactive Dashboard (https://www.arria.com/covid19-microsoft/) produced the following narrative:

“New York is currently reporting 385,142 cases and 30,939 fatalities. During the past seven days, new cases have increased by 4,250, which represents a 15% decrease over cases confirmed during the previous week (5,023). The seven-day rolling average is 607 confirmed cases. As cases decline, we are also seeing a decline in deaths. Week over week there has been a 2% decrease in deaths (359) compared to last week (368).”
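For contrast, the sketch below shows the data-to-text idea in miniature: every number in the output is computed from input data rather than predicted from word patterns. It is a hypothetical illustration, not Arria's implementation; the `WeeklyFigures` record and its field names are invented, and it assumes the rolling average is simply this week's new cases divided by seven (4,250 / 7 ≈ 607, which matches the figure above). Fed the New York figures from the narrative, it reproduces the stated changes: (4,250 - 5,023) / 5,023 ≈ -15% and (359 - 368) / 368 ≈ -2%.

```python
from dataclasses import dataclass

@dataclass
class WeeklyFigures:
    """Hypothetical input record; field names are illustrative only."""
    region: str
    total_cases: int
    total_deaths: int
    cases_this_week: int
    cases_last_week: int
    deaths_this_week: int
    deaths_last_week: int

def pct_change(current: int, previous: int) -> float:
    """Week-over-week change as a percentage (assumes previous != 0)."""
    return (current - previous) / previous * 100

def direction(change: float) -> str:
    """Word for the sign of a change (a zero change is reported as an increase)."""
    return "increase" if change >= 0 else "decrease"

def describe(d: WeeklyFigures) -> str:
    """Build a short narrative whose every number comes from the input data."""
    case_change = pct_change(d.cases_this_week, d.cases_last_week)
    death_change = pct_change(d.deaths_this_week, d.deaths_last_week)
    # Assumption: rolling average = this week's new cases spread over 7 days.
    rolling_avg = d.cases_this_week / 7
    sentences = [
        f"{d.region} is currently reporting {d.total_cases:,} cases "
        f"and {d.total_deaths:,} fatalities.",
        f"During the past seven days there have been {d.cases_this_week:,} new cases, "
        f"a {abs(case_change):.0f}% {direction(case_change)} over the previous week "
        f"({d.cases_last_week:,}).",
        f"The seven-day rolling average is {rolling_avg:,.0f} confirmed cases.",
    ]
    # Content selection: state the joint trend only when the data supports it.
    if case_change < 0 and death_change < 0:
        sentences.append("As cases decline, we are also seeing a decline in deaths.")
    sentences.append(
        f"Week over week there has been a {abs(death_change):.0f}% "
        f"{direction(death_change)} in deaths ({d.deaths_this_week:,}) "
        f"compared to last week ({d.deaths_last_week:,})."
    )
    return " ".join(sentences)

ny = WeeklyFigures("New York", 385_142, 30_939, 4_250, 5_023, 359, 368)
print(describe(ny))
```

If the input data changes (for example, cases start rising), the wording and the numbers change with it, which is exactly what a language model that never sees the data cannot do.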

This is very different from the GPT text above! The GPT text is essentially a well-written piece of fiction about COVID-19, while the Arria text accurately presents key insights about the spread of COVID-19.

There are places where the GPT approach is probably useful, including some computer game and chatbot contexts. But it is not useful if the goal is to accurately communicate real-world insights about data.

About the author: Arria Chief Scientist, Prof. Ehud Reiter, is a pioneer in the science of Natural Language Generation (NLG) and one of the world’s foremost authorities in the field of NLG. He is responsible for the overall direction of Arria’s core technology development as well as supervision of specific NLG projects. He is Professor of Computing Science in the University of Aberdeen School of Natural and Computing Sciences.