
This 22-year-old book on NLG is still relevant

By Ehud Reiter | July 6, 2022

Back in 2000, I wrote a book, “Building Natural Language Generation Systems” (co-authored with Robert Dale, who worked for Arria in its early years). Many people, including Arria staff and clients, have told me that they still find the book useful despite its age, which is pretty remarkable in AI! Most AI books lose their relevance within five years (sometimes two), so it’s a nice feeling that my book is still useful 22 years after it was published.

Of course, much of the book is out-of-date. Below are my thoughts on which parts of the book are still relevant in 2022.

Title – the book is called “Building Natural Language Generation Systems,” but it focuses on data-to-text (i.e., the kind of NLG that Arria does). Now that NLG is a better-known technology, the book should really be called “Building Data-to-Text Systems”. Indeed, I suspect that a 2022 book on NLG would discuss text-to-text as well as data-to-text applications.

Chapter 1: Introduction. This chapter describes the basic concepts of data-to-text, which have not changed. It also describes several examples of NLG systems. These example systems are dated, and a book published in 2022 would probably include examples from Arria’s key use cases, such as financial services and sports. But still, the examples from 2000 do show what NLG can do.

Chapter 2: NLG in Practice. This chapter discusses when NLG makes sense and how to use corpus analysis to understand requirements; this material is still valid and indeed used by Arria. A 2022 book on NLG would probably say much more about evaluation and testing, and also about commercial usage of NLG technology.

Chapter 3: Architecture. This chapter presents the basic three-stage architecture: document planning, microplanning, and surface realization. I still believe in this architecture, which is used in many Arria systems. I also note that people who initially try something different (e.g. end-to-end neural NLG) often drift back to something like this architecture. Because it works!
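To make the three stages concrete, here is a minimal Python sketch of a toy data-to-text pipeline. It is not code from the book or from any Arria system: the function names (plan_document, microplan, realize), the percentage-change input, and the 1% reporting threshold are all hypothetical, chosen only to show which kind of decision belongs to which stage. Real microplanners also handle tasks such as choosing referring expressions, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One unit of content selected during document planning."""
    subject: str
    change: float  # hypothetical input: percentage change, e.g. 5.0 means +5%


@dataclass
class Clause:
    """A lexicalized clause produced by microplanning."""
    subject: str
    verb: str
    amount: str


def plan_document(data: dict[str, float], threshold: float = 1.0) -> list[Message]:
    # Content determination: keep only changes large enough to report
    # (the 1% threshold is an illustrative assumption).
    messages = [Message(k, v) for k, v in data.items() if abs(v) >= threshold]
    # Document structuring: order messages so the largest change comes first.
    return sorted(messages, key=lambda m: abs(m.change), reverse=True)


def microplan(messages: list[Message]) -> list[list[Clause]]:
    sentences: list[list[Clause]] = []
    for m in messages:
        # Lexicalization: choose words to express the message.
        clause = Clause(m.subject, "rose" if m.change > 0 else "fell", f"{abs(m.change):g}%")
        # Aggregation: attach the clause to the previous sentence if it uses the same verb.
        if sentences and sentences[-1][-1].verb == clause.verb:
            sentences[-1].append(clause)
        else:
            sentences.append([clause])
    return sentences


def realize(sentences: list[list[Clause]]) -> str:
    out = []
    for clauses in sentences:
        # Syntax: build each clause and join aggregated clauses with "and".
        text = " and ".join(f"{c.subject} {c.verb} by {c.amount}" for c in clauses)
        # Orthography: capitalize the first letter and add a full stop.
        out.append(text[0].upper() + text[1:] + ".")
    return " ".join(out)


def generate(data: dict[str, float]) -> str:
    # The pipeline: document planning -> microplanning -> realization.
    return realize(microplan(plan_document(data)))


print(generate({"revenue": 5.0, "profit": 4.0, "costs": -3.0, "headcount": 0.2}))
# Revenue rose by 5% and profit rose by 4%. Costs fell by 3%.
```

Keeping the stages separate means each decision (what to say, how to say it, how to spell and punctuate it) can be tested and changed on its own, which is one practical reason the architecture keeps coming back.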

Chapter 4: Document Planning. This chapter presents basic concepts of document planning – deciding what information to communicate and how to structure it into a narrative. The concepts haven’t changed since 2000, and indeed many of the techniques presented here are still valid and in use by Arria.

Chapter 5: Microplanning. This chapter presents basic concepts of microplanning – making good linguistic choices in narratives. The concepts haven’t changed much since 2000. The technology, however, has advanced, although the techniques presented in the book are still useful in some contexts (and indeed are used by some Arria systems).

Chapter 6: Surface Realisation. This chapter presents basic concepts of realization – producing grammatically correct texts. I think this is the most dated chapter; some of the conceptual material still applies, but none of the realizers described in the book are still in use in 2022.

Chapter 7: Beyond Text Generation. This chapter looks at multimodal generation – generating documents that include graphics as well as words, and also generating spoken as well as written output. This is fine as far as it goes, but the book says nothing about dialogue and chatbot systems (like Arria Answers), which from a 2022 perspective is a major omission!

Summary. Overall, I think there is still a lot of value in my 2000 book, despite its age. I advise anyone reading the book in 2022 to focus on the conceptual material, most of which is still valid even though technology has advanced.
