NLG in the newsroom: fast, consistent, and hyperlocal

By Greg Williams | August 21, 2019

We’ve written about how Natural Language Generation (NLG) can eliminate the bottleneck of manual, one-at-a-time analysis in business environments. It produces data-driven insights that would otherwise remain fossilized in spreadsheets on the network drive. The same principles apply to news reporting. There, so much valuable data is available that editors, who must assign stories within the limits of human analytical capacity, are forced to leave many data sets completely unexamined. Editorial prioritization tends to mirror demand, so hyperlocal content that is highly useful, but only to a small subset of readers, gets left out. A Natural Language Generation platform, particularly one that is open, extensible, smart, and secure (like ours!), solves this problem.

This is not a theoretical claim. It is being demonstrated today by UK-based RADAR AI and by BBC News Labs, both of which use Arria’s NLG platform to publish high-quality stories that would otherwise never be written.

RADAR AI Hits a Milestone: 200,000 Stories

RADAR (an acronym for Reporters and Data and Robots) is a joint venture between data-journalism start-up Urbs Media and the Press Association, the UK national news agency. Just last week, after being open for business for only fourteen months, RADAR AI published its 200,000th story: “Crown court waiting times increase by more than seven months in Newcastle,” by Harriet Clugston, Data Reporter. (Nice job title. Notice how effectively it stakes out Clugston’s data-driven approach while also establishing her beat as essentially boundary-free. With a title like that, a journalist can use any data set as a starting point for investigation and explanation.)

Relatively few people in the world are concerned about Crown Court waiting times in Newcastle, a city of approximately 300,000 people as of 2018. Indeed, we can guess that relatively few of Newcastle’s own residents are concerned about them. Generally speaking, our interest in this subject is proportional to our civic or personal stake in the municipality or the court. We’re mildly interested if we’re paying taxes to keep the system running, but probably not highly interested unless ours is one of the 455 cases waiting to be heard. For understandable, practical reasons, an editor probably would not have assigned a reporter to analyze how caseloads and waiting times have changed over the years and then write a story describing movements in the data. A subject of broader interest would win the day, and the Crown Court story would never have appeared, even though it does contain valuable insights for those interested in the topic.

Fortunately, the RADAR approach “breaks the ‘content compromise’ which forces organisations to choose between high quality, reliable and bespoke content or mass-produced superficial output.” Makes perfect sense. Does an editor really want Harriet Clugston to spend her time crunching and describing numbers? No. Arria’s NLG platform can do most of that for her, freeing her to do the kind of reporting that makes a hyperlocal piece read like a story of broader interest. In the brief, information-packed article, Clugston includes direct quotations from four people involved in the UK criminal justice system:

  • Stephanie Boyce, Deputy Vice President of the Law Society of England and Wales;
  • John Apter, Chair of the Police Federation of England and Wales;
  • Sara Glenn, Deputy Chief Constable for Hampshire Constabulary; and
  • A spokeswoman for HM Courts and Tribunals Service.

With NLG in an assisting role, Clugston has taken the opportunity to maximize the value of her story to the few citizens of Newcastle who are fretting about longer wait times at Crown Court.

Let’s do the math on the 200,000 stories RADAR has published since opening for business in June 2018. Over roughly fourteen to fifteen months, that works out to approximately 13,300 to 14,300 stories per month, or about 430 to 470 per day. Not bad for an organization that lists only seven employees on LinkedIn!
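The exact monthly and daily rates depend on which start and end dates one assumes. The short Python sketch below makes that arithmetic explicit. The endpoints are assumptions chosen for illustration: the post says only "June of 2018" for the launch and "just last week" relative to its August 21, 2019 date, so the sketch uses 1 June 2018 and 14 August 2019. It treats a month as the average Gregorian month of 365.25 / 12 days.

```python
from datetime import date


def publishing_rate(total_stories: int, start: date, end: date) -> tuple[float, float]:
    """Return (stories per month, stories per day) between start and end."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    months = days / (365.25 / 12)  # average Gregorian month length in days
    return total_stories / months, total_stories / days


# ASSUMED endpoints: the post does not give exact dates.
per_month, per_day = publishing_rate(200_000, date(2018, 6, 1), date(2019, 8, 14))
print(f"{per_month:,.0f} stories/month, {per_day:,.0f} stories/day")
```

With these assumed dates the script prints 13,867 stories/month and 456 stories/day. Moving the start date two weeks later, to 15 June, raises the daily figure to about 471, so any such rate is best read as approximate.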

For comparison, by its own reckoning The New York Times, which employs approximately 1,300 staff writers, publishes roughly 200 “pieces of journalism” per day. That figure likely includes blog posts and interactive graphics as well as stories. The following excerpt comes from a 2017 internal report by “the 2020 group,” a team of Times editors who spent the prior year examining editorial policies and practices at the paper:

“The Times publishes about 200 pieces of journalism every day. This number typically includes some of the best work published anywhere. It also includes too many stories that lack significant impact or audience—that do not help make The Times a valuable destination.”

A couple of paragraphs later, the report states the problem even more plainly: “We devote a large amount of resources to stories that relatively few people read…. It wastes time—of reporters, backfielders, copy editors, photo editors, and others—and dilutes our report.”

It would appear that RADAR has found a way to address these concerns. A story about Crown Court waiting times probably lacks a large audience, but it has a significant impact on a small one. Because NLG handles the data analysis, such a story costs reporters relatively little time. And if it can be targeted to the readers most likely to be interested, a hyperlocal story like this one is a step in the right direction rather than a dilution of “some of the best work published anywhere.”

The Newcastle story is available to us only as a screenshot, but here is another recent story from Harriet Clugston to which all of the observations above apply: “NHS staff took almost 7,000 full-time days of sick leave because of drug or alcohol abuse last year, figures reveal.”

BBC News Labs and SALCO Part 1

The BBC, too, faces heightened expectations for the frequency and quality of local news content, and it has launched a ‘Semi-automated Local Content’ initiative, SALCO for short. BBC News Labs developers Roo Hutton and Tamsin Green began by building a pipeline that reported Accident and Emergency statistics for more than one hundred local hospitals. That is interesting information but, again, not the best use of top-notch journalists’ time. As Hutton explains in his excellent article from March of this year, “Stories by numbers: How BBC News is experimenting with semi-automated journalism,” “Automated journalism isn’t about replacing journalists or making them obsolete. It’s about giving them the power to tell a greater range of stories— whether they are directly publishing the stories we generate, or using them as the starting point to tell their own stories— while saving them the time otherwise needed to analyse the underlying data.”

Hutton describes a respectful, cooperative approach in which he and Green work closely with journalists. The aim is both to learn how the journalists think and to help them understand how Arria’s platform works and why traditional writers should embrace NLG.

“This story has been generated using BBC election data and some automation”

The experiment was such a success that BBC News Labs decided to take the same approach to covering local elections in May of this year. Writing on the News Labs site a couple of weeks ago in “Salco part 2: Automating our local elections coverage,” Tamsin Green explains the rationale, in terms of both workload volume and the need for consistent coverage: “Local elections on BBC News Online are covered at both a national level, to aggregate results and highlight trends, as well as locally by journalists working out of regional hubs. With 248 councils up for election in England alone, that means a huge amount of journalism in a short period of time. We observed huge variation in election coverage across the country: Some councils were not covered. Some would simply take tweets from the @bbcelection Twitter feed. Others were there at the count, posting photographs and detailed results.”

This is a textbook case for NLG, and the sample output from BBC News Labs looks good. As you examine it, consider the variation in style and content that would naturally arise if reporters in each municipality were left to present the information themselves. Consider, too, how long it would take to assemble even that hodgepodge of inconsistent reporting.
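To illustrate why generating text from data produces consistent coverage, here is a deliberately minimal template sketch in Python. It is not Arria’s platform or the BBC’s pipeline, and the post does not describe how either works internally. The `CouncilResult` structure, the `summarise` function, and the council names, party names, and seat counts are all hypothetical. The point is only that every council passes through the same rules. As a result, the majority threshold, the choice between "held" and "gained", and the ordering of the seat breakdown come out identically from one council to the next.

```python
from dataclasses import dataclass


@dataclass
class CouncilResult:
    """HYPOTHETICAL per-council input; not a real BBC or Arria data format."""
    council: str
    seats: dict[str, int]   # party name -> seats won in this election
    previous_control: str   # party in control before the election, or "No overall control"


def summarise(result: CouncilResult) -> str:
    """Render one council's result using the same fixed rules every time."""
    total = sum(result.seats.values())
    majority = total // 2 + 1  # strictly more than half of all seats
    # Largest party. On a tie, max() keeps the first party in dict order,
    # but a tied leader can never reach a majority, so the outcome is unaffected.
    party, won = max(result.seats.items(), key=lambda kv: kv[1])
    if won >= majority:
        verb = "held" if party == result.previous_control else "gained"
        headline = f"{party} {verb} control of {result.council}"
    else:
        headline = f"No party has overall control of {result.council}"
    # Seat breakdown: most seats first, ties broken alphabetically by party name.
    breakdown = ", ".join(
        f"{p} {n}" for p, n in sorted(result.seats.items(), key=lambda kv: (-kv[1], kv[0]))
    )
    return (f"{headline}. Seats: {breakdown} "
            f"({total} in total; {majority} needed for a majority).")


if __name__ == "__main__":
    results = [
        CouncilResult("Example Borough Council",
                      {"Party A": 24, "Party B": 12, "Party C": 4}, "Party B"),
        CouncilResult("Sample District Council",
                      {"Party B": 15, "Party A": 20, "Party C": 5}, "Party A"),
    ]
    for r in results:
        print(summarise(r))
```

Both printed sentences share exactly the same structure, whichever council they describe. A production system would presumably add linguistic variation, comparisons with previous elections, and editorial review. The sketch isolates only the consistency argument.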