{"id":243,"date":"2020-07-29T14:21:34","date_gmt":"2020-07-29T18:21:34","guid":{"rendered":"https:\/\/arriablog22.wpengine.com\/choosing-words-to-describe-data\/"},"modified":"2022-03-24T15:26:11","modified_gmt":"2022-03-24T19:26:11","slug":"choosing-words-to-describe-data","status":"publish","type":"post","link":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/","title":{"rendered":"Choosing words to clearly describe data"},"content":{"rendered":"<p>People often ask me which words an NLG system should use to describe numbers. For example, if profits rose by 3%, should this be described as a \u201cslight rise\u201d or a \u201cmoderate rise\u201d? Using words to communicate numbers can make narratives easier to read and understand, especially for readers who are \u201cverbal\u201d thinkers.<!--more--><\/p>\n<p>Unfortunately, it is difficult to give crisp rules for when a particular word such as \u201cmoderate\u201d should be used, because different people interpret such words in different ways. Many years ago, Yaji Sripada and I built an NLG weather forecast generator (as a university research project, not as an Arria project), and we did a lot of investigation of how both writers and readers use words that express data, especially time phrases such as \u201cby evening\u201d. We discovered that:<\/p>\n<ul>\n<li>Some people thought \u201cby evening\u201d meant 1800, others thought it meant 2100, and still others thought it meant 0000.\n<\/li>\n<li>A few people thought that the meaning of \u201cby evening\u201d depended on either sunset time or meal time; most thought the meaning was not influenced by these factors.<\/li>\n<\/ul>\n<p>If a weather forecast is going to be read by lots of people, and these people interpret \u201cby evening\u201d in different ways, then the NLG system clearly has a problem! 
For example, the forecasts that Yaji and I looked at were for offshore oil rigs, and we were concerned that if a forecast said that heavy winds were expected \u201cby evening\u201d, meaning 1800, but the rig staff thought \u201cby evening\u201d meant 0000, the rig staff might plan to unload a supply boat at 2100, which would not be good.<\/p>\n<p>We also discovered that some phrases were less ambiguous than others; for example, pretty much everyone agreed that \u201cby midnight\u201d meant 0000. So we ended up recommending that NLG-system builders look for words that are interpreted in the same way by most people, and use these wherever possible. This strategy worked well, and indeed human readers in many cases preferred our computer-written forecasts over human-written forecasts, in part because the wording (as above) was more consistent and less confusing. If you want to know more about this work, you can read our research paper <a href=\"https:\/\/doi.org\/10.1016\/j.artint.2005.06.006\" rel=\"noopener\" target=\"_blank\">\u201cChoosing words in computer-generated weather forecasts\u201d.<\/a><\/p>\n<p>I have seen many other cases where different people interpret words differently; I discuss some of these in another paper <a href=\"https:\/\/doi.org\/10.1162\/089120102762671981\" rel=\"noopener\" target=\"_blank\">\u201cHuman variation and lexical choice\u201d.<\/a> Other people have also observed this; for example, the psychologist Dianne Berry <a href=\"https:\/\/doi.org\/10.1111\/j.2042-7174.2002.tb00602.x\" rel=\"noopener\" target=\"_blank\">has observed<\/a> that in the context of communicating risks of side effects of medication, words like \u201ccommon\u201d are interpreted very differently by different people. 
Berry also pointed out that if the word \u201ccommon\u201d is used to indicate a 1-10% chance of a nasty side effect (which is recommended usage in the EU for medication leaflets), but a patient thinks that \u201ccommon\u201d means a 50% chance (which is how many of Berry\u2019s subjects interpreted the word), then the patient might refuse to take the medication because of this misunderstanding.<\/p>\n<p>I suspect that people who work closely together tend to agree on word meanings, because of a psycholinguistic phenomenon known as \u201calignment\u201d. But if a narrative is intended for a wide audience, then choosing words that are interpreted consistently by readers may not be straightforward.<\/p>\n<p>My recommendation is to do a careful analysis to understand how different users will interpret different words. Don\u2019t just rely on your intuition, because your intuition tells you how you interpret words, not how other people interpret them. I would start with a questionnaire that asks people about their interpretation of specific words. Ideally you should also do a \u201ccorpus analysis\u201d where you analyze how words are used in existing texts; Yaji and I did this when we analyzed word usage in weather forecasts. However, while corpus analysis provides excellent data on word usage, doing a good corpus analysis is a fair amount of work and also requires some specialist skills.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>About the author:<\/strong><span>&nbsp;Arria Chief Scientist, Prof. Ehud Reiter, is a pioneer in the science of Natural Language Generation (NLG) and one of the world\u2019s foremost authorities in the field of NLG. He is responsible for the overall direction of Arria\u2019s core technology development as well as supervision of specific NLG projects. 
He is Professor of Computing Science in the University of Aberdeen School of Natural and Computing Sciences.&nbsp;<\/span><a href=\"https:\/\/ehudreiter.com\/\" rel=\"noopener\" target=\"_blank\">Visit his blog here.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>People often ask me which words an NLG system should&#8230;<\/p>\n","protected":false},"author":10,"featured_media":1119,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[43,38,42,14,29,33],"class_list":["post-243","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-data-analysis","tag-data-interpretation","tag-language","tag-nlg","tag-nlg-blog","tag-prof-ehud-reiter"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Choosing words to clearly describe data - NLG Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Choosing words to clearly describe data - NLG Blog\" \/>\n<meta property=\"og:description\" content=\"People often ask me which words an NLG system should...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/\" \/>\n<meta property=\"og:site_name\" content=\"NLG Blog\" \/>\n<meta property=\"article:published_time\" content=\"2020-07-29T18:21:34+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-03-24T19:26:11+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.arria.com\/blog\/wp-content\/uploads\/sites\/3\/2022\/03\/BL130-EhudWords-1.jpg\" 
\/>\n\t<meta property=\"og:image:width\" content=\"790\" \/>\n\t<meta property=\"og:image:height\" content=\"334\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Ehud Reiter\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ehud Reiter\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/\"},\"author\":{\"name\":\"Ehud Reiter\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#\\\/schema\\\/person\\\/31224a0ca3829f43a13781b3d7afd7e0\"},\"headline\":\"Choosing words to clearly describe data\",\"datePublished\":\"2020-07-29T18:21:34+00:00\",\"dateModified\":\"2022-03-24T19:26:11+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/\"},\"wordCount\":729,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2022\\\/03\\\/BL130-EhudWords-1.jpg\",\"keywords\":[\"data analysis\",\"Data Interpretation\",\"Language\",\"NLG\",\"NLG Blog\",\"Prof Ehud Reiter\"],\"articleSection\":[\"NLG 
Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/\",\"url\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/\",\"name\":\"Choosing words to clearly describe data - NLG Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2022\\\/03\\\/BL130-EhudWords-1.jpg\",\"datePublished\":\"2020-07-29T18:21:34+00:00\",\"dateModified\":\"2022-03-24T19:26:11+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2022\\\/03\\\/BL130-EhudWords-1.jpg\",\"contentUrl\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2022\\\/03\\\/BL130-EhudWords-1.jpg\",\"width\":790,\"height\":334,\"caption\":\"BL130-EhudWords\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/choosing-words-to-describe-data\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.
arria.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Choosing words to clearly describe data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/\",\"name\":\"NLG Blog\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#organization\",\"name\":\"NLG Blog\",\"url\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2022\\\/03\\\/arria_logo_125x30.png\",\"contentUrl\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/3\\\/2022\\\/03\\\/arria_logo_125x30.png\",\"width\":125,\"height\":30,\"caption\":\"NLG Blog\"},\"image\":{\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/#\\\/schema\\\/person\\\/31224a0ca3829f43a13781b3d7afd7e0\",\"name\":\"Ehud 
Reiter\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b2b51dd76543bef69265c1b1b8d995a0132ea071f50988250c00fdea10b15bf9?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b2b51dd76543bef69265c1b1b8d995a0132ea071f50988250c00fdea10b15bf9?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b2b51dd76543bef69265c1b1b8d995a0132ea071f50988250c00fdea10b15bf9?s=96&d=mm&r=g\",\"caption\":\"Ehud Reiter\"},\"url\":\"https:\\\/\\\/www.arria.com\\\/blog\\\/author\\\/ehud-reiter\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Choosing words to clearly describe data - NLG Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/","og_locale":"en_US","og_type":"article","og_title":"Choosing words to clearly describe data - NLG Blog","og_description":"People often ask me which words an NLG system should...","og_url":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/","og_site_name":"NLG Blog","article_published_time":"2020-07-29T18:21:34+00:00","article_modified_time":"2022-03-24T19:26:11+00:00","og_image":[{"width":790,"height":334,"url":"https:\/\/www.arria.com\/blog\/wp-content\/uploads\/sites\/3\/2022\/03\/BL130-EhudWords-1.jpg","type":"image\/jpeg"}],"author":"Ehud Reiter","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Ehud Reiter","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/#article","isPartOf":{"@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/"},"author":{"name":"Ehud Reiter","@id":"https:\/\/www.arria.com\/blog\/#\/schema\/person\/31224a0ca3829f43a13781b3d7afd7e0"},"headline":"Choosing words to clearly describe data","datePublished":"2020-07-29T18:21:34+00:00","dateModified":"2022-03-24T19:26:11+00:00","mainEntityOfPage":{"@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/"},"wordCount":729,"commentCount":0,"publisher":{"@id":"https:\/\/www.arria.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/#primaryimage"},"thumbnailUrl":"https:\/\/www.arria.com\/blog\/wp-content\/uploads\/sites\/3\/2022\/03\/BL130-EhudWords-1.jpg","keywords":["data analysis","Data Interpretation","Language","NLG","NLG Blog","Prof Ehud Reiter"],"articleSection":["NLG Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/","url":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/","name":"Choosing words to clearly describe data - NLG 
Blog","isPartOf":{"@id":"https:\/\/www.arria.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/#primaryimage"},"image":{"@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/#primaryimage"},"thumbnailUrl":"https:\/\/www.arria.com\/blog\/wp-content\/uploads\/sites\/3\/2022\/03\/BL130-EhudWords-1.jpg","datePublished":"2020-07-29T18:21:34+00:00","dateModified":"2022-03-24T19:26:11+00:00","breadcrumb":{"@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/#primaryimage","url":"https:\/\/www.arria.com\/blog\/wp-content\/uploads\/sites\/3\/2022\/03\/BL130-EhudWords-1.jpg","contentUrl":"https:\/\/www.arria.com\/blog\/wp-content\/uploads\/sites\/3\/2022\/03\/BL130-EhudWords-1.jpg","width":790,"height":334,"caption":"BL130-EhudWords"},{"@type":"BreadcrumbList","@id":"https:\/\/www.arria.com\/blog\/choosing-words-to-describe-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.arria.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Choosing words to clearly describe data"}]},{"@type":"WebSite","@id":"https:\/\/www.arria.com\/blog\/#website","url":"https:\/\/www.arria.com\/blog\/","name":"NLG Blog","description":"","publisher":{"@id":"https:\/\/www.arria.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.arria.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.arria.com\/blog\/#organization","name":"NLG 
Blog","url":"https:\/\/www.arria.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.arria.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.arria.com\/blog\/wp-content\/uploads\/sites\/3\/2022\/03\/arria_logo_125x30.png","contentUrl":"https:\/\/www.arria.com\/blog\/wp-content\/uploads\/sites\/3\/2022\/03\/arria_logo_125x30.png","width":125,"height":30,"caption":"NLG Blog"},"image":{"@id":"https:\/\/www.arria.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.arria.com\/blog\/#\/schema\/person\/31224a0ca3829f43a13781b3d7afd7e0","name":"Ehud Reiter","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/b2b51dd76543bef69265c1b1b8d995a0132ea071f50988250c00fdea10b15bf9?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/b2b51dd76543bef69265c1b1b8d995a0132ea071f50988250c00fdea10b15bf9?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b2b51dd76543bef69265c1b1b8d995a0132ea071f50988250c00fdea10b15bf9?s=96&d=mm&r=g","caption":"Ehud 
Reiter"},"url":"https:\/\/www.arria.com\/blog\/author\/ehud-reiter\/"}]}},"_links":{"self":[{"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/posts\/243","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/comments?post=243"}],"version-history":[{"count":0,"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/posts\/243\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/media\/1119"}],"wp:attachment":[{"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/media?parent=243"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/categories?post=243"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.arria.com\/blog\/wp-json\/wp\/v2\/tags?post=243"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}