Digital Humanities
@ Pratt

Inquiries into culture, meaning, and human value meet emerging technologies and cutting-edge skills at Pratt Institute's School of Information

Were Terry Pratchett’s Final Works Affected by Alzheimer’s Disease?: An Analysis into Vocabulary Trends within the Discworld Series, Post Diagnosis


Regardless of type or style, writers cannot help but put themselves on the page, and so none of their biographies would seem complete without drawing attention to their subjects’ craft directly. By extension, what could we learn about ourselves if we took all of what we have written down and stored, both online and offline, and ran textual analysis on it? When did our vocabulary expand? When did we start using certain words? How did our style change over time? Would these data help construct the ongoing story of our lives? What else might lie hidden in our thoughts embodied on the page?

Researchers have begun to explore these questions. A longitudinal epidemiological study of nuns from the School Sisters of Notre Dame religious congregation sought to determine whether linguistic ability early in life correlates with the risk of developing Alzheimer’s Disease later in life. Because these nuns had written autobiographies upon their admission to the convents in their youth, the researchers could test these texts for idea density and grammatical complexity. The result: “low idea density had strong and consistent associations with the risk of dementia and premature death” (Snowdon, Greiner & Markesbery, 2000, p. 35).

Now, we should not take from this that those of us who disperse our ideas more gradually throughout our papers will later exhibit the characteristics of Alzheimer’s disease; the studied subset is too small to support universal conclusions. Nevertheless, the suggested idea that the effects of such a profound disease stretch across a lifetime, long before any immediately apparent symptoms, and that this can be exposed through a careful analysis of objectifiable evidence, should inspire further inquiry into the relationship between our minds and what we create.


Three Anglophone writers continued to practice while living with, or while rumored to be living with, at least the early stages of Alzheimer’s-related dementia. Iris Murdoch and Terry Pratchett each had either a positive medical diagnosis of the disease during their lifetimes or confirmation via autopsy after death (Garrard, Maloney, Hodges, & Patterson, 2005, p. 251; Pratchett, 2008). On the other hand, the “senility” that prompted Agatha Christie’s daughter to bar her mother’s publisher from asking the senior writer for any more work never received an official medical diagnosis as to its cause (Le, Lancashire, Hirst & Jokel, 2011, pp. 370-371). Instead, aware both of the story told by the Christie family’s trusted biographer about the final years of the author’s life, and of the fact that her last works shifted in focus away from her traditional tightly-wrapped murder/thriller narratives toward the remembrance of personal events, researchers adapted previous work explaining how Iris Murdoch’s known dementia surfaced in her writings to see whether the same phenomena appeared in Christie’s novels as well (Lancashire & Hirst, 2009, p. 2). If Christie’s body of work demonstrated trends associated with Alzheimer’s Disease similar to Murdoch’s, this would support the notion that they shared a similar experience. The results show this to a great degree.

Agatha Christie

This particular project tested for three conditions: (1) a decrease in vocabulary size, (2) an increase in repeated phrases, and (3) an increase in the use of indefinite “thing” words (e.g. “thing,” “things,” “something,” etc.), all of which correlate strongly with Alzheimer’s-related dementia (Maxim & Bryan, 1994; Nicholas, Obler, Albert & Helm-Estabrooks, 1985). For their project, “Vocabulary Changes in Agatha Christie’s Mysteries as an Indication of Dementia: A Case Study,” Ian Lancashire and Graeme Hirst studied the first 50,000 words of 16 of Christie’s novels, written throughout her life from ages 28 to 82. Their findings provided strong evidence for their hypothesis:

  1. The three novels that she wrote in her 80s, Nemesis, Elephants, and Postern, have a smaller vocabulary than any of the analyzed works written by her between ages 28 to 63. Word-types in the first 50,000 words of her novels fall by one-fifth between ages 28–32 and 81–82. Elephants Can Remember, written when she was 81, exhibits a staggering drop in vocabulary, almost 31%, compared with Destination Unknown, written 18 years earlier. Some 15,000 words shorter than Nemesis and Postern of Fate, which preceded and followed it, Elephants appears to register the onset of a profound writing block.

  2. The number of different repeating phrase-types in the first 50,000 words in Christie’s novels increases with age, again implying a decline in the lexical richness of her writing.

  3. Christie’s use of vague, indefinite “thing” words increases significantly with age from 0.27% of her word-count in Styles (1920) to 1.23% in Postern (1973) (pp. 3-4).

To see how sudden and profound this change of style is, examine the graph in the document linked above. At the end of her life, the trends are immediately apparent.
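As a rough illustration of the three conditions listed above, the following Python sketch computes them for a single text. It is not Lancashire and Hirst's actual procedure: the tokenizer, the function names, the phrase length `n`, and the short list of “thing” words are all assumptions made for this example.

```python
import re
from collections import Counter

# Indefinite "thing" words, taken from the examples given in the text above.
THING_WORDS = {"thing", "things", "something"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic tokenizer)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def lexical_measures(text, window=50_000, n=4):
    """Return the three measures for the first `window` tokens of `text`."""
    tokens = tokenize(text)[:window]
    # (1) Vocabulary size: the number of distinct word types.
    vocabulary_size = len(set(tokens))
    # (2) Repeated phrase types: distinct n-word sequences that occur more than once.
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    repeated_phrase_types = sum(1 for count in ngrams.values() if count > 1)
    # (3) Share of indefinite "thing" words among all tokens.
    thing_share = sum(t in THING_WORDS for t in tokens) / len(tokens) if tokens else 0.0
    return vocabulary_size, repeated_phrase_types, thing_share
```

Running `lexical_measures` on each novel in order of composition would give three series that could be plotted against the author's age, which is the kind of comparison the findings above describe.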

Iris Murdoch

As mentioned, Christie’s researchers were inspired by “The effects of very early Alzheimer’s disease on the characteristics of writing by a renowned author,” a study of how Iris Murdoch’s experience with Alzheimer’s influenced her work. But unlike Christie, Murdoch did not continue to write far into senility. She published her final book, Jackson’s Dilemma, in 1995, and underwent neuropsychological assessment the following year, which revealed deficits in attention, orientation, memory, and language skills. Interestingly, critics seemed to recognize traces of these conditions in Jackson’s Dilemma, commenting on its lack of narrative structure and character development, even comparing it to the writing of a 13-year-old girl.

The project analyzed three of her works: the entirety of Under the Net and Jackson’s Dilemma, and the first 100 pages of The Sea, The Sea. Thus her earliest fiction (Under the Net), a novel from the height of her career (The Sea, The Sea), and her last were all represented. After the texts were prepared for study, a number of analyses were run on vocabulary, syntactic difference in complexity and grammatical class, and lexical variation in word length and frequency (pp. 251-254).

The closely interlinked vocabulary and lexical studies showed the clearest results. For vocabulary, the researchers structured their data to show the rate at which Murdoch introduced new word types in increments of 10,000 word tokens. They found that The Sea, The Sea markedly improved on her earlier effort in Under the Net, which itself showed greater vocabulary usage than Jackson’s Dilemma at every level. “These results suggest an enrichment of available vocabulary between the early and middle stages of I.M.’s writing career, followed by a relative impoverishment during the composition of the final work,” which appears consistent with the story that Murdoch’s vocabulary improved as she aged, then decreased as dementia started to affect her mind (p. 255). Supporting this trend lexically, Jackson’s Dilemma also has the highest number of repeated words, by a significant margin.
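The vocabulary measure described here, new word types per 10,000-token increment, can be sketched in a few lines of Python. This is only an illustration under assumed choices (a simple regular-expression tokenizer and a hypothetical function name); it is not the code Garrard and colleagues used.

```python
import re

def new_types_per_increment(text, step=10_000):
    """Count the word types first seen in each successive block of `step` tokens."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    seen = set()      # every word type encountered so far
    counts = []       # new types contributed by each block, in order
    for start in range(0, len(tokens), step):
        block = set(tokens[start:start + step])
        fresh = block - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts
```

A text with a richer vocabulary keeps contributing new types in later blocks, so its counts fall off more slowly than those of a text that recycles the same words.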

While this article provides convincing evidence of the forces at play in Murdoch’s writing by the end of her life, a sample of just three books, one of them only in part, limits the power of its conclusion. Questions arise: when did the decrease in vocabulary across novels start? Certainly, the appearance of a more gradual decrease would change the nature of the conclusion.

With this in mind, “Longitudinal detection of dementia through lexical and syntactic changes in writing: a case study of three British novelists,” published in 2011, revisited Murdoch’s work and accounted more fully for her corpus, using updated methods. Here, twenty of her twenty-six works of fiction support a claim about her vocabulary usage that is consistent with the previous study: Murdoch’s written vocabulary trends steadily upward, then plunges sharply in Jackson’s Dilemma. Similarly, this final work, when taken in its entirety, demonstrates a relatively large increase in the repetition of words and phrases (Le et al., 2011, pp. 441-443).

This study also considers the aforementioned data on Agatha Christie, and compares both authors to a control, P.D. James, who aged healthily. The proposal at the heart of these studies is thus supported by clear and consistent evidence: Alzheimer’s-related dementia can appear in the works of authors when their work is considered over a lifetime.

Terry Pratchett

Now that we comprehend how profoundly dementia can affect a writer, we might wonder about the recent death of Terry Pratchett and how Alzheimer’s may have influenced his writing. When we inspect his case, a number of peculiarities arise in relation to the more typical factors that pervade the previous two instances, and these serve as a compelling juxtaposition that colors the research results.

Perhaps most surprisingly, especially when we consider the previous authors’ actions around the time they started to bear the symptoms of dementia, Pratchett, though diagnosed in 2007, penned a number of sizable pieces while living with the disease (Pratchett, 2008; “Bibliography”). What is more, critics lauded many of these works, with Wintersmith (2006), Nation (2008), I Shall Wear Midnight (2010), and Snuff (2011) all winning awards, while others received high praise. Thus Terry Pratchett, post-diagnosis, does not fit the image of the infirm writer in the twilight of his life. On the contrary, his publishers spoke of his fierce independence, saying that he “didn’t need much help if any from his editors” (Cowdry, 2016). Active on social media and granting many interviews, he sent the world a clear message: “It’s possible to live well with dementia and write bestsellers” (BBC, 2015).

One of the main reasons for this was the atypical form in which Alzheimer’s Disease emerged in Pratchett’s brain: Posterior Cortical Atrophy (PCA), as explained by Pratchett himself:

I have the opposite of a superpower; sometimes I cannot see what is there. I see the teacup with my eyes, but my brain refuses to send me the teacup message. It’s very Zen. First, there is no teacup and then, because I know there is a teacup, the teacup will appear the next time I look (Pratchett, 2010).

Regardless of its Zen-like quality, PCA selectively affects regions in the back of the brain, the parietal, occipital, and occipitotemporal cortices, causing “progressive decline in visuospatial, visuoperceptual, literacy, and praxic skills” (Crutch, Lehmann, Schott, Rabinovici, Rossor & Fox, 2012, p. 170). These deficits manifest in various symptoms:

higher-order visual problems, such as difficulties with object and space perception, are reported more often than are basic visual impairments, [though] many such problems are at least partly due to deficits in more basic visual processing—eg, form, motion, colour, and point localisation (Crutch et al., 2012).

This visual basis affects literacy and praxis, as such functions depend on visuospatial and visuoperceptual acuity. In Pratchett’s own words:

[… ] what it does do, while gradually robbing you of memory, visual acuity and other things you didn’t know you had until you miss them, is leave you more or less as fluent and coherent as you always have been (Pratchett, 2008).

So, while typical amnestic Alzheimer’s disease, as exhibited by Iris Murdoch, predominantly shrinks the memory and language centers, embodied in the brain by the hippocampus and temporal lobes, Pratchett retained the health of these sections of the brain. Considering this alone, the lexical usage in Pratchett’s work should not show the same tendency found in the previous two studies. Nevertheless, Alzheimer’s Disease has a strong progressive bent, and Murdoch’s brain displayed shrinkage generally (Garrard et al., 2005, p. 251). It does not seem implausible that Pratchett’s general faculties were affected as well, albeit at a much slower pace. Will we see a progressive change in vocabulary, perhaps on the scale of the other two authors, but across a much longer timespan? Or does PCA-type Alzheimer’s merely affect our visual faculties?

Method and Materials

I tracked Pratchett’s writing over the last eleven novels of his acclaimed Discworld series. Because PCA does not affect vocabulary outright, as amnestic Alzheimer’s Disease does, any marked movement in vocabulary use across books would likely appear in the data as a gradual, general downward trend, rather than a steep decline as seen in Jackson’s Dilemma or Christie’s last novels. The novels I analyzed are:

Monstrous Regiment (Doubleday, 2003)
A Hat Full of Sky (Doubleday, 2004)
Going Postal (Doubleday, 2004)
Thud! (Doubleday, 2005)
Wintersmith (Doubleday, 2006)
Making Money (Doubleday, 2007)
Unseen Academicals (Doubleday, 2009)
I Shall Wear Midnight (Doubleday, 2010)
Snuff (Doubleday, 2011)
Raising Steam (Doubleday, 2013)
The Shepherd’s Crown (Doubleday, 2015)

To prepare the material for analysis, I ran each PDF file through the built-in OCR function of Google Drive/Docs, which rendered near-pristine TXT files of each book. I then cleaned each document, counted its words, and cut it to a length of 75,000 words, as The Shepherd’s Crown comes in at a mere 78,927. I could now upload the corpus into Voyant for textual analysis. I recorded each book’s word-type count, as well as its number of indeterminate “thing” words (thing, things, something, anything, everything, and nothing), in Google Sheets. Then, I used this application to form a scatter plot of my findings, and chose to insert a trendline, to visualize the distribution of vocabulary over the eleven novels.
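For readers who would rather script these steps than work through Voyant and Google Sheets, the sketch below shows one way the counts and the trendline could be reproduced in Python. It is an approximation and not the workflow I followed: the tokenizer and the function names are assumptions for illustration, and the resulting counts would likely differ somewhat from Voyant's because tokenization rules differ between tools.

```python
import re

# The six indeterminate "thing" words counted in this study.
THING_WORDS = {"thing", "things", "something", "anything", "everything", "nothing"}

def summarize_book(text, limit=75_000):
    """Truncate a book to `limit` word tokens and compute its summary counts."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())[:limit]
    thing_count = sum(1 for t in tokens if t in THING_WORDS)
    return {
        "tokens": len(tokens),
        "types": len(set(tokens)),  # distinct word types
        "thing_ratio": thing_count / len(tokens) if tokens else 0.0,
    }

def linear_trend(ys):
    """Least-squares slope and intercept for ys against x = 0, 1, 2, ...

    With books listed in publication order, a negative slope would indicate
    a downward trend across the series.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a trendline")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Calling `summarize_book` on each of the eleven texts in publication order and passing the resulting type counts to `linear_trend` would give the slope of a trendline comparable to the one drawn in the spreadsheet.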


From my data, I conclude that PCA did not have a significant effect on Terry Pratchett’s vocabulary usage up to the last day of his writing. The criterion of a progressive negative trend did not emerge in the data; rather, the opposite occurred: Pratchett seems to have grown more verbose as time, and his condition, advanced. By the end of the graph, both his higher word-type novels and his lower word-type novels trend higher, so the data give no sign of the effect so visible in both Iris Murdoch’s and Agatha Christie’s writing. Terry Pratchett’s atypical form of Alzheimer’s disease did not affect his ability to express himself through his choice of words. Supporting this, his use of vague words was negligible: the largest ratio of these words to the 75,000 words analyzed in any of his novels was .00676. Christie’s highest, in her last work, peaked at .0123, after a strong upward rise.
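To put these ratios in more concrete terms, the short calculation below converts Pratchett's peak ratio into an approximate word count and expresses Christie's peak relative to it. The two ratios and the 75,000-word window come from this study and from Lancashire and Hirst; the variable names are only illustrative.

```python
WINDOW = 75_000            # words kept from each Pratchett novel
PRATCHETT_PEAK = 0.00676   # highest "thing"-word ratio in any Pratchett novel analyzed
CHRISTIE_PEAK = 0.0123     # Christie's peak ratio, reported for Postern of Fate

# Approximate number of "thing" words behind Pratchett's peak ratio.
pratchett_peak_count = round(PRATCHETT_PEAK * WINDOW)
# How many times higher Christie's peak ratio is than Pratchett's.
relative_peak = round(CHRISTIE_PEAK / PRATCHETT_PEAK, 2)

print(pratchett_peak_count)  # 507
print(relative_peak)         # 1.82
```

In other words, even Pratchett's most “thing”-heavy novel contains roughly 507 such words in 75,000, and Christie's peak ratio is a little under twice as high.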


This study of Terry Pratchett’s work connects to various discussions: the nature of Alzheimer’s Disease, the place of text analysis in biographical construction, the interchange of digital humanities and scientific research, and the relationship between mind and its embodiment in the world.

And while this investigation hangs in the dense web of the above topics, I must also manage expectations by mentioning the imperfections in my method that, in some way, hinder its conclusion. I do this to instruct others who might seek to perform a more exhaustive examination, or to create a tool designed with functions that might aid in the process.

Unlike Murdoch’s style, around which the methods I used were designed, Pratchett’s writing does not yield as easily to this form of analysis. The original study outlined the ideal circumstances for an author’s condition to be represented in such a study:

(i) that the task in question is undertaken voluntarily and presumably comes naturally to the subject;

(ii) that the subject is unaware of the incipient disease, which eliminates negative emotional and compensatory strategic effects on performance;

(iii) that the availability of material from periods both before and after disease onset provides the opportunity for within-patient comparison (Garrard et al., 2005).

Furthermore, the researchers removed all direct quotations from their analysis, in order to mitigate any potential changes in style and diction based on an author’s intention to adapt her writing around the voices present in the speech of the characters.

An analysis of Pratchett would have benefited from the excision of quotations from the body of the text, especially since he tends to represent vernacular within his writing (e.g., “goin’” instead of “going”). I did not do this because I could not find a tool that could remove quotations across thousands of pages. Moreover, since Pratchett’s stories are replete with dialogue, to remove so much would mean to strip his novels of an essential facet of his style. The question, then, is what we should value more: the integrity of the experiment (to remove dialogue) or the integrity of the text itself (to preserve it). This problem has no quick answers, and deserves further theoretical attention.
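For anyone attempting the excision, a regular expression offers a possible starting point. The sketch below is hypothetical and untested on the Discworld texts: it assumes that dialogue is set in curly double quotation marks and that every opening mark has a matching close. Editions that set dialogue in single quotation marks would need a more careful approach, because the closing mark is the same character as the apostrophe in words like “goin’”.

```python
import re

# A span opened by “ and closed by ”, with no nested double quotes inside it.
CURLY_DOUBLE_QUOTED = re.compile(r"“[^“”]*”")

def strip_quoted_speech(text):
    """Remove spans enclosed in curly double quotation marks, then tidy whitespace."""
    without_speech = CURLY_DOUBLE_QUOTED.sub(" ", text)
    return re.sub(r"\s+", " ", without_speech).strip()
```

Applied to a whole novel, this would leave only the narration, so the same vocabulary counts could be run with and without dialogue and the two results compared.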

Moreover, Terry Pratchett does not satisfy condition (ii), as he knew about his condition and admits to finding “workarounds.” To avoid this, a more complete study would look at the book written right before his diagnosis, Wintersmith, and compare it to the rest of his earlier corpus, to see if there is any change. I suspect, though, that such an experiment would have little success, given the results of this study. Additionally, we have many first-hand accounts of how Alzheimer’s affected Pratchett’s work, mainly through his inability to spell and type, not through his forgetting how to assign words to things in the world; while he confesses to the use of a “despised” spell checker, he mentions no thesaurus (Pratchett, 2008; Pratchett, 2010).

If we already had an account of how Terry Pratchett actually experienced the disease, then what was the significance of this experiment? I think Pratchett himself encapsulates the answer well:

[… w]hen the kind lady who periodically checks me out asked me to name as many animals as I can, I started with the rock hyrax, the nearest living relative to the elephant, and thylacine – the probably extinct Tasmanian marsupial wolf (2008).

Collectively, we know little about the causes, manifestations, and processes inherent to Alzheimer’s dementia. The “kind lady” gave Terry Pratchett a test similar to the ones Iris Murdoch failed, yet he succeeded (Garrard et al., 2005). She, most likely a medical professional, was testing for memory and recall, trying to find out whether the PCA had spread outside its typical points of degeneration. I performed the same assessment, in essence. From this, we get the sense that every experience of Alzheimer’s may differ, and thus the stereotypical image of the demented invalid does not apply in all cases. Terry Pratchett expressed himself beautifully deep into his disease, and produced many cogent and profound thoughts. This should motivate us to reevaluate the social stigma that the general label “Alzheimer’s Sufferer” carries.

Biographically, these three case studies suggest that text analysis has a place in the construction of one’s life story. While a biographer can usually discern from traditional material that her subject has had a certain experience, whether through disease or otherwise, interpreting how this experience affects the subject’s writing is another task entirely. While close reading under a biographical lens has its use, distant reading can expose subtle trends that do not appear to the eye, such as total vocabulary, grammar, and lexical use over a corpus. This added dimension will make the entire story more lucid, especially if traditional interpretation and data analysis are used to complement each other.

Here, the underlying synthesis is important to accentuate. Science, the arts, and the humanities need not exclude each other. Ultimately, they all try to express the core of human experience in one way or another, just under diverging sets of assumptions and methods. This distance of perspective need not entail a distance in association. Studies like the ones I have explained reveal the greater constellations in which they all hang together, and my hope is that medical professionals, data scientists, and humanities scholars find some use in them, or see the same use in others like them.

I would be remiss to mention only academics and scholars when discussing the value of this research. As I stated at this article’s beginning, these methods of research have implications outside the ivory tower. The binary of the digital represents a contradiction. As we spread ourselves across the web, our thoughts, desires, motivations, and actions are captured outside ourselves, stored and processed by forces beyond our willing comprehension; at the same time, we lose track of that which we hold valuable in the all-consuming undertow of new information. Digital citizens are subject to both a permanence and an ephemerality simultaneously, incommensurable with traditional methods of understanding. In response, we must look to new practices to augment those capacities that would otherwise fail to process the torrents of information we both release and ingest. Text analysis holds the promise of discovering value hidden in the totality of data out there, and thus our minds can extend further out into the world than our strictly narrow, natural focus allows. We should invest more effort into the digital principles of understanding, so that not only scholarly research will benefit.



Bibliography. (n.d.). Retrieved May 01, 2016, from

Cowdry, K. (n.d.). New Terry Pratchett publishing revealed at memorial | The Bookseller. Retrieved May 01, 2016, from

Crutch, S. J., Lehmann, M., Schott, J. M., Rabinovici, G. D., Rossor, M. N., & Fox, N. C. (2012). Posterior cortical atrophy. The Lancet Neurology, 11(2), 170-178.

Garrard, P., Maloney, L. M., Hodges, J. R., & Patterson, K. (2005). The effects of very early Alzheimer’s disease on the characteristics of writing by a renowned author. Brain, 128(2), 250-260.

Lancashire, I., & Hirst, G. (2009, March). Vocabulary changes in Agatha Christie’s mysteries as an indication of dementia: a case study. In 19th Annual Rotman Research Institute Conference, Cognitive Aging: Research and Practice (pp. 8-10).

Le, X., Lancashire, I., Hirst, G., & Jokel, R. (2011). Longitudinal detection of dementia through lexical and syntactic changes in writing: a case study of three British novelists. Literary and Linguistic Computing, fqr013.

Maxim, J., & Bryan, K. (1994). Language of the Elderly: A Clinical Perspective. London: Whurr.

Nicholas, M., Obler, L. K., Albert, M. L., & Helm-Estabrooks, N. (1985). Empty speech in Alzheimer’s disease and fluent aphasia. Journal of Speech and Hearing Research, 28, 405-410.

Pratchett, T. (2008, October 7). I’m slipping away a bit at a time… and all I can do is watch it happen. Retrieved April 29, 2016, from–I-watch-happen.html

Pratchett, T. (2010, February 01). Terry Pratchett: My case for a euthanasia tribunal. Retrieved May 01, 2016, from

Sir Terry Pratchett, renowned fantasy author, dies aged 66. (2015, March 12). Retrieved May 08, 2016, from

Snowdon, D. A., Greiner, L. H., & Markesbery, W. R. (2000). Linguistic ability in early life and the neuropathology of Alzheimer’s disease and cerebrovascular disease: Findings from the Nun Study. Annals of the New York Academy of Sciences, 903(1), 34-38.