Textual Analysis and Fairy Tales

Fairy tales are an intrinsic part of the Western cultural narrative. As such, their stories are ever evolving, reflecting the mores and the historical and sociological context of their time. The fairy tales of today, heavily influenced by Disney’s adaptations, were built on previous generations’ versions, which developed out of earlier versions, and so on. They were retold first through oral storytelling, then committed to text, and are now retold through film and pop culture more generally, with every version varying from the last. In this project I chose to focus on three well-known fairy tales, “Cinderella,” “Sleeping Beauty,” and “Snow White,” and to run the text of different versions of each story through Voyant to see what differences or patterns in word frequency emerge.


Marina Warner (2014) characterizes the fairy tale as a short narrative that is familiar to the reader/listener, either because it has been passed down or because it resembles some other story in its plot, tropes, characters, or imagery. Though the term “fairy tale” is often used interchangeably with “folk tale,” some distinguish “folk” from “fairy” by the shift from the oral to the literary tradition in the late seventeenth century, which also marked a change in the target audience for the stories; no longer were the stories meant to be performed by a storyteller interacting with an audience, but to be purchased and read. A literate audience in the seventeenth through nineteenth centuries was synonymous with a higher socio-economic class, and the stories, including those by Charles Perrault, reflect a shift in focus toward bourgeois and aristocratic characters and ideals (Zipes, 1979). Whether this designation is valid or oversimplified, the fairy tales that survive in written form were composed following a tradition similar to that of their oral predecessors: taking a pre-existing story and adapting it for the audience. As a result, there are versions of some fairy tales that come from across the world and span multiple centuries. Some knowingly engage with previous versions; others are so tailored to their audience that only the slightest trace of the root story remains.

Arguably the most famous collection of fairy/folk tales was assembled by Jacob and Wilhelm Grimm to present the most “authentic” versions of the stories (Zipes, 1979) and to preserve the tales passed down in German tradition. They themselves ended up creating yet more iterations of these stories over the course of the seven editions of their Kinder- und Hausmärchen (Children’s and Household Tales) published between 1812 and 1857. There are vast differences between the first edition and the seventh, which was considered for many years to be the definitive one, as the Brothers Grimm edited themselves to adapt to their changing, growing audience (Zipes, 2015).

A uniting characteristic of all versions of fairy tales is how they are presented. The words themselves simply offer a narrative, matter-of-factly. There is no inner psychology of the characters; the reader is not privy to motivations or to anything but the most superficial emotions (Warner, 2014). Any emotional depth or interpretation is left to the reader/listener and their own cultural and experiential context. This, I would argue, makes fairy-tale texts especially appropriate for quantitative textual analysis: the narrative structure is simple, and the reader is already supplying the subtext.

Not surprisingly, much of the quantitative analysis done on fairy tales has been in the field of women’s and gender studies. Ruth Bottigheimer (1986) analyzed the act of speaking versus silence in fairy tales, counting the frequency of words related to speech and what patterns they take with regard to characters’ gender or role. Jeana Jorgensen and Scott Weingart (2013), in a project branching out of Jorgensen’s dissertation, used digital quantitative tools to analyze how descriptions of the human body vary depending on age and gender. They put together a hand-coded database of approximately 11,000 nouns and adjectives used to describe human bodies in 233 stories, including who or what was being described (age, gender, etc.). They found that females were described more often than males, and the old were described in physical terms more than the young. One possible conclusion they posit is that the assumed narrative viewpoint is male and not-old. In another article about her work, Jorgensen (2014) further observes that descriptions for women prioritize beauty, morality, blood, hair, and skin, whereas for men the emphasis is placed on violence, transformation, size, and age.

For the purpose of this project, I wanted to take a quantitative approach to fairy tales and examine whether patterns in word frequency emerged when comparing different versions. What kinds of words vary noticeably between versions? Is there a similar pattern for different stories when analyzing parallel versions? Is this a worthy avenue of analysis or would close reading be more effective?


The first task was to select the stories and versions. I spent a great deal of time scrolling through D. L. Ashliman’s Folklore and Mythology Electronic Texts index and the SurLaLune Fairy Tales site. Ashliman’s site lists subjects and story types, each with a bibliography of versions and links to full-text translations, some of which he provided himself. I decided to use three of the best-known fairy tales (and the original Disney Princesses): “Cinderella,” “Sleeping Beauty,” and “Snow White.” For each, I settled on the classic iterations. There are a number of other versions, including some from non-European sources, but while they would make for interesting close reading, they might have so little in common with the classic versions that comparing them would not yield any results on a word-to-word level. To keep things consistent, I decided to use the same translations throughout (since those can also vary widely): Ashliman’s translations of both the 1812 and 1857 editions of Grimms’ fairy tales for all three stories, along with the translations of Charles Perrault’s 1697 “Cinderella, or the Little Glass Slipper” and “The Sleeping Beauty in the Wood” found in Andrew Lang’s Blue Fairy Book.

Once I had text files for each story and version, I ran them through Voyant. Though Voyant allows you to run multiple files at once, I found that running them individually gave me a better sense of how the versions compare. I exported the word frequency data (both raw and relative) into a spreadsheet for each story. From there I excluded words like “said,” “went,” “let,” and “ones,” along with other words that did not hold any particular meaning, and grouped together words that always occurred together (e.g. “king’s” and “son” are used in Perrault’s “Cinderella” in addition to “prince”). With the different versions arranged side by side, I was able to start making comparisons.
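For readers who would rather script this step than export from Voyant, the counting, filtering, and grouping described above could be approximated in Python. This is only a minimal sketch, not the process Voyant itself uses: the stop list, the phrase grouping, the function names, and the made-up sample sentence are all assumptions for illustration, and relative frequency is taken here to be a word's count divided by the total number of words in the text.

```python
import re
from collections import Counter

# Illustrative stop list (an assumption): the four words named in the text
# plus a handful of common function words. Voyant ships its own list.
STOP_WORDS = {"said", "went", "let", "ones",
              "the", "a", "an", "and", "of", "to", "in", "was", "with"}

# Illustrative grouping (an assumption): count "king's son" as "prince".
PHRASE_GROUPS = {"king's son": "prince"}


def tokenize(text):
    """Lowercase the text, merge grouped phrases, and split into words."""
    text = text.lower().replace("’", "'")  # normalize curly apostrophes
    for phrase, label in PHRASE_GROUPS.items():
        text = text.replace(phrase, label)
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def word_frequencies(text):
    """Return (raw counts, relative frequencies) with stop words excluded.

    Relative frequency here is count / total tokens in the text, counting
    stop words in the total; Voyant's own scaling may differ.
    """
    tokens = tokenize(text)
    total = len(tokens)
    raw = Counter(t for t in tokens if t not in STOP_WORDS)
    relative = {word: count / total for word, count in raw.items()}
    return raw, relative


if __name__ == "__main__":
    # A made-up sentence, not a quotation from any of the translations.
    sample = ("The king’s son said Cinderella was beautiful. "
              "The prince danced with Cinderella.")
    raw, relative = word_frequencies(sample)
    for word, count in raw.most_common(5):
        print(f"{word}\t{count}\t{relative[word]:.3f}")
```

Running each version's text file through `word_frequencies` separately gives one table per version, which mirrors the one-spreadsheet-per-story layout described above.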




In looking at the top twenty most frequent words from the three versions of “Cinderella,” a few things become apparent. First, Cinderella is, rightfully, at the top of the frequency lists for all three (this will not always be the case). The prince or king’s son makes the list, as does the infamous slipper/shoe. As will be the case with both of the other fairy tales, “beautiful” has a high count. “Ball” is used very frequently in Perrault and considerably less often in the Grimms’ first edition. “Dance” and “festival” might be combined to mean the same thing for the 1857 edition. Interestingly, “sisters” is used frequently in Perrault and the 1812 Grimm, with no mention of a stepmother, whereas the opposite is true in the later edition. What is least surprising, but perhaps most informative, are the similarities between the two Brothers Grimm editions in contrast to the earlier Perrault. Perrault’s fairy godmother becomes a tree planted on Cinderella’s mother’s grave that happens to grant wishes. The ashes that give Cinderella her name are only mentioned (as ashes) once in Perrault. Overall, there are quite a few more words having to do with nature in the Grimms’ versions: tree, ashes, pigeons, lentils, hazel, bird, etc. Even when the list is expanded, Perrault’s story has the pumpkin, mice, rats, and lizards that form Cinderella’s coach and equipage, but not nearly as many nature words as the later versions. One might attribute this to the Grimms’ versions being more of a folk tale, more “authentic.” Another interesting detail is the high frequency of the word “pick” in the 1857 edition. “Pick” and “peck” (number 17 in the 1812 edition) both come from repeated use when pigeons help Cinderella sort lentils/grains from ashes, which reads a bit like a refrain and may be something passed down from the oral tradition. (The birds then go on to peck/pick out an eye from each of the stepsisters in the 1857, but not the 1812, edition.)
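The side-by-side reading of top-twenty lists above was done by eye in a spreadsheet. As a rough sketch of how the same comparison might be automated, the Python below splits each version's top-n words into those shared by every version and those found in only one. The function names and the version labels are hypothetical, and the counts in the usage example are placeholders, not figures from this project.

```python
from collections import Counter


def top_words(counts, n=20):
    """Return the n most frequent words from a {word: count} mapping."""
    return [word for word, _ in Counter(counts).most_common(n)]


def compare_top_words(versions, n=20):
    """Split each version's top-n list into shared and version-only words.

    `versions` maps a label (e.g. "Perrault 1697") to a {word: count} mapping.
    Returns (shared, only): words in every top-n list, and, per label,
    words that appear in that version's top-n list and no other.
    """
    tops = {label: set(top_words(counts, n)) for label, counts in versions.items()}
    shared = set.intersection(*tops.values()) if tops else set()
    only = {}
    for label, words in tops.items():
        others = [w for other, w in tops.items() if other != label]
        only[label] = words - set().union(*others)
    return shared, only


if __name__ == "__main__":
    # Placeholder counts for illustration only, not results from the study.
    versions = {
        "version A": {"cinderella": 9, "ball": 5, "godmother": 4},
        "version B": {"cinderella": 8, "tree": 6, "pigeons": 5},
    }
    shared, only = compare_top_words(versions, n=3)
    print(sorted(shared))
    for label in versions:
        print(label, sorted(only[label]))
```

A list like `only` is a starting point for close reading rather than a finding in itself; synonyms such as “ball,” “dance,” and “festival” would still need to be grouped by hand.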

Sleeping Beauty

[Chart: word frequencies for the versions of “Sleeping Beauty”]

“Sleeping Beauty,” or “Little Brier-Rose” in the two Grimms’ versions, defies analysis a bit more than the other two princess stories. The Perrault version, “The Sleeping Beauty in the Wood,” should perhaps not be compared with the Grimms’ versions because of its length (as well as its vastly different plot, which includes cannibalism, ogres, and the princess’s years-long secret marriage to her prince), but it does share a few words with them and may be worth a few observations. For one, the heroine does not have a name. There are other names and copious details in the story, but no name for the princess. It is interesting that “queen” is the most frequently used word, but this is likely because there are two queens in this version of the story. Even in the 1812 edition, Brier-Rose is only fourth on the frequency list. Perrault includes fairy/fairies to curse/enchant the princess. The Grimms also initially included this detail before changing it to “wise women” in the 1857 edition. Interestingly, there are a number of details about the castle/palace in all three versions, especially the Grimms’. They describe the physical space and the people who inhabit the castle: attendants, a cook, maid, dogs, castle, roof, courtyard, door, hall, room, kitchen, etc. They frequently mention the hedge that grows to encase the enchanted castle after Brier-Rose falls asleep. All three versions use “beautiful” and “asleep” a number of times, as well as “old,” which emphasizes the passage of time while Brier-Rose/the nameless princess is asleep.

Snow White

[Chart: word frequencies for the two Grimms’ versions of “Snow White”]

Perrault, unfortunately, did not write a version of “Snow White” (though there are shades of the story in his version of “Sleeping Beauty”), but the Grimms’ versions present an interesting case on their own. As is clear from the above chart, the two versions are very similar in word frequency. Snow White, and “snow” and “white,” are the most used, followed, not surprisingly given its importance to the plot, by “mirror.” Similarly predictable, “queen,” “dwarfs,” “seven,” and “beautiful” are high on the usage list, as are “fairest” and “red.” Further down the list are “huntsman,” “black” (as in Snow White’s hair), “apple,” “poisoned,” “blood,” “fair,” and “fairer.” Though Disney’s 1937 film (which I would argue is still the cultural touchstone for the story despite other retellings) changed a number of details from the Grimms’ version, the words most deeply associated with the tale still match the usage exactly as one would expect. Judging just by the word frequency, one can see that there were fewer changes to this fairy tale over the course of the seven editions than to the others.
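The claim that the two editions are “very similar in word frequency” rests here on reading the chart. One common way to put a number on that kind of similarity is cosine similarity between two frequency tables, sketched below in Python. This measure was not part of the analysis in this project; it is offered only as an assumption about how the comparison might be quantified, and the function name is hypothetical.

```python
import math


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two {word: frequency} mappings.

    Returns a value from 0.0 (no words in common) to 1.0 (identical
    proportions). Works with raw counts or relative frequencies, since
    the measure ignores overall scale.
    """
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty table is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Because very common words dominate the score, it would make sense to apply the same stop-word filtering to both editions before comparing them, and to treat the result as a prompt for closer reading rather than a conclusion.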


There are certainly observations that can be made by analyzing text through word frequency. On this scale it’s hard to draw conclusions, but with more study and combined with closer reading and/or a larger corpus of texts, it could be a worthwhile exercise when studying texts such as these. I am also hesitant to draw too many conclusions with texts that are not in their original language, no matter how consistent or skilled the translation.

An interesting avenue of study would be to compare just the different editions of Grimms’ fairy tales. The contrast between the 1812 and 1857 editions would be the starkest, but tracing the evolution of specific word choices through all seven editions could be illuminating, though it would certainly necessitate study in the original German. Another possible line of study would be to test Zipes’s (1979) differentiation between folk and fairy tale by comparing the language of the folk-focused Grimms’ corpus with that of more aristocratic-focused authors like Perrault.

It would be nice to see digital humanities employed more in the study of fairy tales at a publicly accessible, project level. One project that uses computational textual analysis for all of the Grimms’ stories is Jeff Clark’s Grimm’s [sic] Fairy Tale Metrics (2013), which breaks the stories down by word count, lexical diversity, and how many words have to do with a specific gender and a number of categories. The two sites I mentioned above, SurLaLune and Ashliman’s Folklore and Mythology Electronic Texts, have both existed since the 1990s. The former has continued to improve and has well-maintained links. The latter is truly a treasure trove of information and texts, but the interface feels outdated and the links to other sites are almost entirely broken. The University of Rochester has created The Cinderella Bibliography, which is an extensive annotated bibliography of all things Cinderella and a digitized collection of Cinderella illustrations. They’ve included a page for Beauty and the Beast, but it would be great to see this for more fairy tales. It would also be helpful to incorporate more digitized texts. Though not every translation is in the public domain, there is a plethora of versions and editions of fairy-tale collections that could be organized and incorporated into a digital humanities project. With so much cultural, literary, and artistic material surrounding them, fairy tales are a natural fit for future projects.



