Digital Humanities @ Pratt

Inquiries into culture, meaning, and human value meet emerging technologies and cutting-edge skills at Pratt Institute's School of Information

Day of DH: A Textual Analysis

What is digital humanities? There is no single answer agreed upon by practitioners or members of the field. Many note that it is less a unified discipline than a set of methods and practices that share common values (Spiro, 2012; Burdick, 2012; Presner, 2009). One value commonly shared across the field is openness to contributions from many, much in the spirit of the internet (Spiro, 2012). In that spirit of crowdsourcing and ground-up contribution, the Day of DH, an online project, “examines the state of the digital humanities through the lens of those within it” (Day of DH). Once a year, participants individually document their DH activities for the day, both on social media sites such as Twitter and on the Day of DH website. Each year, participants are also asked to answer the question “How do you define digital humanities?” in an open-answer format. Using this dataset of user-provided definitions, this paper seeks to identify the methods, practices, and values that converge to form digital humanities, and to discover any changes in that composition over the five-year span of the data.

Methods

The responses collected by Day of DH moderators to the question “How do you define digital humanities?” are freely available at whatisdigitalhumanities.com; this dataset was downloaded in .csv format. First, the entire dataset was run through Voyant, the online text analysis tool, to analyze word frequency across the whole corpus. Because later analysis would be run on each year individually, and each year’s document differed drastically in length, all word frequencies were measured as relative frequency (the number of occurrences per million words) rather than raw count (the actual number of occurrences), and rounded to the nearest whole number. A standard list of common English stop words was applied, and the terms “digital,” “humanities,” and “dh” were also removed from the dataset: they were outliers in frequency and unrevealing in terms of textual insight, since respondents most often used them to refer to the very thing they were asked to define. Generic words such as “use,” “work,” and “way” were also removed. To capture the different forms in which these words appear across the document, related terms were merged into wildcard strings where applicable. For instance, technology and technologies were merged into the string technolog*, which also captures terms such as technological and technologically. Similarly, research became research* to include researching and researchers; method became method* to include methods and methodological; and computing became comput* to include computers, computational, etc.
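The frequency measurements described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of Voyant's internals: the stop-word list is a small hypothetical subset, the helper names are our own, wildcard strings are matched by simple prefix, and we assume the per-million denominator is the total token count including stop words (Voyant's exact convention may differ).

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; the study used a full English list.
STOP_WORDS = {"a", "an", "and", "as", "by", "for", "in", "is", "of", "that", "the", "to", "with"}
# Terms removed as frequency outliers ("digital", "humanities", "dh") or as generic words.
EXCLUDED = {"digital", "humanities", "dh", "use", "work", "way"}
# Wildcard strings: any token beginning with one of these stems is merged under "stem*".
STEMS = ("technolog", "comput", "research", "method", "cultur", "teach")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(token):
    """Map a token to its wildcard string, e.g. 'technologies' -> 'technolog*'."""
    for stem in STEMS:
        if token.startswith(stem):
            return stem + "*"
    return token


def relative_frequencies(text):
    """Return {term: occurrences per million tokens}, rounded to whole numbers."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    total = len(tokens)  # assumption: stop words and excluded terms count toward the total
    counts = Counter(
        normalize(tok) for tok in tokens if tok not in STOP_WORDS and tok not in EXCLUDED
    )
    return {term: round(n / total * 1_000_000) for term, n in counts.items()}


sample = ("Digital humanities uses technology and technologies to study culture. "
          "Researchers build computational tools.")
freqs = relative_frequencies(sample)
print(freqs["technolog*"])  # 2 matches out of 13 tokens -> 153846
```

Prefix matching is a blunt instrument: a stem such as comput* is meant to absorb computer and computing, but a shorter stem can sweep in unrelated words, so each wildcard string is worth checking against the corpus before its counts are trusted.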

The top twenty words were then coded as describing a Method, a Practice, or a Value. These categorizations are outlined in Table 1 in the Findings and Discussion section below.
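Because the coding was done by hand, code cannot reproduce it, but once each term has been assigned a category, the tallies reported in the findings follow mechanically. The sketch below hard-codes the categories and relative frequencies from Table 1; the variable and function names are illustrative only.

```python
# Category assignments and relative frequencies (per 1M words) from Table 1.
CODED_TERMS = {
    "Methods": {
        "technolog*": 11416, "comput*": 9350, "tools": 7792, "method*": 7114,
        "questions": 3794, "media": 3117, "information": 2033, "data": 1931,
    },
    "Practices": {
        "research*": 8876, "study|studies": 5895, "scholarship": 3151, "field": 4099,
        "discipline(s)": 3455, "teach*": 2337, "application": 2066,
    },
    "Values": {
        "cultur*": 13492, "new": 8029, "human|humanistic": 4980,
        "traditional": 3442, "knowledge": 2168,
    },
}


def category_counts(coded):
    """Return {category: number of top-twenty terms coded under it}."""
    return {category: len(terms) for category, terms in coded.items()}


def ranked(coded, category):
    """Return the terms in one category, highest relative frequency first."""
    return sorted(coded[category], key=coded[category].get, reverse=True)


print(category_counts(CODED_TERMS))          # {'Methods': 8, 'Practices': 7, 'Values': 5}
print(ranked(CODED_TERMS, "Practices")[:3])  # ['research*', 'study|studies', 'field']
```

Keeping frequencies alongside categories makes within-category comparisons, such as field against discipline(s), a one-line sort rather than a manual reading of the table.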

Because the dataset specified the year of collection for each response, the data were then separated into individual documents by year so that each time period could be analyzed separately. Each year’s collection of definitions was analyzed, again in Voyant, using the same stop words as the main dataset, and words in different forms and tenses were again combined using the appropriate wildcard strings. The relative frequency of each term was recorded for each year and graphed to show how that term evolved across the span of the corpus.
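The per-year split can be sketched the same way. This assumes the downloaded CSV has one row per response with columns named year and definition (hypothetical names; the actual file's headers may differ), and it applies the stop-word, exclusion, and wildcard rules described in the Methods with an abbreviated, hypothetical stop-word list. The sample rows are invented for illustration and are not drawn from the dataset.

```python
import csv
import io
import re
from collections import Counter, defaultdict

# Hypothetical, abbreviated filtering rules mirroring those described in the Methods.
STOP_WORDS = {"a", "an", "and", "as", "by", "for", "in", "is", "of", "that", "the", "to", "with"}
EXCLUDED = {"digital", "humanities", "dh", "use", "work", "way"}
STEMS = ("technolog", "comput", "research", "method", "cultur", "teach")


def normalize(token):
    """Merge a token into its wildcard string when it starts with a known stem."""
    for stem in STEMS:
        if token.startswith(stem):
            return stem + "*"
    return token


def yearly_series(rows, terms):
    """Return {year: {term: occurrences per million tokens}} for the given terms."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row["definition"])  # hypothetical column names
    series = {}
    for year in sorted(by_year):  # four-digit year strings sort chronologically
        tokens = re.findall(r"[a-z]+", " ".join(by_year[year]).lower())
        counts = Counter(
            normalize(t) for t in tokens if t not in STOP_WORDS and t not in EXCLUDED
        )
        total = len(tokens) or 1  # avoid dividing by zero for an empty year
        series[year] = {term: round(counts[term] / total * 1_000_000) for term in terms}
    return series


# Toy stand-in for the downloaded CSV; these rows are invented for illustration.
SAMPLE_CSV = """year,definition
2009,Computing and computers applied to texts
2009,Computational methods
2014,Culture and technology
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
series = yearly_series(rows, ["comput*", "technolog*", "cultur*"])
for year, freqs in series.items():
    print(year, freqs)
```

Plotting each term's values across the sorted years yields trend lines of the kind shown in Figure 1; normalizing per million tokens is what makes years of very different lengths comparable.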

Findings and Discussion

Table 1: Category, Term, and Relative Frequency of the Top 20 Terms in the Corpus

Category    Term                Relative frequency (per 1M words)
Methods     technolog*          11,416
Methods     comput*              9,350
Methods     tools                7,792
Methods     method*              7,114
Methods     questions            3,794
Methods     media                3,117
Methods     information          2,033
Methods     data                 1,931
Practices   research*            8,876
Practices   study|studies        5,895
Practices   field                4,099
Practices   discipline(s)        3,455
Practices   scholarship          3,151
Practices   teach*               2,337
Practices   application          2,066
Values      cultur*             13,492
Values      new                  8,029
Values      human|humanistic     4,980
Values      traditional          3,442
Values      knowledge            2,168

Of the top twenty most frequently occurring terms in the corpus (see Table 1), nearly half (8 out of 20) describe Methods within the field, defined here as anything used in the pursuit of digital humanities. Within this category, the highest-ranking strings, technolog* (11,416 per million words) followed by comput* (9,350), speak to the highly digital and technological nature of the discipline. Tools (7,792) and method* (7,114), rather general terms that can describe a wide range of offerings, rank 3rd and 4th in Methods and suggest the breadth of ways in which scholars might engage with digital humanities.

The category of Practices, defined here as a process that makes use of the methods described above, accounts for almost as many high-frequency terms (7 out of 20) as Methods. Its top term, research* (8,876 per million words), stands well above the rest of the set and anchors digital humanities in the scholarly, structured practice of research. Notably, field (4,099) occurs more frequently than discipline(s) (3,455), suggesting that digital humanities is, as Spiro (2012) suggests, viewed as a convergence of various pursuits rather than a traditionally structured discipline. Moreover, discipline(s) often occurred in contexts such as “intersect[ing] disciplines” or “…paving the way for dialogues to occur across disciplines,” so a portion of its occurrences do not describe digital humanities as a discipline at all. Some respondents explicitly reject that label, writing, “I continue to regard digital humanities as a set of methods… as opposed to a full-blown discipline,” and “…digital humanities is not a discipline, per se, but a set of beliefs, theories, practices, methods, and artifacts…”.

The relatively low number of Values terms among the top twenty (5 out of 20) is of particular interest given the focus that values receive in Spiro’s (2012) influential writing on the topic, and in the rather emotionally charged “call to action” declared by Presner (2009). The values Spiro includes in her manifesto for digital humanities (openness, collaboration, collegiality and connectedness, diversity, and experimentation), for example, are not explicitly represented among the highest-ranking terms of this corpus.

Rather than refuting Spiro’s values, however, the corpus embodies them. The project from which it evolved, the nature of the data it holds (800+ unfiltered definitions, experimental and diverse), its authorship (collaborative and voluntary), and its accessibility and connectedness (free on the web) personify these values in a living way rather than making them explicitly visible through textual analysis. These values are embodied and felt, rather than plainly seen.

Also of interest in the realm of values is the presence of the term traditional, with a relative frequency of 3,422 per million words. This is perhaps surprising, given that much of digital humanities seems to break with tradition. However, analysis of the contexts in which the term appears shows that it serves two purposes: first, to set digital humanities apart from traditional scholarship, as in “the creation and sharing of scholarship…in ways not possible in the traditional humanities”; and second, to anchor DH in a traditional history while adding a new, technological layer to these forms of scholarship, as in “capturing the passion for the traditional path and emboldens it…” or the “application of digital tools to traditional humanities research.” The duality of this term, which both sets DH apart from a history of traditional scholarship and embeds it within that history, standing on its shoulders, suggests a complexity and dynamism to DH.
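The context analysis described here, reading each occurrence of a term together with the words around it, is commonly called a keyword-in-context (KWIC) view. A minimal sketch follows; the function name and the window size are arbitrary choices for illustration, not the settings used in this study.

```python
import re


def kwic(text, stem, window=4):
    """Return (left context, matched token, right context) for each token starting with stem."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower().startswith(stem):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


definition = "application of digital tools to traditional humanities research"
for left, match, right in kwic(definition, "tradition", window=2):
    print(f"{left:>20} [{match}] {right}")
```

Lining up the left and right contexts in columns is what makes a contrast like “not possible in the traditional humanities” versus “tools to traditional humanities research” easy to spot across hundreds of definitions.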

Figure 1: Relative frequency of top terms by year
In analyzing the data by year to examine how practitioners’ definitions of digital humanities have evolved, peaks (the year in which a given term was mentioned most frequently) and valleys (years in which its mention dropped sharply) may provide insight into contemporary projects or circumstances. While further analysis would be needed to determine any influencing factors, several trends represented in Figure 1 are worth noting.

Comput*, for example, began as the highest-ranking Method in the 2009 definitions, but it dropped sharply toward 2012 and ended up in the middle of the pack by 2014. Similarly, technolog*, while still the highest-ranking Method from 2011 to 2014, declines in frequency from 2011 onward. These trends might be explained by these methods becoming so ingrained within the field that they are almost implicit in its definition. True to its status as an outlier within the Practices category of the overall corpus, the trend line for research* floats above the other strings in its category for the entire span of the data. And while Values account for a relatively small share of the high-frequency words in the overall corpus, the trend lines for terms in this category, especially new and human|humanistic, suggest that discussions of values may be on the rise.

Conclusion

Although there may be no widespread consensus on a working definition of digital humanities, the vision discussed in the relevant literature (Spiro, 2012; Burdick, 2012; Presner, 2009), namely that digital humanities is a convergence of methodologies and practices aligned under shared values, is borne out in the textual analysis of 800+ crowdsourced definitions of DH.

The dominant terms and strings in the overall corpus (cultur*, technolog*, comput*, research*, new, tools, method*) reflect DH’s diversity of projects, its commitment to technology’s ability to mine traditional fields for innovative research, and its base in traditional, structured scholarship. Given the nature of the Day of DH project, which seeks to expose the quotidian activities of digital humanists, these dominant terms understandably focus on activities and practices rather than values, but they do illustrate the breadth of activity within the field.

Examining the corpus by year reveals a shift in technological terms from extremely dominant to more ingrained and implicit within discussions of DH practice, while practices such as research* and values such as new remain outliers within their respective categories in every year. These consistencies point to shared commitments across the diverse projects and foci of the people participating in the field.

Across the corpus as a whole, as well as in each year’s document individually, dominant methods and practices emerge that illustrate both DH’s relationship to and place in traditional scholarship and its new offerings and complexities.

References

Burdick, Anne, et al. (2012). “Digital Humanities Fundamentals.” In Digital_Humanities, pp. 122–123.

Presner, Todd, et al. (2009). “Digital Humanities Manifesto 2.0.” http://www.toddpresner.com/?p=7

Spiro, Lisa (2012). “‘This is Why We Fight’: Defining the Values of the Digital Humanities.” In Debates in the Digital Humanities.
