Digital Humanities
@ Pratt

Inquiries into culture, meaning, and human value meet emerging technologies and cutting-edge skills at Pratt Institute's School of Information

Basics of Statistical Literacy: an Event Review (NYC DH Week)

This past Thursday, February 11, I had the opportunity to attend one of the NYC Digital Humanities Week Spring 2016 events. Dr. Irene Lopatovska offered an introductory lesson on “Understanding Numbers: Basics of Statistical Literacy” at Pratt’s Manhattan campus.

Dr. Lopatovska began by pointing out that statistical analysis in reporting lends a certain credibility to research. Whether or not this is fair is a separate issue, but it does tend to be the case. So, while perhaps everyone in attendance was a student or researcher already invested in understanding statistics to this end, the session was open to anyone and would have been useful to those just interested in being better readers of statistical analysis. Statistics are used prolifically across all media, and this session offered a structure for analysis that makes critiquing statistical reporting, or making a decision based on it, much more accessible. All that being said, having a specific research problem to keep in mind throughout the presentation did offer helpful, practical context when some of my ‘cognitive efforts’, as Dr. Lopatovska eloquently called them, were getting lost along the way.

Having some research question in mind was in fact the launching point for Dr. Lopatovska’s statistical process outline as well. Outlining the problem allows you to understand the data that you need. This may seem like an obvious idea, but by first making this part clear, we can create a more reasonable, realistic timeline for a project. It might turn out that the data you need is already available, or it may need to be collected, adding extra steps to your research. It’s also important to realize that some research questions lie outside the realm of quantitative data, and that a more appropriate approach would be to use a different methodology altogether rather than trying to force statistics.

If you find that your research does indeed require this type of analysis, outlining your problem statement is also the necessary first step toward selecting the most appropriate data. To understand how to select the best data, Dr. Lopatovska first guided us through the types of numeric data. Here again, understanding the problem statement is crucial to building the most effective analysis, since different types of numbers will prescribe or limit the types of analysis a researcher can perform. I came to think of understanding your numeric data types as similar to the prep work for a cooking show, where all the ingredients are already chopped up in small bowls and ready to be added to the pot. Dr. Lopatovska’s outline of numeric data types proved an especially necessary overview for those of us who might not have taken a math class in a long while. With a group of librarians and humanists, I don’t think I was completely alone in appreciating the introduction to number types. These were introduced on a sort of sliding scale, beginning with the most limiting number type: nominal (or categorical) numbers.

Numbers that fall into the nominal category are those that act simply as labels. For instance, if in your research survey 0=Male and 1=Female, these numbers don’t have any true numeric worth. As a result, this data type is the most limiting in terms of the kinds of analyses that can be performed.

The second type, the ordinal numeric scale, allows for a wider range of statistical analyses. These are numbers whose values are mutually exclusive and fall in a logical order, but without inherently equal distances between two values. An example of this would be letter grades or education levels. This type was followed by interval numeric scales, where the values are ordered and the difference between neighboring values is equal (e.g. temperature). Then finally, the ratio scale was introduced, which adds a “true” zero to the definition of interval scales (e.g. speed measurements). Dr. Lopatovska suggested that this last category is more often found in scientific studies, but the distinction is still essential for humanists to grasp. At the end of this annotated list of numeric data types, our group was treated to a casual quiz with examples to put our newfound understanding to the test before moving on to what these data types allow in terms of analyses.
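To make the four scales concrete, here is a minimal Python sketch of the kind of summary each one can reasonably support. This was not part of the session. The sample values, variable names, and the choice of one summary per scale are my own assumptions for illustration.

```python
from statistics import mean, median_low, mode

# Nominal: the codes are only labels (0=Male, 1=Female, as in the survey
# example), so the most common label is about all we can report.
gender_codes = [0, 1, 1, 0, 1]  # hypothetical survey responses
most_common_code = mode(gender_codes)

# Ordinal: letter grades have an order but no equal spacing, so a median
# is defensible while a mean is not.
grade_order = ["F", "D", "C", "B", "A"]  # lowest to highest
grades = ["B", "A", "C", "B", "D"]       # hypothetical class results
ranks = [grade_order.index(g) for g in grades]
median_grade = grade_order[median_low(ranks)]

# Interval: equal steps between values, so a mean makes sense, but with no
# true zero we should not say 20 degrees C is "twice as warm" as 10.
temps_celsius = [10.0, 20.0, 30.0]       # hypothetical readings
mean_temp = mean(temps_celsius)

# Ratio: a true zero makes ratios meaningful.
speeds_kmh = [30.0, 60.0]                # hypothetical measurements
speed_ratio = speeds_kmh[1] / speeds_kmh[0]

print(most_common_code, median_grade, mean_temp, speed_ratio)
```

The point of the sketch is only that each step up the scale unlocks an additional kind of calculation.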

Descriptive analysis seems to be what I see most often in the Humanities realm. This type of analysis describes your data in terms of frequency, measures of central tendency, and dispersion. Measures of central tendency are what I imagine most of us think of as averages, and my high school math classes came back to me as we walked through instances of mean, median, and mode. Dr. Lopatovska described this type of analysis as a “prerequisite to visualization”. Dispersion, then, acts as the flip side to central tendency in a way. This measure describes the spread of results, as with range, minimum/maximum values, or standard deviation. We learned that Excel offers a useful, easy to apply formula for standard deviation, where you can simply key in:

=STDEV(array)
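For anyone working outside Excel, the same descriptive measures can be sketched with Python's built-in statistics module. This was not shown in the session, and the scores below are made up for illustration.

```python
from statistics import mean, median, mode, stdev

# Hypothetical data set, e.g. eight survey scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency: three different kinds of "average"
central = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
}

# Dispersion: how spread out the values are
dispersion = {
    "min": min(scores),
    "max": max(scores),
    "range": max(scores) - min(scores),
    "stdev": stdev(scores),  # sample standard deviation (divides by n - 1)
}

print(central)
print(dispersion)
```

Note that statistics.stdev uses the sample formula (dividing by n - 1), which is also what Excel's STDEV does, so the two should agree on the same data.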

Comparative analysis sets two groups side by side to show difference or similarity. Excel has another formula here that comes in extremely handy for t-testing:

=TTEST(array1, array2, tails, type)

This type of t-test analysis will give a p-value that can be judged more or less strictly, bearing in mind how exploratory or crucial the answer to your problem statement is. The p-value indicates how likely a difference at least this large would be if the two groups did not really differ, so a smaller value suggests a more reliable result. I wouldn’t do it justice in repeating, but if you look up the history of the t-test at the Guinness brewery, it’s a good story and a good mnemonic device.
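To show roughly what happens behind a formula like the one above, here is a hedged Python sketch of a two-tailed, two-sample t-test that assumes equal variances (what Excel calls tails=2, type=2). It is my own illustration, not something presented in the session. The function names and the sample data are hypothetical, and the p-value is approximated by numerically integrating the t distribution rather than by any method Excel is known to use.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def equal_variance_t_test(group_a, group_b):
    """Return (t, df, p) for a two-sample, equal-variance, two-tailed test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / std_err
    return t, df, two_tailed_p(t, df)


# Hypothetical scores for two small groups
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t, df, p = equal_variance_t_test(group_a, group_b)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

A common convention is to treat p below 0.05 as significant, but as noted above, how strict to be depends on how exploratory the question is.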

Excel was the favored program of this session, and I’m always surprised by how many people have Macs and how limiting the Mac version of the spreadsheet software is. Dr. Lopatovska also mentioned a few other useful tools, specifically recommending SPSS (Statistical Package for the Social Sciences), which she uses in her class at Pratt Institute, “Data Analysis and Publication”. R was also mentioned. It is free, but there seems to be a lot of statistical knowledge necessary to operate it smoothly. Overall, Dr. Lopatovska stuck with Excel, describing it as “your friend”, and the formulas covered at the end are clearly useful as well as easy to apply in that program.
