Two weeks ago, I attended Tim Tangherlini’s talk, “Challenges for a Humanities Macroscope,” in which he presented a project that traces themes and patterns in Danish folktales. Then, last Friday, I attended Markus Dickinson’s workshop on Basic Text Analysis Tools. Considering the two presentations in tandem illustrates how important text analysis tools are for macroscopic, or “distant,” reading and for contextualizing large corpora, as Tangherlini’s project shows.
Text analysis allows for a different kind of reading of large text corpora. By surfacing patterns in words and phrases, it helps researchers develop new questions and answer established ones. In other words, text analysis tools allow one to see the larger picture, to view texts macroscopically and grasp their “aboutness.” To conduct such a reading, Dickinson introduced two tools, AntConc and UAM. Both have user-friendly interfaces and simple designs, and best of all, both are free. AntConc seems most useful when you have something you want to look for but do not yet have a research question. In historical inquiry, we want texts to “speak” to us, and I suspect that this tool, in performing a distant reading, will allow texts to do just that, but in a wider context. AntConc also lets users determine word frequency, collocation, and distribution (in some ways it reminds me of a glorified version of Google’s Ngram Viewer for Google Books), thus allowing for both microscopic and macroscopic readings. UAM has a similar search capacity. It is primarily an annotation tool, however, built for those with computational skills, and it seems most useful for those who already have research questions, particularly in linguistics and literature, since users can define hierarchical schemes and parse a corpus into various pieces. Ultimately, AntConc seems to have wider applications beyond linguistics.
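For readers curious about what word frequency, collocation, and concordance views amount to under the hood, here is a minimal Python sketch. It is not how AntConc or UAM work internally; the function names, the window sizes, and the sample sentence are all my own inventions for illustration, and real tools use more careful tokenization and statistical measures.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs in the corpus."""
    return Counter(tokens)


def collocates(tokens, node, window=2):
    """Count words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def concordance(tokens, node, window=3):
    """Return keyword-in-context lines for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines


# A made-up sample sentence, standing in for a real corpus.
sample = ("The troll crossed the bridge. "
          "The farmer saw the troll near the bridge at night.")
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(3))
print(collocates(tokens, "troll").most_common(3))
for line in concordance(tokens, "troll"):
    print(line)
```

Raw counts like these are only a starting point. Frequent function words such as “the” dominate the lists, which is why dedicated tools offer stop lists and statistical measures to rank collocates.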
The librarian who sat next to me asked bluntly, “So, what can we actually do with this?” From a historian’s perspective, I am most excited about two possibilities with text analysis, although they are broad. First, the UAM corpus tool has a function for rhetorical structure theory (RST), which allows for discourse analysis. The possibilities are endless for studying changes in discourse over time by tracking the frequency and collocation of words and phrases. I can see this being useful not just for primary texts, but also for historiographical purposes (e.g., “Over time, have scholars’ ideas about this changed?”). Second, along the same lines, those with an interest in mythopoetic discourse (battle myths, for example) can upload corpora into one of the tools in order to trace evolving portrayals of events, people, and so on, as well as shifts in language in terms of contextualization (“You shall know a word by the company it keeps”!). The way these tools and methodologies are used suggests that text analysis comes naturally to computational linguistics. However, the study of language is also the study of culture, and so it applies readily to the study of history and the human condition.
If there is one lesson I have learned from these tools (text analysis and distant reading) in light of digital humanities, it is that digital humanities is by nature not just interdisciplinary, but transdisciplinary. Research questions and their products, whatever form the resulting scholarship takes, are enriched when humanists and computer scientists come together and interact.
[I also created a blog post on text analysis with other tools and resources for the Reference Department at IUB’s Wells Library. The link will work come next Monday.]