Day 5
- Schedule, Friday, 5 July: Visualisation and Text Analysis of Edition Data
- Aims
- Readings
- Seminar 12
- Seminar 13
Today we will survey some tools for visualising texts. We will also give a general introduction to the R programming language, focusing on libraries for text analysis and visualisation.
Schedule, Friday, 5 July: Visualisation and Text Analysis of Edition Data
Time | Topic | Type |
---|---|---|
9.30 | Seminar 12: Current web-based vis tools; Intro to R | Presentation |
11.30 | Seminar 13: Using R to visualise text data; course wrap-up | Presentation |
Aims
- Understand some of the options for visualising text material.
- Understand the value of using programming languages as part of edition planning.
- Gain a working knowledge of the basic syntax of the R programming language, and be able to modify existing text analysis code.
Readings
Taylor Arnold and Lauren Tilton, Basic Text Processing in R, in the Programming Historian (https://programminghistorian.org).
Stefan Gries, Quantitative Corpus Linguistics with R. Routledge, 2016 (2nd ed.).
Matthew Jockers, Text Analysis with R for Students of Literature. Springer, 2014. [Especially Chapter 8, on XML processing]
Seminar 12
Current web- and app-based vis tools
Voyant Tools
- Go to https://voyant-tools.org/.
- Start by uploading the Bad Hamlet file (click on the “Upload” icon).
- What do you notice about the results?
- Now try another file with different encoding: Chapters 20-21 of Billy Budd, a heavily revised part of the manuscript.
Click here for the text conversion XSLT.
For more information on Voyant, check out Miriam Posner’s Voyant tutorial here.
AntConc
AntConc is a corpus linguistics tool that you can download for free and run on your own machine.
This tool is very good at providing raw word frequencies across multiple files, as well as phrase-level searching and part-of-speech tagging.
We won’t have time to investigate this tool, but if you would like to learn more, the Programming Historian has an excellent online tutorial on AntConc by Heather Froehlich.
Introduction to R
Click here to download the Intro to R Notebook for this session.
And click here to access an HTML version of the Notebook.
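As a taste of the basic syntax, here is a minimal sketch in base R. It is not taken from the Notebook: the sample lines, the variable names and the `rel_freq` function are made up for illustration. It shows assignment with `<-`, character vectors, a few built-in functions, and a small user-defined function.

```r
# Assign a character vector (a small made-up sample, not course data).
text_lines <- c("To be, or not to be, that is the question",
                "Whether tis nobler in the mind to suffer")

# Lower-case the text and split on anything that is not a letter.
words <- unlist(strsplit(tolower(text_lines), "[^a-z]+"))
words <- words[words != ""]   # drop any empty strings left by the split

# table() counts each distinct word; sort() orders the counts.
freq <- sort(table(words), decreasing = TRUE)

# A small function: relative frequency of one word, as a percentage.
rel_freq <- function(word, counts) {
  100 * as.numeric(counts[word]) / sum(counts)
}

print(head(freq, 3))          # the three most frequent words
print(rel_freq("to", freq))   # share of all tokens that are "to"
```

Changing the sample lines or the word passed to `rel_freq` is a good first exercise in modifying existing code.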
Seminar 13
Using R to visualise text data
For the rest of this session, we will use the second R Notebook, which covers using tidy text and the XML libraries to visualise edition data. It can be downloaded here.
Click here to access the HTML version of the notebook.
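To give a rough idea of the workflow, the sketch below parses a small XML fragment, tokenises it into one word per row, and plots word counts. It is an assumption-laden illustration, not the Notebook's code: it assumes the `xml2`, `dplyr`, `tidytext` and `ggplot2` packages are installed (the Notebook may use a different XML package), and the XML fragment is invented rather than real edition data.

```r
library(xml2)      # XML parsing (an assumption; the Notebook may use another package)
library(dplyr)     # data manipulation and the %>% pipe
library(tidytext)  # unnest_tokens() and the stop_words data set
library(ggplot2)   # plotting

# A tiny invented fragment standing in for real edition data.
tei <- "<text>
  <p>The sailor spoke <del>often</del> of the sea.</p>
  <p>The sea lay calm <add>that night</add> and the sailor slept.</p>
</text>"

doc <- read_xml(tei)
paragraphs <- xml_find_all(doc, "//p")

# One row per paragraph; xml_text() includes text inside <del> and <add>.
paras <- tibble(para = seq_along(paragraphs),
                text = xml_text(paragraphs))

word_counts <- paras %>%
  unnest_tokens(word, text) %>%             # one lower-cased word per row
  anti_join(stop_words, by = "word") %>%    # drop common function words
  count(word, sort = TRUE)                  # frequency of each remaining word

# Horizontal bar chart of the counts.
count_plot <- ggplot(word_counts, aes(x = reorder(word, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Occurrences")
print(count_plot)
```

The same pattern should carry over to a real edition file by passing its path to `read_xml()` and adjusting the XPath expression; TEI files usually declare a namespace, which the XPath would then need to account for.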
You probably noticed the introduction of regular expressions (regex) over the past two days. Here is the link to Regex101.com, the online regex tester that I showed.
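Regular expressions are also available in R itself. The following is a minimal sketch using base R functions only; the sample line and variable names are made up for illustration, and the patterns are deliberately simple rather than robust for real markup.

```r
# A made-up line of transcription with inline markup (not course data).
marked_up <- "The <del>old</del> sailor, aged 63, sailed in 1797."

# gsub() replaces every match: here, strip any tag of the form <...>.
plain <- gsub("<[^>]+>", "", marked_up)

# regmatches() with gregexpr() extracts every match: here, four-digit numbers.
years <- regmatches(plain, gregexpr("\\b[0-9]{4}\\b", plain, perl = TRUE))[[1]]

# grepl() tests for a match and returns TRUE or FALSE.
has_deletion <- grepl("<del>", marked_up, fixed = TRUE)

print(plain)
print(years)
print(has_deletion)
```

Patterns tested on Regex101 can usually be pasted into these functions, with each backslash doubled inside an R string.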
(Note: Access the marginalia XML here.)