Digital Scholarly Editing, Advanced Methods (2019)

Day 5

Today's sessions will survey some visualisation tools for texts. They will also provide a general introduction to the R programming language and to R libraries for text analysis and visualisation.

Schedule, Friday, 5 July: Visualisation and Text Analysis of Edition Data

Time    Topic                                                       Type
9.30    Seminar 12: Current web-based vis tools; Intro to R         Presentation
11.30   Seminar 13: Using R to visualise text data; course wrap-up  Presentation

Aims

  • Understand some of the options for visualising text material.
  • Understand the value of using programming languages as part of edition planning.
  • Gain a working knowledge of basic R syntax and the ability to modify existing text analysis code.

Readings

Taylor Arnold and Lauren Tilton, Basic Text Processing in R, in the Programming Historian (https://programminghistorian.org).

Stefan Gries, Quantitative Corpus Linguistics with R. Routledge, 2016 (2nd ed.).

Matthew Jockers, Text Analysis with R for Students of Literature. Springer, 2014. [Especially Chapter 8, on XML processing]

Seminar 12

Current web- and app-based vis tools

Voyant Tools

  1. Go to https://voyant-tools.org/.
  2. Start by uploading the Bad Hamlet file (click on the “Upload” icon).
  3. What do you notice about the results?
  4. Now try a file with a different kind of encoding: Chapters 20-21 of Billy Budd, a heavily revised part of the manuscript.

Click here for the XSLT used for text conversion.

For more information on Voyant, check out Miriam Posner’s Voyant tutorial here.
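One thing worth keeping in mind during these exercises is how encoding choices affect what a plain-text tool such as Voyant ends up counting. As a rough illustration of what a text-conversion step does (this is not the course XSLT, and the sample markup is invented), the following Python standard-library sketch flattens XML to plain text and can optionally drop named elements. Skipping `<del>` is an assumed editorial choice for the example; real TEI files also raise questions about notes, headers and other elements.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix, e.g. '{http://www.tei-c.org/ns/1.0}del' -> 'del'."""
    return tag.rsplit("}", 1)[-1]


def collect_text(elem: ET.Element, skip: frozenset) -> str:
    """Recursively gather text in document order, omitting elements named in `skip`."""
    parts = []
    if local_name(elem.tag) not in skip:
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            parts.append(collect_text(child, skip))
    # An element's tail is the text that follows it inside its parent,
    # so it is kept even when the element itself is skipped.
    if elem.tail:
        parts.append(elem.tail)
    return "".join(parts)


def to_plain_text(xml_string: str, skip=()) -> str:
    """Flatten an XML string to plain text with normalised whitespace."""
    root = ET.fromstring(xml_string)
    return " ".join(collect_text(root, frozenset(skip)).split())


# Invented sample: a revision with a deleted and an added reading.
sample = "<p>I <del>saw</del><add>beheld</add> the ship.</p>"
print(to_plain_text(sample))                # I sawbeheld the ship.
print(to_plain_text(sample, skip={"del"}))  # I beheld the ship.
```

Naive flattening fuses the deleted and added readings into the non-word "sawbeheld"; deciding which elements to drop or keep is an editorial decision as much as a technical one.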

AntConc

AntConc is a free corpus linguistics tool that you can download and run on your own machine.


This tool is well suited to producing raw word frequencies across multiple files, and it also supports phrase-level searching and part-of-speech tagging.

We won’t have time to investigate this tool, but if you would like to learn more, the Programming Historian has an excellent online tutorial on AntConc by Heather Froehlich.
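AntConc does this work through a graphical interface. As a rough sketch of what raw word frequencies across multiple files and phrase-level searching involve, the following Python standard-library code counts words per file and across a small corpus, then produces a simple keyword-in-context (KWIC) listing. The tokenisation rule, the file names and the texts are assumptions for illustration, not AntConc's own method.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters or apostrophes after
# lower-casing; accented letters, digits and hyphens are not handled.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list:
    return WORD_RE.findall(text.lower())


def corpus_frequencies(texts: dict):
    """Map of file name -> text in; (per-file Counters, corpus-wide Counter) out."""
    per_file = {name: Counter(tokenize(text)) for name, text in texts.items()}
    total = Counter()
    for counts in per_file.values():
        total.update(counts)
    return per_file, total


def kwic(text: str, phrase: str, width: int = 3) -> list:
    """Return (left context, match, right context) for each occurrence of `phrase`."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return []
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, " ".join(target), right))
    return hits


# Hypothetical two-file corpus.
texts = {
    "hamlet.txt": "To be, or not to be: that is the question.",
    "other.txt": "Be still.",
}
per_file, total = corpus_frequencies(texts)
print(total.most_common(2))                       # [('be', 3), ('to', 2)]
print(kwic(texts["hamlet.txt"], "to be", width=2))
```

Keeping per-file counts alongside the corpus total makes it easy to see whether a frequent word is spread across the corpus or concentrated in one file.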

Introduction to R

Click here to download the Intro to R Notebook for this session.

And click here to access an HTML version of the Notebook.

Seminar 13

Using R to visualise text data

For the rest of this session, we will use the second R Notebook, which uses the tidytext and XML libraries to visualise edition data. It can be downloaded here.

Click here to access the HTML version of the notebook.
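The tidy text approach represents text as a table with one token per row, which makes counting and plotting straightforward. The notebook does this in R; what follows is only a hedged Python standard-library analogue. It assumes marginal notes are encoded as `<note>` elements (a hypothetical choice; the actual marginalia XML may use different elements), builds (note number, token) rows, counts them, and draws a crude text bar chart in place of a real plot.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so TEI-namespaced tags match plain names."""
    return tag.rsplit("}", 1)[-1]


def tidy_tokens(xml_string: str, element: str = "note") -> list:
    """Return (element number, token) rows: one lower-cased token per row.

    Caveat: nested elements with the same name would be counted twice,
    because the outer element's text already includes the inner one.
    """
    root = ET.fromstring(xml_string)
    rows = []
    number = 0
    for elem in root.iter():
        if local_name(elem.tag) == element:
            number += 1
            text = "".join(elem.itertext()).lower()
            for token in re.findall(r"[a-z']+", text):
                rows.append((number, token))
    return rows


def count_tokens(rows: list) -> list:
    """Collapse the rows into (token, count) pairs, most frequent first."""
    return Counter(token for _, token in rows).most_common()


def text_bar_chart(counts: list) -> str:
    """A stand-in for a plot: one line per token, one '#' per occurrence."""
    return "\n".join(f"{token:<12}{'#' * n}" for token, n in counts)


# Invented sample document with two marginal notes.
sample = (
    "<text><p>Body text.</p>"
    "<note>A fine note</note><note>Fine indeed</note></text>"
)
rows = tidy_tokens(sample)
print(rows[:3])  # [(1, 'a'), (1, 'fine'), (1, 'note')]
print(text_bar_chart(count_tokens(rows)))
```

Keeping the element number on every row means the same table can later be grouped by note as well as counted across all notes.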

You have probably noticed regular expressions (regex) appearing over the past two days. Here is the link to Regex101.com, the online regex tester I demonstrated.
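Patterns like the two below can be pasted into Regex101 (which offers a Python flavour among others) and then tried in code. This Python sketch uses an invented line of verse markup: one pattern strips tags, the other finds spelling variants of a word. Both patterns are illustrative assumptions, and regex tag-stripping is no substitute for a proper XML parser on real edition files.

```python
import re

# Any tag such as <l>, <del> or </add>. Deliberately naive: it does not cope
# with comments, CDATA sections or a ">" inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")

# "honor" or "honour" as a whole word; "u?" makes the u optional.
HONOUR_RE = re.compile(r"\bhonou?r\b", re.IGNORECASE)


def strip_tags(markup: str) -> str:
    """Replace each tag with a space, then collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", markup).split())


# Invented sample line.
line = "<l>Thy <del>honor</del><add>honour</add> is Honour's due</l>"
clean = strip_tags(line)
print(clean)                     # Thy honor honour is Honour's due
print(HONOUR_RE.findall(clean))  # ['honor', 'honour', 'Honour']
```

Replacing each tag with a space, rather than with nothing, keeps adjacent readings from being fused into a single token.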

(Note: Access the marginalia XML here.)

This project is maintained by cmohge1