Introduction to Text Analysis

16–26 September 2019

Riga Technical University, Riga, Latvia

Module tutors:

Christopher Ohge (Institute of English Studies, University of London) [CO]

Martin Steer (School of Advanced Study, University of London) [MS]

Summary: this course serves as a general introduction both to working with texts in digital humanities and to the R programming language.

Outcomes: by the end of the module, students will be able to

Required Software

AntConc (https://www.laurenceanthony.net/software/antconc/).

R (https://cran.r-project.org/mirrors.html).

RStudio Desktop (https://www.rstudio.com/products/rstudio/download/).


Monday 16 Sep

Lecture 1: Welcome; What is Digital Humanities? [CO and MS]

Access the slides.

Lecture 2: Data Management and working with Texts in Digital Humanities [MS]

Access the slides.

Tuesday 17 Sep

Lecture 3: What is Text, What is Text Analysis, and What is Distant Reading? [MS and CO]

Access the slides.

Lecture 4: Computer-assisted interpretation: HathiTrust Bookworm exercise [MS]

Access the slides.

Wednesday 18 Sep

Lecture 5: Voyant Tools [CO]

Access the slides.

Lecture 6: Intro to corpus linguistics and analysis with AntConc [CO]

Access the slides.

Thursday 19 Sep

Lecture 7: Intro to R, part 1 [CO]. Access the R notebook. See also the html file of the notebook. NOTE: right-click each link and choose Save Link As; you can then open the saved file in your browser.

Lecture 8: Regular Expressions [MS]. Access the regular expressions slides. See also the regex cheat sheet.

Intro to R, part 2 [CO]. Access the R notebook. See also the html file of the R notebook.


Monday 23 Sep

Review R syntax and conditionals [MS].

Access the R notebook on conditionals.

Review Intro to R, part 2 [CO].

Tuesday 24 Sep

Finish reviewing Intro to R, part 2; Lexical variety stats and visualisations. [CO]

Lecture 9: The stylo package in R for stylometry (distance measurements, Craig’s Zeta, network graphs). [CO]

Access the R notebook.

Wednesday 25 Sep

Lecture 10: Lexical dispersion plots; putting POS tags to use. [CO]

Access the R notebook.

Lecture 11: Tidy text in R: texts into dataframes, gutenbergr package and corpus comparison. [CO]

Access the R notebook.

Thursday 26 Sep

Lecture 12: Tidy text in R: sentiment analysis [CO].

Access the R notebook.

A critique of sentiment analysis [MS]; Course review.

Suggested Readings

Eve, Martin. Close Reading with Computers: Textual Scholarship, Computational Formalism, and David Mitchell’s Cloud Atlas (Stanford UP, 2019).

Gries, Stefan. Quantitative Corpus Linguistics with R, 2nd edition (Routledge, 2017).

Jockers, Matthew. Text Analysis with R for Students of Literature (Springer, 2014).

Jockers, Matthew. Macroanalysis (U of Illinois P, 2013).

Moretti, Franco. Graphs, Maps, Trees (Verso, 2007).

Piper, Andrew. Enumerations (U of Chicago P, 2019).

Rockwell, Geoffrey, and Stéfan Sinclair. Hermeneutica: Computer-Assisted Interpretation in the Humanities (MIT P, 2016).

Silge, Julia, and David Robinson. Text Mining with R: A Tidy Approach (O’Reilly, 2017).

Underwood, Ted. Distant Horizons (U of Chicago P, 2019).