Day 3
- Schedule, Wednesday, 3 July: Advanced TEI encoding, XPath for searching
- Readings
- Seminar 6: Enriching TEI Metadata
- Seminar 7: Intro to XPath
- Seminar 8: Special Collections visit
Schedule, Wednesday, 3 July: Advanced TEI encoding, XPath for searching
| Time | Topic | Type |
|---|---|---|
| 9.30 | Seminar 6: Enriching TEI metadata [CO] | Presentation |
| 10.30 | Exercise: Enriching the metadata of the Bunting notebook | Practice |
| 11.30 | Seminar 7: Intro to XPath [CO] | Presentation |
| 12.30 | Exercise: XPath searching and calculations | Practice |
| 14.00 (Senate House Library) | Seminar 8: Open Discussion on editing and projects with rare books and manuscripts [CO] | Presentation |
| 16.00 | Library Time | Senate House Library |
Readings
Richard Gartner, Metadata for digital libraries: state of the art and future directions. Available from: http://www.jisc.ac.uk/media/documents/techwatch/tsw_0801pdf.pdf.
––. Metadata: Shaping Knowledge from Antiquity to the Semantic Web, Springer, 2016.
Michael Kay, XSLT 2.0 and XPath 2.0: A Programmer’s Reference, 4th ed. Wiley, 2008.
Seminar 6: Enriching TEI Metadata
- Access the slides here.
If you want to see a massive TEI header (with funders, donors, and all kinds of additional info), check out one of the Mark Twain Project’s files for Mark Twain’s Autobiography, vol. 3 (2015).
Exercise
Let’s return to the Basil Bunting notebook from Day 1.
Consult the notebook metadata from the Palace Green Library, Durham University, including:
Title: Last working notebook
Reference: 18
Dates of creation: [n.d.]
Extent: Ringbound notebook in plastic wallet (text on ff.1-24, 26-27 only; rest of volume blank). Autograph.
Contents: words, phrases, lines for poems, quotations etc. Begins (f.1) ‘ haimasia drystone wall - Odessey [sic]’. ends (f.27) ‘bat wing, owl song,’ Begun c.1970 (information from Professor Peter Quartermain, 1990).
Recording <msIdentifier>
- Comment out the <p> tag within the <sourceDesc> in the current XML file.
- Expand the existing <msIdentifier>.
- Let’s have a look at the metadata provided by the library, and see if it could go in an <msIdentifier>.
- Take the information in this paragraph and expand the <msIdentifier>.
- Note how elements are prescribed to appear in a particular order (from the most general to the most specific). Notice that most elements cannot be repeated (some, like <collection> and <altIdentifier>, can be).
- When you’ve finished creating the <msIdentifier>, delete the remains of the first <p> from the basic source description. What you should have is something like this:
<msIdentifier>
  <country>United Kingdom</country>
  <region>County Durham</region>
  <settlement>Durham</settlement>
  <institution>Durham University</institution>
  <repository>Palace Green Library</repository>
  <collection>Basil Bunting Collection</collection>
  <idno type="folio">ff. 1-24, 26-27</idno>
  <altIdentifier>
    <idno>Item no. 18.</idno>
  </altIdentifier>
  <msName>Last Working Notebook</msName>
</msIdentifier>
Recording <msContents>
The <msContents> acts as a place to store structured information about the intellectual contents of a manuscript. It gives a place for a summary of the contents of the manuscript and multiple <msItem> elements to form something like a table of contents.
- Add an <msContents> (your document will no longer be valid, so it should show a red line).
- Create a <summary>, which summarises the intellectual content.
- Add an <msItem> element.
- Inside the <msItem>, add an <author> element.
- Surround ‘English.’ with a <textLang> element.
- Add a @mainLang attribute with a value of ‘en’ (the ISO language code for ‘English’).
- Add a @ref attribute to the <author> and point to your <person> for Basil Bunting, or point to a VIAF entry.
- As this <msItem> records information for this particular item, we also want to give it a <title>. Create an empty <title> element and type “Manuscript Notebook c. 1970–1985” into it.
Your <msContents> should look something like this:
<msContents>
  <summary>This final working notebook by Bunting consists of notes and other fragmentary thoughts about literature &c...</summary>
  <msItem>
    <author>Basil Bunting (1900–1985)</author>
    <textLang mainLang="en">English</textLang>
    <title>Manuscript Notebook c. 1970–1985.</title>
  </msItem>
</msContents>
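The example above leaves out the @ref attribute mentioned in the steps. A minimal sketch of the same <msItem> with the pointer added, assuming your <person> entry for Bunting has an xml:id of "BB" (the identifier used for <persName> in the <handNote> example in this exercise); substitute your own identifier or a VIAF URL if yours differs:

```xml
<msItem>
  <!-- Assumption: "#BB" matches the xml:id on your own <person> entry for Bunting -->
  <author ref="#BB">Basil Bunting (1900–1985)</author>
  <textLang mainLang="en">English</textLang>
  <title>Manuscript Notebook c. 1970–1985.</title>
</msItem>
```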
Adding <physDesc>
The next paragraph happens to have a lot of information about the physical aspects of the manuscript. Let’s turn it into a <physDesc>.
- Add a <physDesc>.
- Now nest within <physDesc> an <objectDesc> with a <supportDesc> inside that.
- Inside that <supportDesc> add a <support>, and inside this complete the text from the library catalogue: e.g., “A XX-page notebook in the collection as … recto and verso”. (You could wrap the element <material> around the word ‘paper’, but you could also add a @material attribute to <supportDesc> with a value of ‘paper’. You could also categorise the object’s form by adding a @form attribute on <objectDesc> with a value of ‘folio’.)
- After the closing </supportDesc> tag add a <layoutDesc> with a <layout> to record information about the physical layout. In this case: “Written full width as a single column, with approximately [XX] lines per page”.
- To the <layout> element add a @columns attribute of ‘1’, and a @writtenLines attribute of ‘XX’.
- After the closing </objectDesc> add a <handDesc> with a @hands attribute with a value of ‘1’.
- Inside the <handDesc> add a <handNote> with the remaining text “Written in Basil Bunting’s hand in pen”. (You might want to mark Bunting as a <persName> with a @ref pointing back to the <person> for Basil Bunting.)
<physDesc>
  <objectDesc form="folio">
    <supportDesc material="paper">
      <support>A single folio of <material>paper</material> ff.1-24, 26-27 only; rest of volume blank. Begins (f.1) ' haimasia drystone wall - Odessey [sic]'. ends (f.27) 'bat wing, owl song,' Begun c.1970</support>
    </supportDesc>
    <layoutDesc>
      <layout columns="1" writtenLines="20">Written full width
        as a single column, with approximately 20 lines per
        page</layout>
    </layoutDesc>
  </objectDesc>
  <handDesc hands="1">
    <handNote>Written in <persName ref="#BB">Basil Bunting</persName>'s hand in pen.</handNote>
  </handDesc>
</physDesc>
Recording a useful <history>
The <history> element gives a place to detail the <origin>, <provenance>, and <acquisition> of the manuscript if available. In this case we have some minimal information about the origin of the manuscript.
- Add a <history> element.
- Select all the text of “This notebook was written by Basil Bunting in 1985 at …” and surround it with an <origin> element.
- Inside this mark ‘1985’ as an <origDate> element. This is like the <date> element, but is specific to recording the origin date of the manuscript being described. Provide a @when attribute of ‘1985-01’.
- Similarly mark the place (Hexham, Northumberland, England, UK) as an <origPlace> with a @ref="#hexham" to point to the <place> you made earlier. You could also surround the text with an <orgName> if you want to indicate that this is an organizational name. As before, you could mark Bunting’s name.
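As a rough guide, the steps above might produce something like the following. This is only a sketch: the “…” stands for the wording in your own file, which is not reproduced here, and "#BB" and "#hexham" are assumed to match the xml:id values on your own <person> and <place> entries.

```xml
<history>
  <origin>This notebook was written by
    <!-- Assumption: "#BB" is the xml:id of your <person> for Bunting -->
    <persName ref="#BB">Basil Bunting</persName> in
    <origDate when="1985-01">1985</origDate> at …
    <!-- Assumption: "#hexham" is the xml:id of the <place> you made earlier -->
    <origPlace ref="#hexham">Hexham, Northumberland, England, UK</origPlace>
  </origin>
</history>
```

The @when value gives a machine-readable date while the element content keeps the date as it reads in the prose.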
Recording <additional> information about your <surrogates>
At the end of your <msDesc> you can include an <additional> element which stores other information such as <adminInfo> (for recording administrative events of the object), <listBibl> (for listing bibliographic citations about the object), and <surrogates> (for listing additional representations of the object).
- Change the final paragraph to an <additional> element with a <surrogates> inside that containing all the text (i.e., the page images I shared with you on Day 1).
- Modify the URL given to be a <ptr> with a @target attribute.
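A minimal sketch of what these two steps could yield. The descriptive sentence and the URL here are placeholders, not the real wording or address; keep the text and the link from the final paragraph of your own file.

```xml
<additional>
  <surrogates>Page images of the notebook are available at
    <!-- Placeholder URL: replace with the address given in your file -->
    <ptr target="https://example.org/bunting-notebook-images"/>
  </surrogates>
</additional>
```

Note that <ptr> is an empty element: the address lives in @target, so nothing goes between an opening and closing tag.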
If you get stuck, compare your work to this enhanced file of a Wilfred Owen manuscript letter.
And here is the spoiler file for the Bunting notebook.
Seminar 7: Intro to XPath
Access the slides here.
Exercise: XPath querying and calculating
- Download the Bad Hamlet XML file.
- Find your XPath 2.0 box in the top left of your oXygen client.
- Perform your first query: find all of the <l> elements.
- How many lines are in Hamlet?
- Write the full (i.e. don’t start your expression with //) path expression for finding all first-level <div> elements in the text.
- Do the same for second-level <div>s.
- Write an expression that finds all of Rosencrantz’s speeches. How many results do you get? How about Rosencrantz and Guildenstern?
- Find the string length of each of Hamlet’s speeches.
- Calculate the average character count of Hamlet’s speeches. If you need a guide of common kinds of count expressions, see http://dh.obdurodon.org/functions.xhtml
- Perform the same operation as you did for steps 3–5 except find Horatio. Compare the differences between his and Hamlet’s speech content.
- Write an expression that finds each speech element that comes before a Hamlet speech.
- Write an expression that finds all speeches that come before or after a Hamlet speech.
- What does this expression return in the Hamlet file: count(descendant-or-self::l) gt 2500?
Click here for the answers.
Seminar 8: Special Collections visit
Some intriguing examples
- Thomas Browne’s “unauthorized” Religio Medici (1642).
- Fragment of Byron’s manuscript of Childe Harold’s Pilgrimage, with Mary Shelley’s revisions.
- J. M. Barrie’s revised typescripts (pre-“Definitive Edition”).
- Siegfried Sassoon’s Georgian Parodies. See the published version here.
- Walter de la Mare’s marginalia.
Proceed to Day 4.