- Schedule, Wednesday, 3 July: Advanced TEI encoding, XPath for searching
- Seminar 6: Enriching TEI Metadata
- Seminar 7: Intro to XPath
- Seminar 8: Special Collections visit
Schedule, Wednesday, 3 July: Advanced TEI encoding, XPath for searching
|9.30||Seminar 6: Enriching TEI metadata [CO]||Presentation|
|10.30||Exercise: Enriching the metadata of the Bunting notebook||Practice|
|11.30||Seminar 7: Intro to XPath [CO]||Presentation|
|12.30||Exercise: XPath searching and calculations||Practice|
|14.00 (Senate House Library)||Seminar 8: Open Discussion on editing and projects with rare books and manuscripts [CO]||Presentation|
|16.00||Library Time||Senate House Library|
Richard Gartner, Metadata for digital libraries: state of the art and future directions. Available from: http://www.jisc.ac.uk/media/documents/techwatch/tsw_0801pdf.pdf.
––. Metadata: Shaping Knowledge from Antiquity to the Semantic Web, Springer, 2016.
Michael Kay, XSLT 2.0 and XPath 2.0: A Programmer’s Reference, 4th ed. Wiley, 2008.
Seminar 6: Enriching TEI Metadata
- Access the slides here.
If you want to see a massive TEI header (with funders, donors, and all kinds of additional info), check out one of the Mark Twain Project’s files for Mark Twain’s Autobiography, vol. 3 (2015).
Let’s return to the Basil Bunting notebook from Day 1.
Consult the notebook metadata from the Palace Green Library, Durham University, including:
Title: Last working notebook
Dates of creation: [n.d.]
Extent: Ringbound notebook in plastic wallet (text on ff.1-24, 26-27 only; rest of volume blank). Autograph.
Contents: words, phrases, lines for poems, quotations etc. Begins (f.1) ‘ haimasia drystone wall - Odessey [sic]’. ends (f.27) ‘bat wing, owl song,’ Begun c.1970 (information from Professor Peter Quartermain, 1990).
Comment out the <p> tag within the <sourceDesc> in the current XML file.
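As a rough illustration of what commenting out looks like, the sketch below wraps the paragraph in an XML comment. The paragraph text is a placeholder, not the wording in your file.

```xml
<sourceDesc>
  <!-- The original prose description is kept for reference but no longer parsed. -->
  <!-- <p>Placeholder for the existing prose description of the source.</p> -->
</sourceDesc>
```

Note that an XML comment cannot contain a double hyphen, and that the <sourceDesc> will probably be reported as invalid until you put new content in it.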
Expand the existing
Let’s have a look at the metadata provided by the library, and see if it could go in an
Take the information in this paragraph and expand the
Note how elements are prescribed to appear in a particular order (from the most general level to the most specific). Notice that most elements cannot be repeated (some like
When you’ve finished creating the <msIdentifier>, delete the remains of the first <p> from the basic source description. What you should have is something like this:
<msIdentifier>
  <country>United Kingdom</country>
  <region>County Durham</region>
  <settlement>Durham</settlement>
  <institution>Durham University</institution>
  <repository>Palace Green Library</repository>
  <collection>Basil Bunting Collection</collection>
  <idno type="folio">ff. 1-24, 26-27</idno>
  <altIdentifier>
    <idno>Item no. 18.</idno>
  </altIdentifier>
  <msName>Last Working Notebook</msName>
</msIdentifier>
<msContents> acts as a place to store structured information about the intellectual contents of a manuscript. It gives a place for a summary of the contents of the manuscript and multiple <msItem> elements to form something like a table of contents.
<msContents> (your document will not be valid. It should have a red line).
- Create a <summary>, which acts as a summary for the intellectual content.
Add a sibling
Surround ‘English.’ with a
@mainLang attribute with a value of ‘en’ (the ISO language code for ‘English’)
@ref attribute to the <author> and point to your <person> for Basil Bunting, or point to a VIAF entry.
- As this <msItem> is recording information for this particular item, we also want to give it a <title>. Create an empty <title> element and type “Manuscript Notebook c. 1970–1985” into it.
<msContents>
  <summary>This final working notebook by Bunting consists of notes and other fragmentary thoughts about literature &c...</summary>
  <msItem>
    <author>Basil Bunting (1900–1985)</author>
    <textLang mainLang="en">English</textLang>
    <title>Manuscript Notebook c. 1970–1985.</title>
  </msItem>
</msContents>
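The example above does not show the @ref attribute on <author> mentioned in the steps. A minimal sketch of how that pointer might look follows. It assumes your <person> for Bunting carries the identifier BB (the value used for the <persName> reference in the physical description on this page); where that <person> lives in your header is up to your own file.

```xml
<!-- Somewhere in the header: the person record being pointed to.
     The xml:id value "BB" is an assumption; use whatever id you gave it. -->
<person xml:id="BB">
  <persName>Basil Bunting</persName>
</person>

<!-- Inside <msContents>: the author now points at that record. -->
<msItem>
  <author ref="#BB">Basil Bunting (1900–1985)</author>
  <textLang mainLang="en">English</textLang>
  <title>Manuscript Notebook c. 1970–1985.</title>
</msItem>
```

The leading # marks the value as a pointer to an xml:id in the same document. A pointer to a VIAF entry would instead use the full URL of that entry as the attribute value.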
The next paragraph happens to have a lot of information about the physical aspects of the manuscript. Let’s turn it into a
- Add a
- Now nest within
<support>, and inside this complete the text from the library catalogue: e.g., “A XX-page notebook in the collection as … recto and verso”. (You could wrap the element <material> around the word ‘paper’, but you could also add a @material attribute on <supportDesc> with a value of ‘paper’. You could also categorise the object’s form by adding a @form attribute on <objectDesc> with a value of ‘folio’.)
- After the closing </supportDesc> tag add a <layoutDesc> containing a <layout> to record information about the physical layout. In this case: “Written full width as a single column, with approximately [XX] lines per page”.
- To the <layout> element add a @columns attribute of ‘1’, and a
- After the closing
@hands attribute with a value of ‘1’.
- Inside the
<handNote> with the remaining text “Written in Basil Bunting’s hand in pen”. (You might want to mark Bunting as a <persName> with a ref pointing back to the
for Basil Bunting.)
<physDesc>
  <objectDesc form="folio">
    <supportDesc material="paper">
      <support>A single folio of <material>paper</material> ff.1-24, 26-27 only; rest of volume blank. Begins (f.1) ' haimasia drystone wall - Odessey [sic]'. ends (f.27) 'bat wing, owl song,' Begun c.1970</support>
    </supportDesc>
    <layoutDesc>
      <layout columns="1" writtenLines="20">Written full width as a single column, with approximately 20 lines per page</layout>
    </layoutDesc>
  </objectDesc>
  <handDesc hands="1">
    <handNote>Written in <persName ref="#BB">Basil Bunting</persName>'s hand in pen.</handNote>
  </handDesc>
</physDesc>
Recording a useful
<history> element gives a place to detail the
<acquisition> of the manuscript if available. In this case we have some minimal information about the origin of the manuscript.
- Add a
- Select all the text of “This notebook was written by Basil Bunting in 1985 at …” and surround it with a
- Inside this mark ‘1985’ as an <origDate> element. This is like the <date> element, but is specific to recording the origin date of the manuscript being described. Provide a @when attribute of ‘1985-01’.
- Similarly mark the place (Hexham, Northumberland, England, UK) as an
@ref="#hexham" to point to the <place> you made earlier. You could also surround the text with an <orgName> if you want to indicate that this is an organizational name. As before, you could mark Bunting’s name.
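The steps above might come out something like the sketch below. Several element names are missing from the instructions as given here, so this assumes the intended wrappers are <history>, <origin> and <origPlace>, all of which are standard TEI manuscript description elements. The #BB and #hexham pointers assume those identifiers exist elsewhere in your file.

```xml
<history>
  <origin>This notebook was written by
    <persName ref="#BB">Basil Bunting</persName> in
    <origDate when="1985-01">1985</origDate> at
    <!-- The rest of the catalogue sentence is elided in the instructions;
         only the place name itself is given there. -->
    <origPlace ref="#hexham">Hexham, Northumberland, England, UK</origPlace>
  </origin>
</history>
```

If you later find provenance or acquisition details, they would follow <origin> inside the same <history>, in <provenance> and <acquisition> elements respectively.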
<additional> information about your
At the end of your <msDesc> you can include an <additional> element, which stores other information such as <adminInfo> (for recording administrative events of the object), <listBibl> (for listing bibliographic citations about the object), and <surrogates> (for listing additional representations of the object).
- Change the final paragraph to an <additional> element with a <surrogates> inside it containing all the text (i.e., the page images I shared with you on Day 1).
- Modify the URL given to be a
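To see where this last piece sits, here is a skeleton of the whole manuscript description with the earlier parts reduced to comments. It is a sketch only: the URL is a placeholder, and using <ref> with a @target attribute for the link is one possible choice rather than necessarily the one intended in the step above.

```xml
<sourceDesc>
  <msDesc>
    <msIdentifier>
      <!-- country, region, settlement, institution, repository, collection, idno ... -->
    </msIdentifier>
    <msContents>
      <!-- summary, then one or more msItem elements -->
    </msContents>
    <physDesc>
      <!-- objectDesc, then handDesc -->
    </physDesc>
    <history>
      <!-- origin -->
    </history>
    <additional>
      <surrogates>
        <!-- Placeholder URL: replace with the address of the page images. -->
        <p>Page images of the notebook are available as a
          <ref target="https://example.org/bunting-notebook/">digital facsimile</ref>.</p>
      </surrogates>
    </additional>
  </msDesc>
</sourceDesc>
```

The five children of <msDesc> must appear in this order, which is why <additional> always comes last. The skeleton will not validate until the commented sections are filled in.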
If you get stuck, compare your work to this enhanced file of a Wilfred Owen manuscript letter.
And here is the spoiler file for the Bunting notebook.
Seminar 7: Intro to XPath
Access the slides here.
Exercise: XPath querying and calculating
- Download the Bad Hamlet XML file.
- Find your XPath 2.0 box in the top left of your oXygen client.
- Perform your first query: find all of the
- How many lines are in Hamlet?
- Write the full (i.e. don’t start your expression with //) path expression for finding all first-level <div> elements in the text.
- Do the same for second-level
- Write an expression that finds all of Rosencrantz’s speeches. How many results do you get? How about Rosencrantz and Guildenstern?
- Find the string length of each of Hamlet’s speeches.
- Calculate the average character count of Hamlet’s speeches. If you need a guide to common kinds of count expressions, see http://dh.obdurodon.org/functions.xhtml
- Perform the same operation as you did for steps 3–5 except find Horatio. Compare the differences between his and Hamlet’s speech content.
- Write an expression that finds each speech element that comes before a Hamlet speech.
- Write an expression that finds all speeches that come before or after a Hamlet speech.
- What does this expression return in the Hamlet file: count(descendant-or-self::l) gt 2500?
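If the syntax is unfamiliar, it may help to practise on something tiny first. The fragment below is made up for that purpose: its element names follow common TEI drama encoding, but the Bad Hamlet file may be structured differently (for instance, it may identify speakers in another way), so treat the expressions as patterns to adapt rather than as answers.

```xml
<!-- A made-up practice fragment, not taken from the Bad Hamlet file. -->
<body>
  <div type="act">
    <div type="scene">
      <sp>
        <speaker>Ophelia</speaker>
        <l>First line of verse</l>
        <l>Second line of verse</l>
      </sp>
      <sp>
        <speaker>Laertes</speaker>
        <l>A reply</l>
      </sp>
    </div>
  </div>
</body>

<!-- XPath 2.0 expressions to try against the fragment:

  count(//l)
      counts every l element anywhere in the document (3 here)

  /body/div
      a full path from the root: first-level div elements only (the act)

  /body/div/div
      second-level div elements (the scene)

  //sp[speaker = 'Ophelia']
      every speech whose speaker child is exactly 'Ophelia'

  //sp[speaker = 'Ophelia']/string-length(normalize-space(.))
      one number per matching speech; the count includes the speaker label,
      because that text is part of the sp element

  avg(//sp[speaker = 'Ophelia']/string-length(normalize-space(.)))
      the mean of those numbers

  //sp[following-sibling::sp[1]/speaker = 'Laertes']
      each speech that comes immediately before one of Laertes's speeches
-->
```

The predicate in square brackets filters the nodes selected so far, and an axis such as following-sibling:: changes the direction in which the next step looks. Those two ideas cover most of the exercise.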
Click here for the answers.
Seminar 8: Special Collections visit
Some intriguing examples
- Thomas Browne’s “unauthorized” Religio Medici (1642).
- Fragment of Byron’s manuscript of Childe Harold’s Pilgrimage, with Mary Shelley’s revisions.
- J. M. Barrie’s revised typescripts (pre-“Definitive Edition”).
- Siegfried Sassoon’s Georgian Parodies. See the published version here.
- Walter de la Mare’s marginalia.
Proceed to Day 4.