Category: Plebeian Lives

Names and people in early modern sources (I)

In my working capacity as the Oracle of the OBP Online, I was recently asked a question that went something like this (details changed):

I’m confused by all these results. If Robert Scott was hanged in 1765, who are all these other Robert Scotts? And some of them are after 1765?!

This is at first glance a slightly daft question - well, obviously, they’re all different people but with the same name, aren’t they? (The question also contains a common misconception about the source, which I’ll come back to in a moment.) And yet, at the same time, it’s not really silly at all.

They might not all be different people. In our database of the names in the OBP there are 142 instances of the name ‘Robert Scott’ (including slight spelling variations). (Mind you, this is nothing compared to a name like John Smith, which occurs more than 4000 times.) How do you decide whether one Robert Scott is the same person as another Robert Scott, or someone else altogether?

And this is without even starting on the problem that a significant proportion of those appearing at the Old Bailey were known by more than one name, and some had a string of aliases and nicknames. Oh, and the reporters (or printers…) sometimes got people’s names - even those of defendants - just plain wrong.

In other words, identifying the relationship between names and people in early modern sources is often extremely tricky, and the question ‘who the hell are all these Robert Scotts?’ isn’t so daft. Which is just as well, really, because this is precisely the kind of problem that’ll be keeping me in work for the next couple of years.

This isn’t just of concern to family historians trying to work out whether someone is really their ancestor or not. Most historians have to make these linkages, ask these questions, at some time or another in the course of their research. Most of us do it on a small scale by hand; a more select group do it on the large scale with computers and algorithms. I’ll hopefully post about both of these later. But in both cases, the process relies on weighing up and ranking probabilities.
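To make "weighing up and ranking probabilities" a bit more concrete, here is a minimal sketch in Python. It is not how the OBP or the project does record linkage. The `Mention` fields, the weights, and the sample records (the weaver, the parish) are all invented for illustration. The only point is that each scrap of shared context nudges a candidate up or down the ranking.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Mention:
    """One appearance of a name in a source (all fields are illustrative)."""
    name: str
    year: int
    occupation: Optional[str] = None
    parish: Optional[str] = None


def name_similarity(a: str, b: str) -> float:
    """Rough string similarity between 0 and 1, tolerant of spelling variants."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_score(a: Mention, b: Mention) -> float:
    """Hypothetical score: higher means 'more plausibly the same person'."""
    score = name_similarity(a.name, b.name)
    # Shared context is extra evidence; missing context adds nothing either way.
    if a.occupation and b.occupation and a.occupation == b.occupation:
        score += 0.5
    if a.parish and b.parish and a.parish == b.parish:
        score += 0.5
    # Mentions far apart in time are less likely to be the same person.
    score -= 0.02 * abs(a.year - b.year)
    return score


def rank_candidates(target: Mention, candidates: List[Mention]) -> List[Tuple[float, Mention]]:
    """Return (score, mention) pairs, most plausible match first."""
    scored = [(link_score(target, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented example records, not taken from the OBP.
target = Mention("Robert Scott", 1765, occupation="weaver", parish="St Giles")
candidates = [
    Mention("Robert Scott", 1790),
    Mention("Robert Scot", 1767, occupation="weaver", parish="St Giles"),
    Mention("Roger Scott", 1766, occupation="sailor"),
]
ranked = rank_candidates(target, candidates)
for score, mention in ranked:
    print(f"{score:.2f}  {mention.name} ({mention.year})")
```

With these made-up weights, the slightly misspelled 'Robert Scot' with matching occupation and parish outranks the exact-name match 25 years later that has no context at all. That is roughly the judgement a historian would make by hand.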

Sometimes the answer, either way, is so obvious that the question doesn’t even need to be consciously formed. But at the other end of the scale, there are times when it’s impossible ever to know because you simply don’t have enough information, especially if a name is very common and you have very little contextual information besides the name itself. And I’m sure other historians will have encountered those frustrating borderline cases: if those documents are all referring to the same person, you have a great story. But are you certain enough to rest a serious argument on that identification?

It’s true, for example, that death is a clincher: if you know this Robert Scott died in 1765, then he can’t be the same person as that Robert Scott mentioned in records as alive and well in 1775. (At the other end of the life-cycle, birth is equally conclusive, of course.)

But are you sure he died?

The OBP doesn’t in fact tell you that Robert was hanged (this is the misconception I mentioned above); like archival records from early modern criminal courts, it normally records only the sentence that was passed. But many people sentenced to death in the 18th century were reprieved or pardoned. Unless you have corroborating evidence that the execution was carried out (this does occasionally appear in the OBP), you need to be cautious.

So a Robert Scott in the database after 1765 could be the same guy after all. Told you it was tricky.
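The rule of thumb above can be written down as a tiny check. Again, this is only a sketch under my own assumptions: the `TrialRecord` structure and its `execution_confirmed` flag are hypothetical and not fields in the OBP database. A death sentence on its own rules nothing out. Only a corroborated execution does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialRecord:
    """A hypothetical record; the field names are illustrative only."""
    name: str
    year: int
    sentence: Optional[str] = None
    # True only if some other source corroborates that the sentence was carried out.
    execution_confirmed: bool = False


def ruled_out_by_death(earlier: TrialRecord, later: TrialRecord) -> bool:
    """True if the earlier record proves the person was dead before the later one."""
    if later.year < earlier.year:
        earlier, later = later, earlier
    # A sentence of death alone is not proof: reprieves and pardons were common.
    return earlier.execution_confirmed and later.year > earlier.year


sentenced = TrialRecord("Robert Scott", 1765, sentence="death")
hanged = TrialRecord("Robert Scott", 1765, sentence="death", execution_confirmed=True)
seen_later = TrialRecord("Robert Scott", 1775)

print(ruled_out_by_death(sentenced, seen_later))  # sentence only: cannot rule out
print(ruled_out_by_death(hanged, seen_later))     # corroborated execution: ruled out
```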

(To be continued…)

A few links (because the place just isn’t the same without them):

The linkage of historical records by man and computer (JSTOR subscription required)
A discourse on method, historical knowledge and information technology
Reconstructing historical communities
AHDS guide

(X-posted at The Long Eighteenth.)


Ah yes, the job

Mmm, let me tell you a bit more about what I’m getting up to. I’ve often waxed lyrical about The Old Bailey Proceedings Online (and see the OBP Blog Symposium). So it’s rather delightfully serendipitous that my new job is as project manager for two new, related London history projects, based in the Humanities Research Institute at the University of Sheffield.

The first and relatively simple task is to complete the OBP job by adding the final run of proceedings from 1834-1913 (under the title of Central Criminal Court proceedings), integrating them into the existing site. In total, this will create a fully searchable major digital primary source for London history, and particularly for the history of non-elite Londoners, running right through from the late 17th century into the early 20th century.

The 18th-century project, Plebeian Lives and the Making of Modern London 1690-1800, is much more difficult and complex. Like many other early modern and 19th-century digital primary sources, the OB/CCC proceedings are printed texts - relatively easy to read and transcribe, and to mark up for digitisation. But the majority of the Plebeian Lives sources will be archival manuscript materials. They will cover a wide range, including legal records such as coroners’ inquests; parish records (e.g. pauper letters, vestry minute books); the records of Bridewell and Bethlem hospitals; and apprenticeship records. There’ll also be printed texts, such as Ordinary’s Accounts.

Like the Old Bailey/Central Criminal Court databases, they’ll all end up online: thousands of documents, full text, fully searchable, freely available to all internet users without any subscription barriers. What’s more, we hope to construct a search engine that will make it possible to simultaneously search a number of related online primary source resources alongside ours, including the OBP, and others at different sites such as British History Online.

This is the goal, at least. (I am terrified, whenever I stop being insanely excited.) Right now, all I have for this is a humungous (1 terabyte) hard drive filled with the first batch of scanned document images (very large, high quality .tif files, which is why they take up so much drive space).

The practical difficulties are not minor. Every phase of the process is lengthy and much of it (to be honest) fairly tedious, for both projects. All those documents and printed texts must first of all be microfilmed, scanned, and ‘rekeyed’ (transcribed): that part of it is outsourced, although we have to produce various documentation to guide the rekeyers (and generally nag and cajole the contractors to give us what we want when we want it). Some of the documents will be much better preserved and/or easier to decipher than others.

Then we have to mark up the transcripts in XML, another dull and painstaking task, which will be undertaken in two ways over the next two years or so. Right now and with my, um, ‘help’, the HRI programmers are writing fearfully complicated programs that will mark up substantial sections of the CCC transcripts automatically; the rest will be done manually by several part-time, home-based workers (some of them postgrad students) who will start this autumn.

Once that markup is done, the CCC project will be quite straightforward to finish off, since it will be essentially a matter of adding it to the existing OBP database and giving it a few tweaks. But for our 18th-century plebeians, our job will barely have begun.

First, the HRI people have to create a powerful search engine that anyone can use fairly easily and, of course, we have to create a web site to present it. We hope that many people with 18th-century interests, from genealogists to academics, will find their own ways of using the resource. What we want to do with it is to analyse the data in order to “reconstruct how ‘ordinary’ Londoners interacted with various government and charitable institutions in the course of their daily lives”. On the one hand, we’ll be doing large-scale quantitative analysis and record linkage (to find out, for example, patterns of relationships between claiming poor relief and ending up as a victim or perpetrator of crime). The technique of nominal record linkage has tended to be applied to small rural populations: the computer made record linkage practical in the first place, and now the internet is making it possible to extend its methods to the teeming metropolis. On the other hand, we want to do qualitative analysis: where we can find rich enough information about individuals, we’ll trace their individual experiences and uses of the institutions available to them.

I (eventually) get the fun job of writing biographies to put on the website. My bosses have to sit down and write the serious monograph.

I think I have one of the coolest jobs in the universe right now.

. . .

[Parts of this post have been revised and x-posted at my other new bloghome, The Long Eighteenth Century.]