The National Archives (UK)
Possible to get a translator to work on the main catalogue (Discovery) for this?
http://discovery.nationalarchives.gov.uk/SearchUI/
An example item would be:
http://discovery.nationalarchives.gov.uk/SearchUI/Details?uri=C9156576
I can see there might be a problem deciding which fields to map to what. Happy to chat about it.
Jo Pugh
Reference --> Loc. in Archive
Description --> Title
Date --> Date
Held by --> Archive
Legal status --> Rights
Access conditions --> ??? (maybe add to rights?)
Publication note: --> note (or abstract?)
Won't happen immediately but seems well worthwhile doing.
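A rough sketch of the mapping proposed above, written as a lookup table. The right-hand names (archiveLocation, title, date, archive, rights) are assumed to be the Zotero field names behind the labels used in the list, and to_zotero is a hypothetical helper, not part of any existing translator. The two undecided rows (Access conditions, Publication note) are deliberately left unmapped.

```python
# Discovery label -> assumed Zotero field name (only the rows agreed on above).
FIELD_MAP = {
    "Reference": "archiveLocation",   # "Loc. in Archive"
    "Description": "title",
    "Date": "date",
    "Held by": "archive",
    "Legal status": "rights",
}


def to_zotero(record):
    """Split a Discovery record (label -> value) into mapped and unmapped parts."""
    item = {}
    unmapped = []
    for label, value in record.items():
        target = FIELD_MAP.get(label)
        if target is None:
            # Undecided fields are reported rather than guessed at.
            unmapped.append(label)
        else:
            item[target] = value
    return item, unmapped
```

Returning the unmapped labels separately makes it easy to see which Discovery fields still need a decision.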
http://discovery.nationalarchives.gov.uk/DiscoveryAPI/xml/informationasset/C3454320
compared to:
http://discovery.nationalarchives.gov.uk/SearchUI/Details?uri=C3454320
should work (I think). I can post the XML if it doesn't.
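For what it's worth, going from the Search UI address to the XML address looks mechanical, judging only by the two URLs above. A minimal sketch, assuming the pattern holds for every record (api_url_for is a made-up helper name, and whether the endpoint is reachable is a separate question):

```python
from urllib.parse import urlparse, parse_qs

# URL pattern copied from the example above; assumed to apply to all records.
API_TEMPLATE = (
    "http://discovery.nationalarchives.gov.uk/"
    "DiscoveryAPI/xml/informationasset/{iaid}"
)


def api_url_for(details_url):
    """Build the XML URL from a SearchUI Details URL with a ?uri=<IAID> query."""
    query = parse_qs(urlparse(details_url).query)
    ids = query.get("uri")
    if not ids:
        raise ValueError("no 'uri' parameter in " + details_url)
    return API_TEMPLATE.format(iaid=ids[0])
```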
so I can't see that XML, nor can we use it for Zotero, which really is too bad.
If you think the XML provides better guidance for field mapping, you can post it to a public gist at gist.github.com (where it'll be easier to read than here) and link to it.
Any comments on the mapping above?
Let me explain how it works. There are 7 levels to Discovery. Level 1 is a department like INF - the Ministry of Information. Level 6 is where a piece of art like INF 3/108 sits, and level 7 would be an item-level description - like COPY 1/400/23 (which is a photo in a box, where the box is COPY 1/400).
In Discovery references must be formed by working up the <ParentIAID> chain and gluing together the <Reference>s.
The tricky thing is that we don't want all of them. We are only interested in levels 1 ("INF"), 3 ("3") and 6 ("108"). So although we need to follow the chain of <ParentIAID>s upwards, we need to throw away levels 2, 4 and 5. These tell us archivally useful things about the object but they are not part of forming the reference and if we include their <Reference>s, the value we generate will be a nonsense and generate an error when we come to look it up.
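To make the rule concrete, here is a minimal sketch that walks the parent chain and keeps only the levels that form the reference. The record layout (a dict with "level", "reference" and "parent" keys), the IAIDs and the level-2 label are all invented for illustration; real records would come from the XML, using <ParentIAID> and <Reference>. The "department, space, then slash-joined parts" format is inferred from INF 3/108.

```python
# Hypothetical records keyed by IAID; only the INF / 3 / 108 values come from the post.
RECORDS = {
    "C1": {"level": 1, "reference": "INF", "parent": None},
    "C2": {"level": 2, "reference": "(division, not part of the reference)", "parent": "C1"},
    "C3": {"level": 3, "reference": "3", "parent": "C2"},
    "C6": {"level": 6, "reference": "108", "parent": "C3"},
}

KEPT_LEVELS = frozenset({1, 3, 6})


def citable_reference(iaid, records, kept=KEPT_LEVELS):
    """Follow the parent chain upwards, keeping references from selected levels."""
    parts = []
    current = iaid
    while current is not None:
        record = records[current]
        if record["level"] in kept:
            parts.append(record["reference"])
        current = record["parent"]
    parts.reverse()  # collected bottom-up, so flip to top-down order
    if not parts:
        return ""
    department, rest = parts[0], parts[1:]
    return department if not rest else department + " " + "/".join(rest)
```

With the sample data, citable_reference("C6", RECORDS) gives "INF 3/108". The set of kept levels is a parameter so it can be adjusted if other levels turn out to matter.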
All clear as mud so far?
If we follow this rule there may be some issues since this is not completely uniform but there is probably a list of the rulebreakers which could be provided.
Does that make sense?
Jo
But in any case the rulebreakers must be somewhere coded into the Discovery API already else how does the Search UI itself know how to form a proper Reference?
Moving on -
- what about "Rights" - there were some questions about that on twitter
- I see that some items have a title field and that the "Description" field is quite long. Here's my current thinking on this:
If there is a title: Title --> Title, Description --> Abstract
If there isn't: First line of Description --> Title, Whole Description --> Abstract
- To get a full mapping of field --> Zotero, I've pasted an XML entry here: http://titanpad.com/6QNLqhVl6l
please add the respective Zotero fields (maybe in bold?) as you see fit.
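The title/abstract rule from the second point above could look roughly like this. It is only a sketch: title_and_abstract is a hypothetical helper, and "first line" is taken literally as everything up to the first line break, which may not match how Discovery descriptions are actually laid out.

```python
def title_and_abstract(title, description):
    """Return (title, abstract) following the rule sketched in the post above."""
    description = (description or "").strip()
    if title and title.strip():
        # A real title exists: keep it, and the description becomes the abstract.
        return title.strip(), description
    # No title: the first line of the description stands in for it,
    # while the whole description is still kept as the abstract.
    first_line = description.splitlines()[0].strip() if description else ""
    return first_line, description
```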
More recent descriptions I notice have a separate <Creator> tag though...
To really complicate things: my initial catalyst for wishing Discovery could talk to Zotero was wanting to cite a specific dated instance of this http://discovery.nationalarchives.gov.uk/SearchUI/Details?uri=C18648 from the UK Government web archive...
"are you saying the Discovery Reference [...] is nearly always formed from a combination of Levels 1 + (optionally, depending on what hierarchical level you wish to refer to) 3 + 6 + 7? "
Yes that is correct. I suggested to mentionthewar that there might be exceptions, but now that I think about it, I don't think there are. There are (legacy?) divisions called "subclasses" where at either level 3 or 6 ("series" and "piece") there are Reference fields which contain a forward slash character, but if you treat them as strings and concatenate them, they should still work. The problem really occurs in the other direction where you want to convert a Citable Reference into an IAID. The fact that the slashes are not necessarily delimiting references at different catalogue hierarchy levels makes this task virtually impossible to do reliably.
For example, "PRO 30/89/40" looks like department "PRO", series "PRO 30", piece "PRO 30/89", item "PRO 30/89/40". It is actually series "PRO 30/89", and when you look up that IA in Discovery you will see that the Reference field says "30/89". Painful when the "/" is usually used to delimit the different hierarchy levels.
I don't think there is a simple rule that allows you to work out which bits of the Citable Reference are the series, piece and item references. The number of slashes can vary even within a piece :(
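A small illustration of why the reverse direction is ambiguous: from the string alone, every slash is a possible boundary between the series and whatever sits below it. This sketch just lists the candidates (candidate_series is a made-up name, and it assumes the department is everything before the first space); it cannot pick the right one without looking records up.

```python
def candidate_series(citable_reference):
    """List every (department, series, remainder) split the slashes allow."""
    department, _, tail = citable_reference.partition(" ")
    parts = tail.split("/")
    return [
        (department, "/".join(parts[:i]), "/".join(parts[i:]))
        for i in range(1, len(parts) + 1)
    ]
```

For "PRO 30/89/40" this yields three candidates, with series "30", "30/89" and "30/89/40". Only a lookup in the catalogue tells you that the series is "30/89".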
Author would only rarely be filled. After all, authors get cited, so the field can't hold anyone who isn't the actual creator of the content. But where we have:
<CreatorName>
<CreatorName>
<Corporate_Body_Name_Text>The National Archives</Corporate_Body_Name_Text>
<Corporate_Body_Date_Start>2003</Corporate_Body_Date_Start>
<Corporate_Body_Date_End>0</Corporate_Body_Date_End>
<Birth_Date>0</Birth_Date>
<Death_Date>0</Death_Date>
</CreatorName>
</CreatorName>
we'll use that.
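Pulling the corporate creator out of a fragment like that is straightforward. A minimal sketch using the sample above as input (corporate_creators is a hypothetical helper, and it assumes the element names are exactly as shown and that the fragment is well-formed on its own):

```python
import xml.etree.ElementTree as ET

# The fragment quoted above, reproduced as test input.
SAMPLE = """<CreatorName>
  <CreatorName>
    <Corporate_Body_Name_Text>The National Archives</Corporate_Body_Name_Text>
    <Corporate_Body_Date_Start>2003</Corporate_Body_Date_Start>
    <Corporate_Body_Date_End>0</Corporate_Body_Date_End>
    <Birth_Date>0</Birth_Date>
    <Death_Date>0</Death_Date>
  </CreatorName>
</CreatorName>"""


def corporate_creators(xml_text):
    """Return the non-empty corporate body names found anywhere in the fragment."""
    root = ET.fromstring(xml_text)
    names = []
    for node in root.iter("Corporate_Body_Name_Text"):
        text = (node.text or "").strip()
        if text:
            names.append(text)
    return names
```

How personal creators are encoded is not shown in the sample, so this only covers the corporate case.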
Is the XML format documented somewhere? I'd like to see how exactly the creator entries work.
Rights - I notice this is left blank in just about everything I have in Zotero unless it's explicitly CC. So let's leave that can of worms alone and just leave it blank here too.
Title - When things don't have a title, the reference should be the title, I reckon.
An example of what citation output is supposed to look like can be found at www.nationalarchives.gov.uk/records/citing-documents.htm
There you can see that citations are either supposed to look like:
The National Archives (TNA): Public Record Office (PRO) INF 3/140
OR:
The National Archives (TNA): Public Record Office (PRO) C 139 Chancery: Inquisitions Post Mortem, Series I, Henry VI
(But the latter is a citation of a scoundrel, since it refers to 181 different files.)
But let me have a go with the Titanpad
Your version of Zotero will automatically update within 24 hours, or you can update manually using the "Update Now" button in the "General" tab of the Zotero preferences.
This should work for search results and item displays. Let me know how it works. Requests for additional/changed data are welcome, as are requests for other multiple views that aren't yet supported.
Can anyone confirm that they've tightened access to the API, or am I just missing something?
http://labs.nationalarchives.gov.uk/wordpress/index.php/2011/09/the-national-archives-api/
but that post is older than the working translator, so I have no idea. If someone with a contact at the archive can find out if there's still a way to get at the XML data, that'd be great.
It's obviously possible to scrape from the page, but that seems like such a waste (and would require completely rewriting the translator).
I've tweeted them, but if anyone has any contacts that'd be very helpful to find out.
I've uploaded the code to: https://gist.github.com/rt-bell/95723931b04144db3633
If anyone uses it and has any problems, let me know.
Your version of Zotero will automatically update within 24 hours, or you can update manually using the "Update Now" button in the "General" tab of the Zotero preferences. If you're using Standalone, restart Zotero and your browser after updating.
This is - as they note - a makeshift translator and I just did a very cursory review in the hope that we'll get the real deal via API back asap. If the API situation hasn't changed in a couple of months we can revisit this.
Sorry about the problems caused by the temporary loss of the API. It is back online now. Would you be able to test the API-based Parser and switch back to it if everything works okay?
We have identified an issue with the API being unable to return data for non-National Archives records (those records that have recently been imported into Discovery from Access2Archives, the National Register of Archives and Archon), but those records have a different format of Discovery URL anyway and aren't offered for citation by the Zotero browser plugin. We've noted this API issue and will add it to the backlog.
Incidentally, I notice that there was talk of using the Discovery "Title" field (where present) for the Zotero "title" mapping, but that the current parser always uses a standard text of "The National Archive of the UK, COPY 1/2". Is this intentional?
Steven
Thanks again to everyone for reporting this to us and the Nat'l Archive, and to the folks at the archive for fixing it.
Thanks for updating so quickly.
I've had an email from TNA adding a little detail to milh0use's comment above re. the title field:
"most records don't have a title but instead for display and indexing purposes we create one from the first 100 characters of the description. If there are more than 100 we end after the next whole word with ellipses."
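One reading of that rule, as a sketch: keep descriptions of 100 characters or fewer as they are; otherwise cut at the end of the word that is in progress at character 100 and add an ellipsis. display_title is a hypothetical name, and the whitespace normalisation and the three-dot ellipsis are assumptions, since the email doesn't spell those out.

```python
def display_title(description, limit=100):
    """Derive a display title from a description, cutting at a word boundary."""
    text = " ".join(description.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    # Find the first space at or after the limit, i.e. the end of the
    # word that is in progress when the limit is reached.
    end = text.find(" ", limit)
    if end == -1:
        # The current word runs to the end, so nothing is actually cut off.
        return text
    return text[:end] + "..."
```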
The email from TNA also mentioned that whatever arrangements Zotero makes for the title field should reflect the fact that Discovery now includes catalogues from numerous other UK archives (although it seems that non-TNA records are not currently being scraped successfully by Zotero). So "The National Archives of the UK" should not be hard-coded into the title field and instead if the archive name is going to be in the title then that should be taken from the relevant field.
TNA gave an email address at ResourceDiscoveryDevelopment at nationalarchives.gsi.gov.uk for more detailed discussions of the structure of the data.