OCLC FirstSearch WorldCat translator misplacing lines of data [with partial fix]

The OCLC FirstSearch WorldCat translator is having trouble with fields that appear on multiple lines in the retrieved data. Lines of data get misplaced into other fields or the catch-all note attachment.

This was reported to me as books with subtitles losing the primary title. I've also noticed it with author names being appended to the title out of order. I expect it also affects other fields.

An example of the title issue is OCLC accession number 723141626 ("Zotero : a guide for librarians, researchers, and educators /"). Zotero saves the title as "a guide for librarians, researchers, and educators", moving the primary title to the note.

The author issue shows up on OCLC number 50321393 ("Dangerous places : health, safety, and archaeology /"). The authors display in WorldCat as "David A Poirier; Kenneth L Feder". Zotero saves the title as "Dangerous places: health, safety, and archaeology / Feder, Kenneth L.", and only "Poirier, David A." as the author.

The translator uses WorldCat's EndNote export format. The title issue is triggered by an inconsistency on OCLC's part, but the author issue is not: EndNote imports these records correctly, so it is a Zotero issue.

I see a report of the title issue from 2022 (https://forums.zotero.org/discussion/96768/space-colon-combos-in-worldcat-firstsearch-cause-only-subtitles-to-be-imported). The lastUpdated date in the translator I just got on a clean install is 2017-01-01, so it doesn't look like anyone found a fix at that point. Could someone take another look?

I'd hoped to supply a fix myself when I thought only titles were affected, but with other fields involved it's beyond me. Here's what I believe is happening:

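The translator uses a regex (lineRegexp.compile("^([\\w() ]+): *(.*)$");) to split each line into a field label and a value at the first colon. The label part only allows word characters, spaces, and parentheses, so any line made of plain words followed by a colon looks like a field. Lines that don't match the regex are appended to the most recent field, and lines whose label isn't a recognized field are dumped into the catch-all note.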
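To make that model concrete, here is a minimal sketch of the parsing loop as I understand it. It is a simplification, not the translator's actual code: parseRecord and KNOWN_FIELDS are hypothetical names, the "Year" label is an assumed example, and the sample record is made up.

```javascript
// Hypothetical simplification of the parsing loop described above, not the
// translator's actual code. KNOWN_FIELDS is an assumed set of labels.
const lineRegexp = /^([\w() ]+): *(.*)$/; // same pattern as the translator's string
const KNOWN_FIELDS = new Set(["Title", "Year"]);

function parseRecord(lines) {
  const fields = {};  // recognized label -> accumulated value
  const notes = [];   // catch-all note
  let current = null; // most recent recognized field

  for (const line of lines) {
    const match = lineRegexp.exec(line);
    if (match && KNOWN_FIELDS.has(match[1])) {
      // "Label: value" with a recognized label starts a new field
      fields[match[1]] = match[2];
      current = match[1];
    } else if (match) {
      // "Label: value" shape, but the label is unrecognized: dump in the note
      notes.push(line);
    } else if (current !== null) {
      // No "Label:" prefix: treat as a continuation of the latest field
      fields[current] = (fields[current] + " " + line).trim();
    } else {
      notes.push(line);
    }
  }
  return { fields, notes };
}

const result = parseRecord([
  "Title: An example title :",
  "with a subtitle /",
  "Year: 2020",
  "Some other label: a value",
]);
console.log(result.fields.Title); // An example title : with a subtitle /
console.log(result.fields.Year);  // 2020
console.log(result.notes[0]);     // Some other label: a value
```

Note that an unrecognized label does not change which field continuation lines are appended to; that detail matters for the title case below.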

For titles, the problem arises when the primary title is on its own line and matches the expected pattern for a field label. OCLC #1250348506 is an example: when I intercept the translator's export URL and download the file, I see it includes

Title:
Mutiny on the Rising Sun :
a tragic tale of slavery, smuggling, and chocolate /

on three separate lines. The first line matches the regex, so it starts the Title field (creating the title variable with an empty value). "Mutiny on the Rising Sun :" also matches the regex, but isn't a recognized field, so it gets stuck in the catch-all note. And "a tragic tale of slavery, smuggling, and chocolate /" doesn't match the regex, so it's appended to the empty title.
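A quick way to confirm which of those three lines match is to run the same pattern over them. This sketch writes the translator's escaped string as an equivalent JavaScript regex literal; the line text is copied from the download above.

```javascript
// Equivalent regex literal for lineRegexp.compile("^([\\w() ]+): *(.*)$")
const lineRegexp = /^([\w() ]+): *(.*)$/;

const lines = [
  "Title:",
  "Mutiny on the Rising Sun :",
  "a tragic tale of slavery, smuggling, and chocolate /",
];

for (const line of lines) {
  const match = lineRegexp.exec(line);
  console.log(JSON.stringify(line), "->",
    match ? `label=${JSON.stringify(match[1])} value=${JSON.stringify(match[2])}` : "no match");
}
// "Title:" -> label="Title" value=""
// "Mutiny on the Rising Sun :" -> label="Mutiny on the Rising Sun " value=""
// "a tragic tale of slavery, smuggling, and chocolate /" -> no match
```

The captured label for the primary title keeps its trailing space, so it could never equal a real field name; it can only land in the note.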

This only happens when the title appears on its own line, separate from the field label. In my testing, I've only seen this on print books: 1265461412 is an e-book version of the same work, and Zotero saves it correctly. Downloading that file shows the full title on the same line as the "Title:" label, which avoids the false match. The same problem (and print vs. e-book variation) occurs with "Chocolate: A Cultural Encyclopedia" (1290246459 vs. 1290244644).

Being on a separate line isn't a problem if the primary title doesn't match the regex: 248512005 ("Wholesome advice against the abuse of hot liquors, ...") gets its full title in Zotero, presumably because the comma isn't in the label's [\w() ] character class, so the line can't match before reaching a colon.
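Both of those contrasting cases can be checked against the same pattern. This is a sketch under assumptions: I haven't reproduced the exact e-book line, so sameLine assumes a single space after "Title:", and commaLine is a hypothetical stand-in for the 248512005 title line in which "..." is the part elided above and the trailing " :" is assumed.

```javascript
const lineRegexp = /^([\w() ]+): *(.*)$/;

// Assumed layout of the e-book record: title on the same line as the label.
const sameLine = "Title: Mutiny on the Rising Sun : a tragic tale of slavery, smuggling, and chocolate /";
// Hypothetical title line with a comma before its first colon; "..." stands
// for the elided part of the title, and the trailing " :" is an assumption.
const commaLine = "Wholesome advice against the abuse of hot liquors, ... :";

const m = lineRegexp.exec(sameLine);
console.log(m[1]); // Title
console.log(m[2]); // Mutiny on the Rising Sun : a tragic tale of slavery, smuggling, and chocolate /
console.log(lineRegexp.test(commaLine)); // false
```

Because the label class can't contain a colon, the first colon always ends the label, so the rest of a same-line title (including its own colon) lands in the value.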

I made a potential fix for the title field by adding

if (title == '' && /:$/.test(lines[i])) {
    // Primary title on its own line, ending in ":" (e.g. "Mutiny on the Rising Sun :")
    title = lines[i];
}

right after the var title = match[2]; (around line 58 in Scaffold's Code view). That fixed the title value for all records I've tested, and I can't find anything it breaks. But then I started noticing the author issue (which I've since replicated in a fully clean install, so it's not related to my fix) and realized this is a bigger problem than I know how to fix.
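For anyone picking this up, here is the same guard idea applied to a simplified, hypothetical parser rather than the real translator (parseRecord, KNOWN_FIELDS, the guardTitle option, and the "Year" label are all made up for illustration). With the guard off it reproduces the misparse; with it on, the primary title is kept.

```javascript
// Hypothetical simplified parser (not the translator's code) with an optional
// guard modeled on the fix above: while the Title value is still empty, a
// regex-matching line that ends in ":" and isn't a known field is treated as
// title text instead of being sent to the note.
const lineRegexp = /^([\w() ]+): *(.*)$/;
const KNOWN_FIELDS = new Set(["Title", "Year"]); // assumed label set

function parseRecord(lines, { guardTitle = false } = {}) {
  const fields = {};
  const notes = [];
  let current = null;

  for (const line of lines) {
    const match = lineRegexp.exec(line);
    if (match && KNOWN_FIELDS.has(match[1])) {
      fields[match[1]] = match[2];
      current = match[1];
    } else if (match && guardTitle && current === "Title" &&
               fields.Title === "" && /:$/.test(line)) {
      // Primary title sitting on its own line, mistaken for a "Label:" line
      fields.Title = line;
    } else if (match) {
      notes.push(line); // unrecognized label goes to the catch-all note
    } else if (current !== null) {
      fields[current] = (fields[current] + " " + line).trim(); // continuation
    } else {
      notes.push(line);
    }
  }
  return { fields, notes };
}

const record = [
  "Title:",
  "Mutiny on the Rising Sun :",
  "a tragic tale of slavery, smuggling, and chocolate /",
];

const unguarded = parseRecord(record);
const guarded = parseRecord(record, { guardTitle: true });
console.log(unguarded.fields.Title); // a tragic tale of slavery, smuggling, and chocolate /
console.log(unguarded.notes[0]);     // Mutiny on the Rising Sun :
console.log(guarded.fields.Title);   // Mutiny on the Rising Sun : a tragic tale of slavery, smuggling, and chocolate /
```

The guard only fires while the title is still empty, so records with the title on the label line are unaffected. It does nothing for the author issue, which would need its own handling.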

I'm testing in Zotero 6.0.36 and the Zotero Connector for Firefox 5.0.124, both newly installed today.