[Translators:] "Library Catalog (PICA)": Four issues

nickbart · September 21, 2016

(1) GBV "Extent" field containing "[...]": E.g., in http://gso.gbv.de/DB=2.1/PPNSET?PPN=120024551, "Extent: 206, [32] S." becomes "32" in Zotero’s "# of Pages" field. Expected: "206, [32]".

(2) In many entries, the GBV "Series" field contains an ISSN, or a ZDB-ID, or both:

http://gso.gbv.de/DB=2.1/PPNSET?PPN=615173691:
"Fabian ideas / Fabian Society ; ZDB-ID: 26228610 ; 598"

http://gso.gbv.de/DB=2.1/PPNSET?PPN=789348721:
"International geophysics series , ISSN 0074-6142 ; ZDB-ID: 4109442 ; 104"

Could ISSN and ZDB-ID be removed from Zotero’s "Series" field, and added to the "Extra" field, in a format that could be parsed further, if necessary, i.e., on separate lines, with a colon after ISSN or ZDB-ID?

(For ISSN, being a CSL variable, this certainly would make sense. As to ZDB-ID, I am less sure; moving it to a note, or discarding it altogether would be fine with me as well.)

Note that there’s sometimes additional information in the GBV "Series" field, e.g., place, publisher, date the series was started, ...:

http://gso.gbv.de/DB=2.1/PPNSET?PPN=687564441:
"Il pensiero politico [Il @pensiero politico / Biblioteca] . - Firenze : Olschki, 1969-, ISSN 1122-0767 ; ZDB-ID: 1345588 ; 33"

This, too, does not really belong in Zotero’s "Series" field and could be moved to a note.
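
To make the request concrete, here is a rough sketch of the clean-up, written in Python rather than the translator's JavaScript. It is not the translator's actual code: the function name `split_series` and both regular expressions are assumptions based only on the three example strings above. It pulls ISSN and ZDB-ID out of the series string and returns them as separate "Label: value" lines for "Extra". The series number and any place/publisher information are left untouched.

```python
import re

# Hypothetical patterns, derived only from the GBV examples quoted above.
# Each one also swallows the separator (", " or "; ") that precedes the label.
ID_PATTERNS = [
    ("ISSN", re.compile(r",?\s*ISSN:?\s*(\d{4}-\d{3}[\dXx])")),
    ("ZDB-ID", re.compile(r";?\s*ZDB-ID:?\s*([\d-]+[Xx]?)")),
]


def split_series(raw):
    """Return (series, extra): the series text without identifiers,
    and the identifiers as 'Label: value' lines for the Extra field."""
    extra_lines = []
    for label, pattern in ID_PATTERNS:
        match = pattern.search(raw)
        if match:
            extra_lines.append(label + ": " + match.group(1))
            # Cut the identifier (and its leading separator) out of the string.
            raw = raw[:match.start()] + raw[match.end():]
    # Collapse the whitespace left behind by the removals.
    series = re.sub(r"\s+", " ", raw).strip()
    return series, "\n".join(extra_lines)


series, extra = split_series(
    "International geophysics series , ISSN 0074-6142 ; ZDB-ID: 4109442 ; 104"
)
print(series)
print(extra)
```

For the second example above, this leaves "International geophysics series ; 104" in "Series" and puts "ISSN: 0074-6142" and "ZDB-ID: 4109442" on separate lines in "Extra".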

(3) "Jr" does not seem to be translated to Zotero at all. Example:

http://gso.gbv.de/DB=2.1/PPNSET?PPN=789348721:
"Robert A. Houze, Jr" becomes "Houze, Robert A."

(4) Finally, many GBV entries for books link to a pdf containing the book’s table of contents. (Example: http://gso.gbv.de/DB=2.1/PPNSET?PPN=789348721.) It would be very useful if this pdf could be automatically downloaded and attached to the Zotero item.

adamsmith · September 21, 2016

1. I'm just not sure we're going to be able to get this right consistently. The "Extent" field also contains other information, such as the number of volumes, and I don't see a pattern that is consistent enough, especially across catalogs, to build a regex on. Patches welcome, though, if anyone else is able to wrangle this.
2. That should be doable for ISSN and ZDB-ID, yes; I'm not sure how much of the other material we can remove, though. Again, regex can only do so much.
3. We take that one from the Author/Verfasser field, which doesn't include the suffix. I'm not going to go down the rabbit hole of comparing the author listed after the title with the one in the author field and guessing which is more correct, especially since the one after the title is in a harder-to-parse format.
4. Hmm -- that'd of course be very doable, but aren't people going to be annoyed? I expect PDFs to contain full text, and while we can of course clearly mark them as ToCs, I'm not sure downloading them is desirable behavior. Open to being convinced, though.

nickbart · September 21, 2016

(1) I would have thought just adding "[" and "]" to the list of chars that match a "number of pages" pattern if appearing in front of "S." or "S. :" would do. – BTW, it seems "Extent:" fields can also contain the string "Seiten" instead of "S.", but Zotero does *not* seem to be parsing a string preceding "Seiten" as a number of pages.
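
As an illustration of the idea, here is a minimal sketch in Python (not the translator's JavaScript, and not its current logic). The pattern and the function name `extract_pages` are assumptions: a page statement is taken to be a comma-separated run of arabic or uppercase roman numerals, each optionally wrapped in square brackets, directly before "S." or "Seiten".

```python
import re
from typing import Optional

# One page count: digits or an uppercase roman numeral, optionally bracketed.
_TOKEN = r"\[?(?:\d+|[IVXLCDM]+)\]?"

# Hypothetical "number of pages" pattern: one or more counts separated by
# commas, followed by the German page marker "S." or "Seiten".
PAGES_RE = re.compile(
    r"(?<!\w)(" + _TOKEN + r"(?:\s*,\s*" + _TOKEN + r")*)\s*(?:S\.|Seiten\b)"
)


def extract_pages(extent: str) -> Optional[str]:
    """Return the page statement from an Extent string, or None if absent."""
    match = PAGES_RE.search(extent)
    return match.group(1) if match else None


print(extract_pages("206, [32] S."))
print(extract_pages("XVI, 250 Seiten : Ill."))
print(extract_pages("1 CD-ROM"))
```

With this pattern, "206, [32] S." yields "206, [32]" and "XVI, 250 Seiten : Ill." yields "XVI, 250", while a string without a page marker yields nothing. It only covers the German markers mentioned here; other catalogs would need their own markers.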

(2) (Re)moving ISSN and ZDB-ID would be a huge improvement already.

(3) Fair enough.

(4) I can only encourage you to try it – as you said, clearly labelling those pdfs as "ToC" should be sufficient to avoid any misunderstandings. Personally, I find the ToCs very useful, and they are in fact one of the reasons why I often prefer GBV to other catalogues. Also, I feel removing those pdfs, in case you don’t want them, would take only a minimum of effort, whereas adding them is currently much more time-consuming.

zuphilip · September 21, 2016

We already add the ToC PDF in the GBV ISBN search translator; try, for example, searching for "3866882408" in the identifier lookup inside Zotero. Moreover, I remember that someone came up with a fairly elaborate workflow for handling different page formats in one translator, but I cannot remember which translator it was...

zuphilip · September 21, 2016

I found the commit again: it is in the same translator (PICA), which can already deal with cases like "3 vol.(XII-XXIII-681, 540, 739 p.) ; 8vo" from http://www.sudoc.abes.fr/DB=2.1/SRCH?IKT=12&TRM=024630527