Importing MARCXML from Internet Archive

Zorozero · June 5, 2023

Hi,
I'm trying to import Internet Archive MARCXML directly (Import from Clipboard), but Zotero forwards me to https://www.zotero.org/support/kb/importing_standardized_formats with the error message "The selected file is not supported".

It works if the XML was 'hard' downloaded (file stored), but not from clipboard. The sample is https://archive.org/download/principlesofnucl0000call/principlesofnucl0000call_archive_marc.xml

It would also be extremely nice if that worked in the "Add item(s) by Identifier" magic wand.

adamsmith · June 5, 2023

I suspect you're not actually getting the xml onto your clipboard. E.g., when you load the above link in a browser, you get something that starts like
04091cam a2200961 a 4500 principlesofnucl0000call CaSfIA
Because the browser view of XML strips out all the tags. That is, indeed, not a valid import format.

If you look at the actual XML (which, e.g., in Firefox you can do with ctrl+u/view source) that same item starts with

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>04091cam a2200961 a 4500</leader>
  <controlfield tag="001">principlesofnucl0000call</controlfield>

and imports fine via import from clipboard.

Importing metadata and importing identifiers are two completely different things, so the magic wand field is definitely not going to import metadata -- not sure why it should since you have the clipboard option available.

Zorozero · June 6, 2023

So looking at the "actual XML" as page source (ctrl+u), then copying and inserting from clipboard, still gives an error "...please ensure that the file is valid..."

(The metadata/identifiers explanation is clear.)

(For reference: Zotero v6.0.26)


<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>04091cam a2200961 a 4500</leader>
  <controlfield tag="001">principlesofnucl0000call</controlfield>
  <controlfield tag="003">CaSfIA</controlfield>
  <controlfield tag="005">20221214041246.0</controlfield>
  <controlfield tag="006">m     o  d</controlfield>
  <controlfield tag="007">cr||||||||||||</controlfield>
  <controlfield tag="008">910204s1991    enkaf   ob    001 0 eng d</controlfield>
  <datafield tag="010" ind1=" " ind2=" ">
    <subfield code="z">   91008439 </subfield>
  </datafield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(OCoLC)1194912241</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">NZ1</subfield>
    <subfield code="b">eng</subfield>
    <subfield code="c">NZ1</subfield>
    <subfield code="d">DLC</subfield>
...

adamsmith · June 6, 2023

Are other imports from the clipboard (e.g. RIS, BibTeX) working? How are you selecting the text? I'd recommend ctrl+a --> ctrl+c to make sure you're not missing anything -- XML breaks immediately if even one character at the beginning or end is left out. The error you're getting suggests this might be the issue.

I tested this (in Zotero 7, but that almost certainly shouldn't matter here) and it imports fine.

If you really can't get this to work, could you please provide a Debug ID for the import attempt?
https://www.zotero.org/support/debug_output

Zorozero · June 8, 2023

Right! Yes, I was using ctrl+a. However, on double-checking, the page source seems to add one blank line at the start and one at the end. After removing those two lines, the XML imports correctly.
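
For anyone hitting the same thing, here is a rough Python sketch of why the blank lines likely matter. It only illustrates the general XML rule that the <?xml ...?> declaration must be the very first thing in the document; it uses Python's standard-library parser, not Zotero's import code, so treat the link to Zotero's behavior as an assumption. The RECORD string is a shortened stand-in for the Internet Archive file (closing tag added so it is well-formed), and parses() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the Internet Archive record quoted above
# (only the first fields, closed so that it is well-formed XML).
RECORD = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<record xmlns="http://www.loc.gov/MARC21/slim">\n'
    '  <leader>04091cam a2200961 a 4500</leader>\n'
    '  <controlfield tag="001">principlesofnucl0000call</controlfield>\n'
    '</record>\n'
)


def parses(text):
    """Hypothetical helper: True if the standard-library parser accepts text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Simulate what ctrl+a / ctrl+c on the page source produced:
# one extra blank line before and one after the document.
copied = "\n" + RECORD + "\n"

print("as copied:", parses(copied))
print("after strip():", parses(copied.strip()))
```

The copied text should be rejected and the stripped text accepted. In this parser it is the leading blank line that causes the failure (whitespace before the declaration); the trailing one is harmless, though removing both, as done above, is the safe choice.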