Importing MARCXML from Internet Archive

edited June 6, 2023
Hi,
I'm trying to import Internet Archive MARCXML directly via Import from Clipboard, but Zotero forwards me to https://www.zotero.org/support/kb/importing_standardized_formats with the error message "The selected file is not supported".

It works if the XML is downloaded and saved as a file first, but not from the clipboard. The sample is https://archive.org/download/principlesofnucl0000call/principlesofnucl0000call_archive_marc.xml

It would also be extremely nice if that worked in the "Add Item(s) by Identifier" magic wand.


  • I suspect you're not actually getting the XML onto your clipboard. E.g., when you load the above link in a browser, you get something that starts like
    04091cam a2200961 a 4500 principlesofnucl0000call CaSfIA
    That's because the browser's rendered view of the XML strips out all the tags. That is, indeed, not a valid import format.

    If you look at the actual XML (which, e.g., in Firefox you can do with ctrl+u/view source) that same item starts with
    <?xml version="1.0" encoding="UTF-8"?>
    <record xmlns="http://www.loc.gov/MARC21/slim">
    <leader>04091cam a2200961 a 4500</leader>
    <controlfield tag="001">principlesofnucl0000call</controlfield>


    and imports fine via import from clipboard.

    Importing metadata and importing identifiers are two completely different things, so the magic wand field is definitely not going to import metadata -- not sure why it should since you have the clipboard option available.
  • edited June 6, 2023
    Looking at the "actual XML" as page source (ctrl+u), then copying it and importing from the clipboard, still gives an error: "...please ensure that the file is valid..."

    (The metadata/identifiers explanation is clear.)

    (For reference: Zotero v6.0.26)


    <?xml version="1.0" encoding="UTF-8"?>
    <record xmlns="http://www.loc.gov/MARC21/slim">
    <leader>04091cam a2200961 a 4500</leader>
    <controlfield tag="001">principlesofnucl0000call</controlfield>
    <controlfield tag="003">CaSfIA</controlfield>
    <controlfield tag="005">20221214041246.0</controlfield>
    <controlfield tag="006">m o d</controlfield>
    <controlfield tag="007">cr||||||||||||</controlfield>
    <controlfield tag="008">910204s1991 enkaf ob 001 0 eng d</controlfield>
    <datafield tag="010" ind1=" " ind2=" ">
    <subfield code="z"> 91008439 </subfield>
    </datafield>
    <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(OCoLC)1194912241</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">NZ1</subfield>
    <subfield code="b">eng</subfield>
    <subfield code="c">NZ1</subfield>
    <subfield code="d">DLC</subfield>
    ...
  • Are other imports from the clipboard (e.g. RIS, BibTeX) working? How are you selecting the text? I'd recommend ctrl+a --> ctrl+c to make sure you're not missing anything -- XML breaks immediately if even one character at the beginning or end is left out. The error you're getting suggests this might be the issue.

    I tested this (in Zotero 7, but that almost certainly shouldn't matter here) and it imports fine.

    If you really can't get this to work, could you please provide a Debug ID for the import attempt?
    https://www.zotero.org/support/debug_output
  • Right! Yes, I was using ctrl+a; however, on double-checking, the page source seems to add one blank line at the start and one at the end. After removing those two lines, the XML imports correctly.
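
For anyone hitting the same thing, here is a minimal Python sketch of the two checks discussed in this thread: whether the copied text contains XML tags at all, and whether stray blank lines around it get in the way. It is not Zotero code. The helper name `check_marcxml` is hypothetical, and it only assumes that Zotero's importer is about as strict as Python's standard XML parser, which rejects anything placed before the `<?xml ...?>` declaration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the record in this thread.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def check_marcxml(clipboard_text: str) -> str:
    """Return the trimmed MARCXML text, or raise ValueError describing the problem.

    Hypothetical helper for illustration only; not part of Zotero.
    """
    # Drop leading/trailing blank lines such as those view-source can add.
    cleaned = clipboard_text.strip()

    # The rendered browser view has no tags, so it will not start with "<".
    if not cleaned.startswith("<"):
        raise ValueError("no XML tags found; copy from view source, not the rendered page")

    try:
        root = ET.fromstring(cleaned)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc

    # Accept a single record or a collection of records.
    if root.tag not in (f"{{{MARC_NS}}}record", f"{{{MARC_NS}}}collection"):
        raise ValueError(f"unexpected root element: {root.tag}")
    return cleaned


# Shortened sample based on the record above, with a blank line at each end.
sample = """
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>04091cam a2200961 a 4500</leader>
<controlfield tag="001">principlesofnucl0000call</controlfield>
</record>
"""

cleaned = check_marcxml(sample)
```

Parsing `sample` directly with `ET.fromstring` fails because of the leading blank line, while the trimmed text parses. Copying the trimmed text back to the clipboard is then the same fix as deleting the two blank lines by hand.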