Retrieving metadata from archive.org
For about half of the PDFs that I've imported into Zotero from the Internet Archive (archive.org), the Zotero "Retrieve Metadata for PDF" function fails to find any metadata. I am downloading the .pdf and .mrc files of documents from archive.org and would like the PDFs to be linked with the metadata that belongs to them. Alternatively, is there a file type from archive.org that I can download that would make the metadata for the PDF appear in Zotero automatically? Many thanks in advance for any help!
I think we used to have a translator tailored for archive.org, but it seems that it's not working with the updated interface. We'll take a look at fixing that.
archive.org does offer metadata as MARCXML, which is a very rich metadata format and one we should be able to import, but I see that we currently do not. We'll look into fixing that as well.
We should have something working over the weekend and I'll report back here when something changes. Thanks for letting us know.
https://archive.org/details/agrariantenures00unkngoog
https://archive.org/details/ahandbooktoland01earlgoog
https://archive.org/details/bodykechapterinh00normuoft
https://archive.org/details/lettertoabsentee19wigg
If there were a translator tailored for archive.org, that would be excellent! I found one online but it didn't work, probably because it isn't working with the updated interface, as you said.
Because of the scale of my project (~10 000 documents), I'm hoping to find a way to collate the metadata with the PDFs without a manual drag-and-drop for each one. I started doing this by hand, but the MARC files lose their unique identifier (e.g. bodykechapterinh00normuoft) as soon as they are dragged into Zotero, while the PDFs lacking metadata keep it. Matching a MARC file to its PDF therefore means opening each PDF, scrolling down a few pages, reading the title, and then tracking down the MARC record to which it belongs. If there were a way to link the PDF with its metadata automatically, I would be extremely grateful!
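One possible way to avoid reading titles is to pair the files by identifier before anything is dragged into Zotero, while the filenames still carry it. The sketch below is only an illustration: pairByIdentifier is a hypothetical helper, and the suffixes it recognizes (_marc.xml, _meta.mrc, .mrc, .pdf) are assumptions about how the downloaded files are named, so adjust them to match what is actually on disk.

```javascript
// Hypothetical helper: group downloaded files by the archive.org identifier
// at the start of each filename. The suffixes are assumptions about how the
// files were named on download, not something Zotero itself relies on.
function pairByIdentifier(filenames) {
  const pairs = new Map();
  for (const name of filenames) {
    const match = name.match(/^(.+?)(?:_marc\.xml|_meta\.mrc|\.mrc|\.pdf)$/);
    if (!match) continue; // skip files with unrecognized names
    const id = match[1];
    const entry = pairs.get(id) || { pdf: null, marc: null };
    if (name.endsWith(".pdf")) {
      entry.pdf = name;
    } else {
      entry.marc = name;
    }
    pairs.set(id, entry);
  }
  return pairs;
}

const pairs = pairByIdentifier([
  "bodykechapterinh00normuoft.pdf",
  "bodykechapterinh00normuoft_meta.mrc",
  "lettertoabsentee19wigg.pdf",
]);
// Entries whose marc (or pdf) is still null have no partner yet.
```

This only tells you which files belong together; attaching each PDF to its imported record would still have to happen inside Zotero.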
Thanks again for your help!
If you're interested and somewhat technically fearless, I could give you a couple of lines of code to add to a custom version of the translator that would download PDFs automatically.
Unfortunately, the translator still doesn't seem to be working for me. When I upload PDFs from archive.org to Zotero, no metadata appears. I've tried all of the different kinds of file types that archive.org offers for its documents (.pdf, .xml, .txt, .djvu, .epub, .jp2) but none of these seem to include metadata either when I upload them to Zotero. Do you know what might be wrong?
If you could send me the code to add to a custom version of the translator so that it downloads PDFs automatically, I would be extremely grateful!
https://archive.org/details/agrariantenures00unkngoog and clicking the "Save to Zotero" icon.
We could just add a hidden pref for adding PDFs per se, though, along the lines of supplements for other sites.
https://www.zotero.org/support/zotero_data
You'll need to do 5 things to the file (edit in any good text editor):
1. Right after
for (i in tags) {
newItem.tags.push(tags[i]);
}
insert this custom code:
if (itemType == "book") {
	var pdfurl = apiurl.replace(/details(\/[^/]+)&output=json/, "download$1$1.pdf");
	newItem.attachments.push({"url": pdfurl, "title": "Internet Archive Fulltext PDF", "mimeType": "application/pdf"});
}
2. Change the priority of the translator from 100 to 99 at the top
3. Rename the translator (Label) to something like Internet Archive (Custom)
4. Change any character in the translatorID
5. Save the customized translator under a different filename (e.g. using the same as the label) in the Zotero data directory.
Restart Firefox (or Zotero and your browser) and your custom translator should be working, with PDF download.
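To see what the replace call in step 1 does, here it is in isolation. This is just a sketch: toPdfUrl is a hypothetical wrapper, and the sample apiurl assumes the translator builds it by appending &output=json to the item's details URL, which is what the regular expression expects.

```javascript
// Hypothetical wrapper around the same regular expression used in step 1.
// It swaps "details" for "download" and repeats the identifier as the
// PDF's filename.
function toPdfUrl(apiurl) {
  return apiurl.replace(/details(\/[^/]+)&output=json/, "download$1$1.pdf");
}

// Assumed shape of apiurl: the details URL with "&output=json" appended.
const apiurl = "https://archive.org/details/agrariantenures00unkngoog&output=json";
console.log(toPdfUrl(apiurl));
// https://archive.org/download/agrariantenures00unkngoog/agrariantenures00unkngoog.pdf
```

If the URL doesn't match the pattern, replace returns it unchanged, so a wrong pdfurl is one thing to check when no PDF gets attached.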
edit: overlapped, but I'm still interested in the URL question.
https://archive.org/details/jstor-2212307
https://archive.org/details/tenureoflandinir00duff
https://archive.org/details/jstor-1814000
I followed your instructions for writing a custom translator and have saved it as a Rich Text Document in my Zotero translators directory. Do you have any ideas about why it might not be downloading PDFs?
I'd be in favor of a Zotero-wide pref for a PDF size limit. This is hardly the only site that sometimes serves very large PDFs, and it seems somewhat arbitrary to treat this particular site differently.
I saved the translator as a JavaScript file (.js) but it still doesn't download PDFs into Zotero Standalone.
First, make sure you're saving in plain text format (if need be, by downloading from GitHub via the raw link). Then restart all relevant software and try again.
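A quick way to tell whether the file really is plain text is to look at its first few characters: a Rich Text file starts with {\rtf, while a translator file should start with the { of its metadata block. A rough sketch (describeTranslatorFile is a hypothetical helper; it takes the file's contents as a string and does not validate the translator itself):

```javascript
// Hypothetical check on the start of the saved file's contents.
// "rtf"     -> the editor saved Rich Text, which will not load as a translator
// "plain"   -> starts with "{", as a translator's metadata block does
// "unknown" -> neither; open the file and look at the first line by hand
function describeTranslatorFile(text) {
  // Drop a leading byte-order mark and whitespace before looking.
  const start = text.replace(/^\uFEFF/, "").trimStart();
  if (start.startsWith("{\\rtf")) return "rtf";
  if (start.startsWith("{")) return "plain";
  return "unknown";
}

console.log(describeTranslatorFile("{\\rtf1\\ansi\\deff0 ...")); // rtf
console.log(describeTranslatorFile('{\n\t"translatorID": "..."')); // plain
```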
I'm fine with 10 for the limit.