2nd page of PDFs for Metadata

bookhand · July 29, 2016

I was wondering if it might be possible to have (or even force) Zotero's native metadata generator check the second or third page of a PDF. Frequently, PDFs scanned by libraries for ILL requests will begin with an information sheet, then sometimes the title page of the collection of essays, and then the second (or even third) image will be the specific article. Zotero rather makes a hash of these at the moment - sometimes it identifies the PDF as the whole book, more often it just fails silently. Thanks.

adamsmith · July 30, 2016

That's not how Retrieve Metadata for PDF works. It doesn't actually try to read the metadata from the PDF (that'd be a total mess). It searches for a DOI or ISBN on one of the first 3(?) pages; if that fails, it takes a sentence out of roughly the middle of the text and searches Google Scholar for it. Because of the nature of Google Scholar, this will work best by far for journal articles, and you'll get pretty limited success rates for books and book chapters unless they contain said ISBN.
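
To make that lookup order concrete, here is a rough Python sketch of the two steps as described above. It is not Zotero's actual code: the function names, the regular expressions, and the 3-page limit are all assumptions for illustration, and it takes already-extracted page text rather than reading a PDF itself.

```python
import re

# Loose approximations for illustration only, not Zotero's own patterns.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ISBN_RE = re.compile(r"\bISBN(?:-1[03])?:?\s*([0-9Xx][0-9Xx\- ]{8,16}[0-9Xx])")


def find_identifier(pages, max_pages=3):
    """Look for a DOI or ISBN in the first few pages (max_pages is assumed)."""
    for text in pages[:max_pages]:
        m = DOI_RE.search(text)
        if m:
            # Drop trailing punctuation picked up from the surrounding sentence.
            return ("doi", m.group(0).rstrip(".,;)"))
        m = ISBN_RE.search(text)
        if m:
            digits = re.sub(r"[^0-9Xx]", "", m.group(1))
            if len(digits) in (10, 13):
                return ("isbn", digits.upper())
    return None


def middle_sentence(pages, min_words=6):
    """Pick a sentence from roughly the middle of the whole text."""
    text = " ".join(" ".join(pages).split())  # collapse whitespace
    sentences = [
        s for s in re.split(r"(?<=[.!?])\s+", text)
        if len(s.split()) >= min_words
    ]
    if not sentences:
        return None
    return sentences[len(sentences) // 2]


def lookup_plan(pages):
    """Identifier first; otherwise fall back to a full-text search phrase."""
    ident = find_identifier(pages)
    if ident:
        return ident
    sentence = middle_sentence(pages)
    return ("fulltext_query", sentence) if sentence else None
```

With an ILL scan whose first pages are a cover sheet and the collection's title page, a sketch like this would return the book's ISBN before it ever reaches the article text, which may be why such PDFs sometimes get identified as the whole book.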

bookhand · August 1, 2016

Adam - Thanks for clarifying. Since the native metadata extraction is completely opaque, I was generalizing from the fact that PDFs without title pages from essay collections seemed to be more readily recognized than PDFs with those title pages.