Retrieve metadata for pdf doesn't seem to work correctly

getvenkat · November 29, 2012

First, a big thanks for the great program. I started using Zotero a few days ago and it is really nice.

I have a bunch of PDFs that aren't associated with any reference manager, and I want to import them into Zotero. I couldn't figure out how to do a mass import of PDFs, even after reading a few forum posts, so I started importing them one by one and using the excellent "Retrieve Metadata for PDF" option.

It works well when it gets citations from Google Scholar, but sometimes it does not. Also, I think DOI retrieval from the PDF is broken.

I traced it to a malformed query to CrossRef, though I may be wrong.

My setup: Windows-7, firefox-17, zotero 3.0.11
Debug ID: D528277138

pdftotext outputs text from the PDF that contains the DOI. However, the query to CrossRef doesn't contain a valid DOI: it sends rft_id=info:doi/null.

I'd appreciate any workaround for this. Also, if there is a way to do mass imports, it would be very useful to me.

Relevant lines from debug output:
--------------------------------
(3)(+0002396): Running pdftotext -enc UTF-8 -nopgbrk -layout -l 3 I:\venkat\zotero\storage\UA2T84D2\2000Kerwin_tag.pdf I:\venkat\zotero\recognizePDFcache.txt

(3)(+0000051): Journal of Magnetic Resonance 142, 313–322 (2000)
doi:10.1006/jmre.1999.1946, available online at http://www.idealibrary.com on
A k-Space Analysis of MR Tagging

...<snip>

(4)(+0000001): Translate: Binding sandbox to http://www.example.com/

(4)(+0000001): Translate: Parsing code for CrossRef

(3)(+0000001): Translate: Beginning translation with CrossRef

(3)(+0000001): HTTP GET http://www.crossref.org/openurl/?pid=zter:zter321&url_ver=Z39.88-2004&&rft_id=info:doi/null&noredirect=true&format=unixref

(3)(+0000372): Translate: Could not find a result using CrossRef:
fileName => I:\venkat\zotero\translators\CrossRef.js
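
The debug output above shows a DOI in the extracted text but info:doi/null in the CrossRef request. As a rough illustration of the step that appears to be failing, here is a minimal Python sketch that pulls a DOI out of extracted text and builds an OpenURL query of the same shape as the one in the log. This is not Zotero's actual code: the regular expression, the function names extract_doi and crossref_openurl, and the placeholder pid value are all assumptions made for illustration.

```python
import re
from urllib.parse import quote

# Assumed pattern for modern DOIs ("10.", a 4-9 digit registrant code, "/",
# then a suffix). It is an illustration, not the regex Zotero uses.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_doi(text):
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def crossref_openurl(doi, pid="name@example.org"):
    """Build a CrossRef OpenURL query shaped like the one in the debug log.

    pid is a hypothetical placeholder for the account identifier.
    """
    if not doi:
        # Fail early instead of sending "info:doi/null" to CrossRef.
        raise ValueError("no DOI found in the extracted text")
    return (
        "http://www.crossref.org/openurl/?pid=" + pid
        + "&url_ver=Z39.88-2004"
        + "&rft_id=info:doi/" + quote(doi, safe="/")
        + "&noredirect=true&format=unixref"
    )


if __name__ == "__main__":
    sample = (
        "Journal of Magnetic Resonance 142, 313-322 (2000)\n"
        "doi:10.1006/jmre.1999.1946, available online at "
        "http://www.idealibrary.com on\n"
    )
    print(crossref_openurl(extract_doi(sample)))
```

With the text from the log, extract_doi picks out 10.1006/jmre.1999.1946, and crossref_openurl refuses to build a request when no DOI was found, which is the case the failing query seems to have missed.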

aurimas · December 1, 2012

Thanks for reporting and taking the time to debug this. It seems that this bug has been around for some time now and we've just been using Google Scholar. It will be fixed in the next Zotero release.

getvenkat · December 4, 2012

Thanks for looking into this. If you've already fixed it, can you please point me to the fix?

I'm in the middle of importing my paper collection and this would improve the speed of import significantly.

For my PDFs at least, the Google Scholar results have several errors (wrong years, for example), whereas metadata retrieved by DOI from the publisher (mostly Wiley, Elsevier or IEEE) should come through without these errors.

adamsmith · December 4, 2012

The easiest option is probably to install the branch XPI:
http://www.zotero.org/support/dev_builds
assuming it has already been updated with the fix (it usually gets updated once every day or two).
You should revert to the regular version of Zotero once the next version comes out.
If you want to patch this yourself, the commit is here:
https://github.com/zotero/zotero/commit/435b1d7bd80a88befdc0810b753eaa0211b77f49

(And yes, absolutely: results using the DOI are much better than Google Scholar, pretty much across the board.)

getvenkat · December 4, 2012

I tried the dev XPI and it works now: it gets the data from CrossRef correctly.

Thanks for the fix and the very quick response.

In my limited experience, in terms of getting accurate citations Zotero is the best I've found (better than Jabref or Mendeley).

Do you have any plans to develop a client for iOS? I have an iPad and it would be awesome to link my Zotero bibliography with the PDFs on it.

adamsmith · December 4, 2012

There is ZotPad:
http://www.zotpad.com/
which is very actively developed by a third-party developer, as well as a mobile version of the library display at zotero.org.

To complement those, there is also a bookmarklet for importing items while browsing sources on iOS:
http://www.zotero.org/downloadbookmarklet

getvenkat · December 6, 2012

I feel a bit stupid for not noticing an easy way to do mass imports.

Just select multiple PDFs in Explorer and drag them into Zotero.
Then select those PDFs in Zotero and run "Retrieve Metadata for PDF".

This combined with the DOI fix above works significantly faster.
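
Before dragging a whole folder in, it can help to know which PDFs carry a DOI on their first pages, since those are the ones that should resolve through CrossRef rather than Google Scholar. The following is only a sketch under a few assumptions: a standalone pdftotext binary is on the PATH, the helper names (pdftotext_command, mentions_doi, scan_folder) are made up for this example, and the "doi" substring check is a crude heuristic. The flags are the same ones shown in the debug output above, with "-" as the output file so the text goes to stdout.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, last_page=3):
    """Build the pdftotext call from the debug log, writing to stdout ("-")."""
    return [
        "pdftotext", "-enc", "UTF-8", "-nopgbrk", "-layout",
        "-l", str(last_page), str(pdf_path), "-",
    ]


def mentions_doi(text):
    """Crude heuristic: does the extracted text mention a DOI at all?"""
    return "doi" in text.lower()


def scan_folder(folder):
    """Map each PDF file name in folder to True if its first pages mention a DOI."""
    results = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        proc = subprocess.run(
            pdftotext_command(pdf),
            capture_output=True,
            encoding="utf-8",
            errors="replace",
        )
        results[pdf.name] = proc.returncode == 0 and mentions_doi(proc.stdout)
    return results


if __name__ == "__main__":
    # "papers" is a hypothetical folder name.
    for name, has_doi in scan_folder("papers").items():
        print(("DOI     " if has_doi else "no DOI  ") + name)
```

PDFs reported without a DOI are the ones likely to fall back to Google Scholar, so it may be worth retrieving metadata for those in smaller batches.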

adamsmith · December 6, 2012

(Note that Google Scholar will lock you out if it detects "too many" automated requests, whatever that threshold is. As long as you're mainly getting the results from CrossRef, you're fine, though.)

getvenkat · December 6, 2012

Thanks. Good to know.