What am I doing wrong with metadata retrieval?

I used Zotero for a couple of years, then decided to try a different citation tool to see if I'd like it. I switched to ReadCube for about four months, and then decided to switch back to Zotero. I wanted very much to like ReadCube, but it was slow as molasses, its Word plug-in was poor at best, and it would crash on a regular basis.

However, one thing ReadCube did a better job of than Zotero was in retrieving metadata—so much better that it makes me think I'm doing something wrong in Zotero. Am I?

I download PDFs of journal articles from a variety of sources and drag them into my citation tool. With ReadCube, I'd say it would correctly find most of the metadata automatically for at least 50-60% of the articles I'd import. When it would fail, I would use ReadCube's feature that would allow me to highlight the title, authors' names, and/or other fields, and then tell it to search again, at which point it would get to, oh, maybe 95% or even better. It was rare that I had to manually enter the data for an article, and almost always they'd be non-scientific papers.

With Zotero, I use the Retrieve Metadata for PDF menu item, but I'd say it works in maybe 30-40% of the cases, and this is for scientific papers. I'm remembering now why I wanted to try something else, because it's so much work to enter all that information manually. And it's sitting there in the document, often on a cover page in a defined format!

So... am I doing something wrong? Is there a way to improve Zotero's metadata retrieval success rate?

Thanks!
  • The biggest issue is that Zotero is not optimized for the way you're using it. Yes, Retrieve Metadata exists and works, but it's a stopgap solution. I use Zotero constantly and use Retrieve Metadata maybe once a month. The main way to get items into Zotero, and the one we spend by far the most time optimizing, is via the "Save to Zotero" icon in your browser: https://www.zotero.org/support/getting_stuff_into_your_library#web_translators
    Where you have access to the PDF, this will typically include the PDF together with the item.

    That said, Retrieve Metadata should have a much higher success rate than 35-40% for scientific papers, so I'm guessing you're using it so heavily that you're getting locked out by Google Scholar, on which Zotero currently relies for that functionality. You can test that by going to scholar.google.com and searching for something. Again, lock-out is not an issue if you're using the regular method for import.
  • Can you link to a few example PDFs that are failing?

    If you're using Retrieve Metadata heavily, you're likely being blocked by Google Scholar (which is used behind the scenes for some files), which will result in many further lookups failing until they stop blocking you. That at least used to show a proper error message, but it's possible Google's current blocking technique (which changes frequently) doesn't trigger that error, so we'd have to see some example PDFs to confirm whether that's what's happening for you.

    Absent throttling, for articles that appear in Google Scholar you should get much higher recognition rates than 30–40%. And recent papers should generally have a DOI on the first page, which doesn't use GS and should be close to a 100% recognition rate.
    "I download PDFs of journal articles from a variety of sources and drag them into my citation tool."
    Note that this isn't the primary workflow Zotero has been designed for. In Zotero, you would generally just click the Save to Zotero button in your browser while viewing the journal/database page, which would save both the metadata and (if you have access) the PDF. You would only use Retrieve Metadata for PDF in situations where you ended up with a PDF without ever being on the corresponding article page.

    That said, we'll soon be rolling out an entirely revamped version of Retrieve Metadata for PDF that should avoid this sort of throttling and should better enable a PDF-first workflow.
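
To illustrate why a DOI on the first page makes recognition so much more reliable than a Google Scholar search, here is a minimal Python sketch of pulling a DOI out of first-page text. It is not Zotero's actual implementation: the function name `find_doi` and the loose pattern are assumptions made for illustration, and the text is assumed to have already been extracted from the PDF.

```python
import re

# Loose DOI pattern (an assumption for illustration, not Zotero's own):
# "10.", a 4-9 digit registrant code, a slash, then a suffix that runs
# until whitespace, a quote, or an angle bracket.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi(first_page_text):
    """Return the first DOI-like string in the text, or None if none is found."""
    match = DOI_PATTERN.search(first_page_text)
    if match is None:
        return None
    # Drop trailing punctuation that usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


if __name__ == "__main__":
    sample = "Received 3 May 2017. https://doi.org/10.1038/nature12345. All rights reserved."
    print(find_doi(sample))
```

Once a DOI is in hand, the metadata can be looked up by identifier rather than by a fuzzy title search, which is why that path does not depend on Google Scholar.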
  • (@adamsmith: This is getting a little creepy.)
  • Thanks! Using the Save to Zotero button definitely does the trick.

    I suspect I'm not being blocked by Google Scholar, but I don't know that. Certainly it seems to work just fine when I use it myself.

    I'm happy to provide links to some PDFs that fail, but since Save to Zotero works, and Retrieve Metadata for PDF is being entirely revamped, perhaps that's not necessary?

    Thanks again!
  • "I suspect I'm not being blocked by Google Scholar, but I don't know that. Certainly it seems to work just fine when I use it myself."
    Viewing GS in the browser actually isn't really representative anymore (as it was with Zotero for Firefox) because the Zotero application has its own cookie store. And since Google never sees those cookies elsewhere around the web, as it would with your normal browser, it's also now much quicker to block Zotero. So you're quite likely being blocked, but as you say, no real need to debug this with the new version coming.
  • I don't know if this helps further, but after a recent Zotero update, metadata retrieval from within Zotero Standalone no longer works (this could of course also be due to another, non-Zotero update). Anyway, I have tried different papers and metadata could not be retrieved for any of them, whereas before it worked pretty well.
  • @kuefer: Metadata retrieval was broken in the beta, which I'm guessing you were running. We've fixed that in the latest build. If you didn't mean to be running the beta, you should switch back to the release version from the download page. If you did, be sure to always mention that you're running the beta and provide a Report ID when reporting problems here.