What am I doing wrong with metadata retrieval?
I used Zotero for a couple of years, then decided to try a different citation tool to see if I'd like it. I switched to ReadCube for about four months, and then decided to switch back to Zotero. I wanted very much to like ReadCube, but it was slow as molasses, its Word plug-in was poor at best, and it would crash on a regular basis.
However, one thing ReadCube did a better job of than Zotero was in retrieving metadata—so much better that it makes me think I'm doing something wrong in Zotero. Am I?
I download PDFs of journal articles from a variety of sources and drag them into my citation tool. With ReadCube, I'd say it would correctly find most of the metadata automatically for at least 50-60% of the articles I'd import. When it would fail, I would use ReadCube's feature that would allow me to highlight the title, authors' names, and/or other fields, and then tell it to search again, at which point it would get to, oh, maybe 95% or even better. It was rare that I had to manually enter the data for an article, and almost always they'd be non-scientific papers.
With Zotero, I use the Retrieve Metadata for PDF menu item, but I'd say it works in maybe 30-40% of the cases, and this is for scientific papers. I'm remembering now why I wanted to try something else, because it's so much work to enter all that information manually. And it's sitting there in the document, often on a cover page in a defined format!
So... am I doing something wrong? Is there a way to improve Zotero's metadata retrieval success rate?
Thanks!
Where you have access to the PDF, this will typically include the PDF together with the item.
That said, Retrieve Metadata should also have a much higher success rate than 35-40% for scientific papers, so I'm guessing you're using it so heavily that you're getting locked out by Google Scholar, on which Zotero currently relies for that functionality. You can test that by going to scholar.google.com and searching for something. Again, lock-out is not an issue if you're using the regular method for import.
If you're using Retrieve Metadata heavily, you're likely being blocked by Google Scholar (which is used behind the scenes for some files), which will result in many further lookups failing until they stop blocking you. That at least used to show a proper error message, but it's possible Google's current blocking technique (which changes frequently) doesn't trigger that error, so we'd have to see some example PDFs to confirm whether that's what's happening for you.
Absent throttling, for articles that appear in Google Scholar you should get much higher recognition rates than 30–40%. And recent papers should generally have a DOI on the first page, which doesn't use GS and should be close to a 100% recognition rate. Note that this isn't the primary workflow Zotero has been designed for. In Zotero, you would generally just click the Save to Zotero button in your browser while viewing the journal/database page, which would save both the metadata and (if you have access) the PDF. You would only use Retrieve Metadata for PDF in situations where you ended up with a PDF without ever being on the corresponding article page.
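To illustrate why a DOI on the first page makes recognition so reliable, here is a minimal Python sketch of pulling a DOI out of already-extracted first-page text. This is not Zotero's actual code: the function name `find_doi` and the pattern are assumptions for illustration, and the pattern is only a common approximation of the DOI format. It also assumes the PDF text has been extracted by some other tool.

```python
import re

# Approximate DOI pattern (an assumption, not Zotero's implementation):
# "10." + a 4-9 digit registrant code + "/" + a suffix with no whitespace.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(first_page_text):
    """Return the first DOI-like string in the text, or None if there is none."""
    match = DOI_PATTERN.search(first_page_text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


if __name__ == "__main__":
    sample = "Example Journal 12 (2017) 1-9. https://doi.org/10.1000/xyz123. Received 3 May"
    print(find_doi(sample))
```

Once a DOI is found, metadata can be looked up directly from a DOI registry, with no fuzzy title search involved, which is why that path avoids Google Scholar entirely.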
That said, we'll soon be rolling out an entirely revamped version of Retrieve Metadata for PDF that should avoid this sort of throttling and should better enable a PDF-first workflow.
I suspect I'm not being blocked by Google Scholar, but I don't know that. Certainly it seems to work just fine when I use it myself.
I'm happy to provide links to some PDFs that fail, but since Save to Zotero works, and Retrieve Metadata for PDF is being entirely revamped, perhaps that's not necessary?
Thanks again!