PDF Metadata Import hangs

I have a recurrent problem where Zotero appears to hang on certain PDFs while attempting to retrieve their metadata. There is nothing obviously wrong with these PDFs: I am able to open them in PDF viewers, and they are very similar to other PDF files I have.

Needless to say, this makes it difficult to import metadata efficiently: if you highlight a long list of PDFs to find metadata for and Zotero hangs on one, the task never finishes. Furthermore, if you press Cancel, the process is terminated and even the PDFs for which metadata was found are not updated.

Additionally, as noted in other discussions, the retrieved metadata is wrong in some cases.
  • Maybe try retrieving metadata for three or four of them at a time to isolate which PDF is the problem.
  • If you can give us a download link (or even just the article title/author) for one of the PDFs, we can try to track down the source of the problem.
  • I have this same problem. If I select several PDFs and ask Zotero to "Retrieve Metadata", it gets stuck on PDFs that don't have OCR text. It retrieves the metadata for each PDF preceding that one in the list, but then gets stuck at the non-OCRed PDF, shows a red X next to the file name in the Progress window, and says "PDF does not contain OCRed text". Meanwhile, the little animated gear icon keeps spinning as though it is working on the next item, but nothing ever happens. I just have to cancel, skip that one PDF, select another group of PDFs, and try again. It's pretty annoying, considering I can't quickly tell which ones are OCRed, either to skip them during metadata retrieval or to OCR them all manually in Acrobat.
  • I should also add that if I leave my computer while Zotero is retrieving metadata and it hangs at a non-OCRed PDF, it seems to keep querying the Google Scholar database for metadata for a PDF it can't read. Inevitably, when I return to my computer, I've "reached my query limit" for any subsequent metadata retrieval for a few hours or so. FYI, I guess.
  • I have similar problems with attached or linked files. When I start the scan process (indexing the content of OCRed PDF files and their metadata), Zotero crashes. I found out that the problem was special characters in the file name: I had some "í" and "é" in file names. After changing them to "i" and "e", the scan works perfectly without crashing.

    In my case, at least, the problem seems to be special characters in the file name.

    If so, it should be fixed in the next version, especially for non-English users like me.
  • I should add: Platform: Linux (OpenSUSE 11.1), Firefox 3.0.8, Zotero 1.5b 2, Sun Java version 1.6.0u13
  • I seem to be having at least a somewhat similar problem: Zotero hangs while retrieving metadata, and it appears to be because of special characters. Most of these papers are about β-peptides; other papers load fine. It seems like it should fail gracefully, with an error explaining why it failed.

    I'm using Firefox 3.0.12, Zotero 2.0b6.2
  • Also having the same problem: Zotero gets hung up retrieving metadata for a lot of my PDFs, almost a third of them. I've made sure all my PDFs are OCRed, and most of the ones with this problem don't have any unusual characters in the file name or in the article title. I'm on OS X.

    It's pretty irritating since I'm trying to import all my PDFs, and if I have to manually do a third of them that's a lot of work.
  • Any responses? I'd love to provide error reports, but Zotero doesn't seem to pop any up, so I have no idea why it's hanging. If there is some setting I can change to get more information, I'll be happy to provide it.
  • I am surprised that the metadata retrieval issue has been known since 2009 and still hasn't been fixed. To be specific, if I select all papers in my folder and click "Retrieve Metadata for PDF", it often just hangs for hours, i.e., forever. Would it be possible to implement some internal check for success and skip the papers where retrieval fails?
  • edited August 1, 2017
    Retrieve Metadata does generally work very reliably and consistently for users. It's not particularly helpful to dig up error reports from 8 years ago and assume that the issue is the same.

    What is probably happening is that Google Scholar is locking you out for sending a large number of queries to its database (Zotero relies in part on GS for Retrieve Metadata). When you are working with a very large number of items for any function (Retrieve Metadata, Export, etc.), it is best to work in smaller batches. This is generally true of all manner of database programs, such as Zotero, Mendeley, and others: operations on a large number of items take more time than operations on fewer items.