Zotero 5 beta gets stuck when trying to retrieve PDF metadata


I right-click a PDF file in my library -> Retrieve PDF Metadata. Usually I would get either success or failure, but now the progress window just stays open with no progress.

Debug output ID: D1829679537


  • I noticed the same thing yesterday. Never had problems with this until yesterday.
  • This should be fixed in build 102, which will be available in a few minutes. Let me know if you continue to have trouble.
  • Ok, it's not stuck anymore, thanks. But the information it finds is not correct (I tried with multiple PDFs).

    Debug output D933167878 is from an attempt with one ISO standard.

    It even seems to behave strangely sometimes, like creating multiple entries for some PDFs.
  • Another output with another PDF: D1595870038
  • If they're on the web, could you link to the PDFs in question?
  • ISO specs are behind a paywall... but they're on Genesis: http://libgen.io/standarts/index.php?s=ISO+14040

    BTW, can't wait for https://github.com/citation-style-language/zotero-bits/issues/52 to be fixed.

    The MIT lock guide paper (funny paper, by the way): http://www.lysator.liu.se/mit-guide/MITLockGuide.pdf

    I will try with more "conventional" papers and report back if I have problems.
  • The lock guide paper imports incorrectly in 4.0.x, too. Zotero just picks an unfortunate phrase to query Google Scholar. That's a fairly rare error and it's unfortunate, but it's not a regression.

    The ISO papers from Genesis don't have OCR'd text, so I can't test with them, but remember how Zotero retrieves metadata:
    1. It looks for a DOI on the first pages and, if it finds one, queries CrossRef.
    2. It looks for an ISBN on the first pages and, if it finds one, queries WorldCat.
    3. It picks a phrase from somewhere in the middle of the document, puts it in quotation marks, and queries Google Scholar, then imports the first hit.
    So I wouldn't have high hopes for the ISO paper anyway, though I'm surprised that it gets a false positive (maybe Google Scholar is getting less strict about quotes?).

    What does sound like a bug is Zotero importing multiple papers, though. That shouldn't be possible given the code. Are you sure that happened? Can you reproduce that?
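
For readers who want to see the lookup order from the comment above in concrete form, here is a minimal Python sketch. It is not Zotero's code: the regular expressions, the function names `choose_lookup` and `pick_phrase`, the service labels, and the phrase length are all assumptions made for illustration, and no network request is made.

```python
import re

# Heuristic patterns (assumptions for this sketch, not Zotero's own).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
ISBN_RE = re.compile(r'\bISBN(?:-1[03])?:?\s*([0-9][0-9\- ]{8,16}[0-9Xx])')


def pick_phrase(text, length=8):
    """Take `length` consecutive words from around the middle of the text."""
    words = text.split()
    if not words:
        return ""
    start = max(0, len(words) // 2 - length // 2)
    return " ".join(words[start:start + length])


def choose_lookup(first_pages_text, body_text, phrase_length=8):
    """Return (service, query) in the order: DOI, then ISBN, then quoted phrase."""
    match = DOI_RE.search(first_pages_text)
    if match:
        # Drop trailing punctuation that often follows a DOI in running text.
        return ("crossref", match.group(0).rstrip(".,;)"))

    match = ISBN_RE.search(first_pages_text)
    if match:
        # Remove hyphens and spaces so only the bare ISBN remains.
        return ("worldcat", re.sub(r"[\s-]", "", match.group(1)))

    # Fallback: a quoted phrase from the middle of the document.
    phrase = pick_phrase(body_text, phrase_length)
    return ("google_scholar", '"' + phrase + '"')
```

The fallback branch is the fragile one: a document with no DOI or ISBN on its first pages, such as a standard or a hobbyist guide, depends entirely on which phrase happens to be picked.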
  • edited December 17, 2016
    GS became less strict about phrases within quotes in early summer of this year. Word order no longer matters if there is not an _exact_ match. With SafetyLit, we had to scrap an automated search process and revert to hands-on.

    edit: Also, even within quotations, GS ignores stopwords. Even if there is an exact match GS seems to also provide "similar" items ordered not by closeness of match but by some other algorithm. The exact match is included but not necessarily in first position.
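
Given that the exact match may not come first, one possible mitigation is to check each returned hit against the quoted phrase instead of importing the first one. The sketch below is only an illustration under stated assumptions: `hits` is a hypothetical list of title or snippet strings already obtained from a search, and `normalize` and `first_exact_match` are made-up helper names, not part of Zotero or any Google Scholar API.

```python
import re


def normalize(text):
    """Lowercase and keep only alphanumeric words, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def first_exact_match(phrase, hits):
    """Return the index of the first hit containing the phrase in order, else None."""
    needle = normalize(phrase)
    if not needle:
        return None
    for index, hit in enumerate(hits):
        # Pad with spaces so only whole-word sequences match.
        if f" {needle} " in f" {normalize(hit)} ":
            return index
    return None
```

With this check, a hit that merely contains the same words in a different order is skipped, and `None` signals that nothing should be imported automatically.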
  • "What does sound like a bug is Zotero importing multiple papers, though" I will try to reproduce that. I have not been able to so far, but I ended up with around 5 entries, one of them with the PDF. I will open a new thread if I'm able to reproduce the bug.

    "GS became less strict about phrases within quotes in early summer of this year." Ok, good to know! So it's not a regression. Perhaps a new thread should also be opened for this point? Or directly a bug report on GitHub...
  • The forum here is the right place to post this type of bug report. You can keep all of the discussion in this thread.
  • The latest version is still unable to retrieve metadata from PDFs on my installation. It just keeps trying forever without any noticeable progress.
  • @lbasti01: Can you restart Zotero and provide a Debug ID for one attempt that hangs?
  • @danstillman here is the debug ID: D1386532735
  • @lbasti01: Can you try resetting your translators from the Advanced → Files and Folders pane of the Zotero preferences?
  • @danstillman no improvement after reset
  • OK, could you provide another Debug ID for this, starting only after Zotero is done trying to sync? (In other words, this shouldn't use the "Enable after restart" option. I'll ask for that specifically when it's needed. The restarting I asked for is just to clear the error log included in the report.)
  • @danstillman the issue is: Zotero never seems to stop trying to retrieve the metadata... it goes on indefinitely
  • Right, but I'm just asking for a Debug ID that doesn't include Zotero startup and the initial auto-sync.
  • @danstillman here you go: 2009305558
  • That's a Report ID. I need a Debug ID, like you provided before, but without the "Enable after restart" option and waiting to start it until the initial auto-sync is done.
  • @danstillman It works now! I needed to restart the app after resetting translators, which I didn't do... Many thanks!
  • OK, great. Restarting shouldn't be necessary after resetting translators — and, for that matter, resetting translators shouldn't be necessary — so we'll look into that.
  • Zotero 5 was not able to find metadata for this PDF http://poq.oxfordjournals.org/content/69/5/778.full.pdf from http://poq.oxfordjournals.org/content/69/5/778.abstract . This PDF does have a DOI on the first page.
  • Oh, this is because the PDF is locked. I unlocked the PDF using http://www.pdfunlock.com/ and now Zotero is able to find the metadata.

    It would be great if Zotero unlocked PDFs automatically...
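
Short of unlocking, a locked file can at least be flagged before a lookup is attempted. The following is a rough heuristic sketch, not how Zotero handles it: protected PDFs reference an `/Encrypt` entry in their trailer dictionary, so scanning the raw bytes for that name usually detects them. The function name `looks_encrypted` is hypothetical, and the check can misfire if the same bytes happen to appear elsewhere in the file.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the data looks like a PDF that names an /Encrypt entry."""
    # The "%PDF-" header is expected near the start of the file.
    is_pdf = b"%PDF-" in pdf_bytes[:1024]
    return is_pdf and b"/Encrypt" in pdf_bytes
```

To use it on a file, read it in binary mode first, for example `looks_encrypted(open(path, "rb").read())`, where `path` is whatever PDF you want to check.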