Zotero 5 beta gets stuck when trying to retrieve PDF metadata

gagarine · December 16, 2016

Hello,

Right-click a PDF file in my library -> retrieve PDF metadata. I used to get either a success or a failure, but now the progress window just stays open with no progress.

Debug output ID: D1829679537

Regards

Simon

ben_a · December 17, 2016

I noticed the same thing yesterday. I never had problems with this before.

dstillman · December 17, 2016

This should be fixed in build 102, which will be available in a few minutes. Let me know if you continue to have trouble.

gagarine · December 17, 2016

OK, it's not stuck anymore, thanks. But the metadata it finds is wrong (I tried with multiple PDFs).

Debug output D933167878 is from an attempt with an ISO standard.

It sometimes behaves strangely, too, e.g. creating multiple entries for some PDFs.

gagarine · December 17, 2016

Another debug output, with a different PDF: D1595870038

adamsmith · December 17, 2016

If they're on the web, could you link to the PDFs in question?

gagarine · December 17, 2016

The ISO specs are behind a paywall... but they're on Genesis: http://libgen.io/standarts/index.php?s=ISO+14040

BTW, I'm waiting for https://github.com/citation-style-language/zotero-bits/issues/52 to be fixed.

The MIT Lock Guide paper (a funny paper, by the way): http://www.lysator.liu.se/mit-guide/MITLockGuide.pdf

I will try with more "conventional" papers and report back if I have problems.

adamsmith · December 17, 2016

The Lock Guide paper imports incorrectly in 4.0.x, too. Zotero just picks an unfortunate phrase to query Google Scholar with. That's a fairly rare error, and it's unfortunate, but it's not a regression.

The ISO papers from Genesis don't have OCR'd text, so I can't test with them, but remember how Zotero retrieves metadata:
1. It looks for a DOI on the first pages and, if it finds one, queries CrossRef.
2. It looks for an ISBN on the first pages and, if it finds one, queries WorldCat.
3. It picks a phrase from somewhere in the middle of the document, puts it in quotation marks, queries Google Scholar, and then imports the first hit.
So I wouldn't have high hopes for the ISO paper anyway, though I'm surprised it gets a false positive (maybe Google Scholar is getting less strict about quotes?).
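For anyone wondering why a PDF without a DOI or ISBN can end up matched to the wrong item, the three-step fallback above can be sketched roughly like this. This is a hypothetical Python illustration, not Zotero's actual code: the names `choose_lookup` and `isbn_is_valid`, the simplified DOI/ISBN regular expressions, and taking an eight-word phrase from the centre of the text are all assumptions. It only decides which service would be queried and with what; it makes no network requests.

```python
import re
from typing import Tuple

# Simplified DOI pattern (assumption): "10." + registrant digits + "/" + suffix.
DOI_RE = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)')
# Simplified ISBN pattern (assumption): "ISBN" label, then digits/hyphens.
ISBN_RE = re.compile(r'ISBN(?:-1[03])?:?\s*([0-9][0-9-]{8,15}[0-9Xx])')


def isbn_is_valid(isbn: str) -> bool:
    """Check an ISBN-10 or ISBN-13 checksum (hyphens already removed)."""
    isbn = isbn.upper()
    if len(isbn) == 10:
        if not isbn[:9].isdigit() or not (isbn[9].isdigit() or isbn[9] == "X"):
            return False
        # ISBN-10: weights 10..1, sum must be divisible by 11 ("X" counts as 10).
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn))
        return total % 11 == 0
    if len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: alternating weights 1 and 3, sum must be divisible by 10.
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(isbn))
        return total % 10 == 0
    return False


def choose_lookup(first_pages: str, body_text: str, phrase_words: int = 8) -> Tuple[str, str]:
    """Hypothetical helper: return (service, query) in DOI -> ISBN -> phrase order."""
    # 1. DOI on the first pages -> CrossRef.
    doi = DOI_RE.search(first_pages)
    if doi:
        return ("crossref", doi.group(1).rstrip(".,;)"))
    # 2. Checksum-valid ISBN on the first pages -> WorldCat.
    for match in ISBN_RE.finditer(first_pages):
        candidate = match.group(1).replace("-", "")
        if isbn_is_valid(candidate):
            return ("worldcat", candidate.upper())
    # 3. Quoted phrase from the middle of the text -> Google Scholar (first hit imported).
    words = body_text.split()
    if len(words) >= phrase_words:
        start = (len(words) - phrase_words) // 2
        return ("google_scholar", '"' + " ".join(words[start:start + phrase_words]) + '"')
    # Not enough extractable text (e.g. a scan without OCR): nothing to query.
    return ("none", "")
```

The weak point is step 3: the first Google Scholar hit for a quoted phrase is accepted as-is, so any loosening of how Scholar matches quoted phrases translates directly into wrong imports.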

What does sound like a bug is Zotero importing multiple papers, though. That shouldn't be possible given the code. Are you sure that happened? Can you reproduce that?

DWL-SDCA · December 17, 2016

GS became less strict about phrases within quotes in early summer of this year. Word order no longer matters if there is no _exact_ match. For SafetyLit, we had to scrap an automated search process and go back to doing it by hand.

Edit: Also, even within quotation marks, GS ignores stopwords. Even when there is an exact match, GS seems to also return "similar" items, ordered not by closeness of match but by some other algorithm. The exact match is included but not necessarily in first position.

gagarine · December 18, 2016

"What does sound like a bug is Zotero importing multiple papers, though" I will try to reproduce that. Was not able to for the moment, but I end up with around 5 entry and one with the PDF. I will open a new thread if I'm able to reproduce the bug.

"GS became less strict about phrases within quotes in early summer of this year." Ok good to know! So it's not a regression. Perhaps also a new thread should be open for this point? Or directly a bug report on github...

bwiernik · December 18, 2016

The forum here is the right place to post this type of bug report. You can keep all of the discussion in this thread.

lbasti01 · January 12, 2017

The latest version is still unable to retrieve metadata from PDFs on my installation. It just keeps trying forever without any noticeable progress.

dstillman · January 12, 2017

@lbasti01: Can you restart Zotero and provide a Debug ID for one attempt that hangs?

lbasti01 · January 12, 2017

@danstillman here is the debug ID: D1386532735

dstillman · January 12, 2017

@lbasti01: Can you try resetting your translators from the Advanced → Files and Folders pane of the Zotero preferences?

lbasti01 · January 12, 2017

@danstillman no improvement after reset

dstillman · January 12, 2017

OK, could you provide another Debug ID for this, starting only after Zotero is done trying to sync? (In other words, this shouldn't use the "Enable after restart" option. I'll ask for that specifically when it's needed. The restarting I asked for is just to clear the error log included in the report.)

lbasti01 · January 12, 2017

@danstillman the issue is: Zotero never seems to stop trying to retrieve the metadata... it goes on indefinitely

dstillman · January 12, 2017

Right, but I'm just asking for a Debug ID that doesn't include Zotero startup and the initial auto-sync.

lbasti01 · January 12, 2017

@danstillman here you go: 2009305558

dstillman · January 12, 2017

That's a Report ID. I need a Debug ID, like you provided before, but without the "Enable after restart" option and waiting to start it until the initial auto-sync is done.

lbasti01 · January 12, 2017

@danstillman It works now! I needed to restart the app after resetting translators, which I didn't do... Many thanks!

dstillman · January 12, 2017

OK, great. Restarting shouldn't be necessary after resetting translators — and, for that matter, resetting translators shouldn't be necessary — so we'll look into that.

gagarine · January 14, 2017

Zotero 5 was not able to find metadata for this PDF http://poq.oxfordjournals.org/content/69/5/778.full.pdf from http://poq.oxfordjournals.org/content/69/5/778.abstract . This PDF does have a DOI on the first page.

gagarine · January 14, 2017

Oh, this is because the PDF is locked. I unlocked the PDF using http://www.pdfunlock.com/ and now Zotero is able to find the metadata.

It would be great if Zotero unlocked PDFs automatically...