Retrieving metadata from PDFs is very slow

I'm running v5.0.88. When I import a PDF into my Zotero library, metadata retrieval is much slower than it used to be. It eventually works, but it takes far longer than before. Any idea why?
  • It depends on the upstream providers used for a specific PDF.

    If you provide a Debug ID for an attempt that's slow, we can have a look.
  • D1579249704

    The metadata retrieval took approximately 20 seconds.
  • (4)(+0000001): INSERT OR IGNORE INTO fulltextItemWords (wordID, itemID) SELECT wordID, ? FROM fulltextWords JOIN indexing.fulltextWords USING(word) [22734]

    (4)(+0001749): REPLACE INTO fulltextItems (itemID, version, synced, indexedPages, totalPages) VALUES (?, ?, ?, ?, ?) [22734, 0, 0, 11, 11]

    (4)(+0000002): DELETE FROM indexing.fulltextWords

    (3)(+0000002): Notifier.trigger('refresh', 'item', [22734]) queued

    (4)(+0013629): Committed DB transaction zRzWHNRM

    Unless that debug output wasn't representative, I don't think you're actually seeing a problem with metadata retrieval. The metadata retrieval appears to be extremely quick in that example. But there was a 14-second delay in indexing the PDF's full-text content, which is very slow.

    If you temporarily disable "Automatically retrieve metadata for PDFs" in the General pane of the Zotero preferences, you should be able to separate two timings: how long it takes to add the PDF and have it show as indexed in the right-hand pane, and how long metadata retrieval takes.

    For the full-text indexing issue, how big is zotero.sqlite in your Zotero data directory? How many items are in your database?
  • You're right: it's the indexing that's taking a long time, not the metadata retrieval. My zotero.sqlite is 70 MB, and I have over 10,200 items in my database.

    The thing is, this long indexing delay appeared suddenly rather than growing gradually over time. Is there a fix for something like this?
  • I just decided to disable indexing since I don't really need that feature anyway. Metadata retrieval is fast again!
  • This certainly shouldn't happen in a 70 MB database. Is this by any chance on an old computer with a spinning disk (i.e., not an SSD)?
  • It does have a spinning disk, but the PC isn't that old: it has an i5-7500 CPU and 16 GB of RAM.

    What puzzles me is how sudden the slowdown was. Things were working quickly, and then everything slowed down at once.
  • The slow metadata retrieval problem has reappeared. I had replaced the old hard drive with an SSD and things were working well, but now it's slow again. I have disabled PDF indexing, but metadata retrieval is still slow. Here's a debug ID from a recent attempt:
    D2112068734

    Any help would be appreciated!
  • (3)(+0000000): HTTP GET https://doi.org/10.[…]

    (3)(+0010599): Translate: Could not find a result using DOI Content Negotiation -- trying next translator

    (3)(+0000000): HTTP GET https://doi.org/10.[…] failed with status code 504

    @quagman: Sorry I missed this at the time, but this appears to be just a 10-second timeout from Crossref. It wouldn't be a regular occurrence (it was instantaneous for me), and there's nothing we can do about it. If you're seeing it regularly, there could be a problem with your network, but this is almost certainly a problem on Crossref's end.
  • (IIRC, that was around the time Crossref regularly struggled with API performance.)
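The timings discussed in the replies above come from the `(+NNNNNNN)` field on each debug line. Below is a minimal Python sketch for spotting such gaps in a pasted debug log. It assumes that field is the number of milliseconds elapsed since the previous debug line (consistent with `+0013629` being read as a 14-second delay and `+0010599` as a 10-second timeout) and that every log line has the `(LEVEL)(+DELTA): MESSAGE` shape seen in this thread. The function name `slow_steps` and its 1000 ms default threshold are hypothetical and are not part of Zotero.

```python
import re
from typing import List, Optional, Tuple

# Assumption: each debug line looks like "(LEVEL)(+DELTA): MESSAGE", as in the
# excerpts quoted in this thread, and DELTA is the number of milliseconds that
# elapsed since the previous debug line was written.
LINE_RE = re.compile(r"^\((\d+)\)\(\+(\d+)\): (.*)$")


def slow_steps(
    log_text: str, threshold_ms: int = 1000
) -> List[Tuple[int, Optional[str], str]]:
    """Return (delta_ms, previous_message, message) for each gap >= threshold_ms.

    A large delta on a line means the time was spent *before* that line was
    logged, i.e. in whatever started after the previous message.
    """
    gaps = []
    previous: Optional[str] = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip blank lines and any prose mixed into the paste
        delta_ms = int(match.group(2))
        message = match.group(3)
        if delta_ms >= threshold_ms:
            gaps.append((delta_ms, previous, message))
        previous = message
    return gaps


if __name__ == "__main__":
    # Excerpt from the first debug output in this thread (long lines shortened).
    sample = """
    (4)(+0000001): INSERT OR IGNORE INTO fulltextItemWords ...
    (4)(+0001749): REPLACE INTO fulltextItems ...
    (4)(+0000002): DELETE FROM indexing.fulltextWords
    (3)(+0000002): Notifier.trigger('refresh', 'item', [22734]) queued
    (4)(+0013629): Committed DB transaction zRzWHNRM
    """
    for delta_ms, before, after in slow_steps(sample):
        print(f"{delta_ms / 1000:.1f} s between {before!r} and {after!r}")
```

Run on that excerpt, the sketch reports a 1.7-second gap before the `REPLACE INTO fulltextItems` line and a 13.6-second gap before `Committed DB transaction`. It also returns the preceding message because, under the stated assumption, the slow work happened between the two lines rather than on the line carrying the large delta.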