Retrieving metadata from PDFs is very slow

Running v5.0.88. I've noticed that when I import a PDF into my Zotero library, the metadata retrieval is much slower than it used to be. It eventually works, but it takes a lot longer than previously. Any idea why?
  • It depends on the upstream providers used for a specific PDF.

    If you provide a Debug ID for an attempt that's slow, we can have a look.
  • D1579249704

    The metadata retrieval took approximately 20 seconds.
  • (4)(+0000001): INSERT OR IGNORE INTO fulltextItemWords (wordID, itemID) SELECT wordID, ? FROM fulltextWords JOIN indexing.fulltextWords USING(word) [22734]

    (4)(+0001749): REPLACE INTO fulltextItems (itemID, version, synced, indexedPages, totalPages) VALUES (?, ?, ?, ?, ?) [22734, 0, 0, 11, 11]

    (4)(+0000002): DELETE FROM indexing.fulltextWords

    (3)(+0000002): Notifier.trigger('refresh', 'item', [22734]) queued

    (4)(+0013629): Committed DB transaction zRzWHNRM

    Unless that debug output wasn't representative, I don't think you're actually seeing a problem with metadata retrieval. The metadata retrieval appears to be extremely quick in that example. But there was a 14-second delay in indexing the PDF's full-text content, which is very slow.

    If you temporarily disable "Automatically retrieve metadata for PDFs" in the General pane of the Zotero preferences, you should be able to distinguish between the time it takes to add the PDF and have it show as indexed in the right-hand pane and the time it takes to retrieve metadata.

    For the full-text indexing issue, how big is zotero.sqlite in your Zotero data directory? How many items are in your database?
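
For anyone reading their own debug output the same way, here is a minimal Python sketch that picks out the slow steps. It assumes each line has the form `(level)(+NNNNNNN): message` as in the excerpt above, and that the `+NNNNNNN` field is the number of milliseconds elapsed since the previous logged line (which matches the 14-second reading of `+0013629`). The function name `slow_steps` and the 1000 ms threshold are made up for illustration and are not part of Zotero.

```python
import re

# Assumed line shape, taken from the excerpt above: "(4)(+0013629): message".
# An optional leading bullet is tolerated because the forum adds one.
LINE_RE = re.compile(r"^\s*(?:•\s*)?\((\d+)\)\(\+(\d+)\):\s*(.*)$")


def slow_steps(log_text, threshold_ms=1000):
    """Return (delta_ms, message) pairs at or above the threshold, slowest first."""
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a timestamped debug line
        delta_ms = int(match.group(2))
        if delta_ms >= threshold_ms:
            results.append((delta_ms, match.group(3)))
    return sorted(results, reverse=True)


# The five lines quoted earlier in this thread.
sample = """
(4)(+0000001): INSERT OR IGNORE INTO fulltextItemWords (wordID, itemID) SELECT wordID, ? FROM fulltextWords JOIN indexing.fulltextWords USING(word) [22734]
(4)(+0001749): REPLACE INTO fulltextItems (itemID, version, synced, indexedPages, totalPages) VALUES (?, ?, ?, ?, ?) [22734, 0, 0, 11, 11]
(4)(+0000002): DELETE FROM indexing.fulltextWords
(3)(+0000002): Notifier.trigger('refresh', 'item', [22734]) queued
(4)(+0013629): Committed DB transaction zRzWHNRM
"""

for delta_ms, message in slow_steps(sample):
    print(f"{delta_ms / 1000:.1f} s  {message}")
```

On the excerpt above this flags the transaction commit (about 13.6 seconds) and the `REPLACE INTO fulltextItems` statement (about 1.7 seconds). Both belong to full-text indexing, not metadata retrieval.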
  • You're right, it's the indexing that's taking a long time, not the metadata retrieval. My zotero.sqlite is 70MB in size, and I have over 10,200 items in my database.

    The thing is, this long indexing delay happened rather suddenly. The delay didn't get longer gradually over time. Is there a fix for something like this?
  • I just decided to disable indexing since I don't really need that feature anyway. Metadata retrieval is fast again!
  • This certainly shouldn't happen in a 70 MB database. Is this by any chance on an old computer with a spinning disk (i.e., not an SSD)?
  • It does have a spinning disk but the PC is not that old. It's an i5-7500 CPU with 16GB RAM.

    To me, the puzzling thing is how the slowdown was rather sudden. It was working quickly before, then suddenly things slowed down.
  • The slow metadata retrieval problem has reappeared. I had swapped out the old hard drive with an SSD and things were working well but now it's slow again. I have disabled PDF indexing but metadata retrieval is still slow. Here's a debug ID of a recent attempt:

    Any help would be appreciated!