PDF retrieve metadata: five simple suggestions for improvement

1. Grey (disable) the option in the context menu for PDFs that do not contain OCRed text. Saves two clicks.

2. UI: put focus on the 'Close' button after completion, and provide a checkbox 'automatically close after completion'.

3. Give a timeout and skip option (e.g. timeout and skip to next after 3 seconds). Right now Zotero can be stuck on a single item for an indefinite amount of time, without giving feedback on what's happening. It feels much better if the easiest items are done first. I find myself doing PDFs one by one because if I feed it a whole list it's certainly going to be stuck on the third or fifth item forever.

4. Preview metadata and allow approval/disapproval by user. As suggested here.

5. Perhaps place the button 'Retrieve metadata' in the right pane, next to the 'Indexed' field? I find myself looking for it there, probably because that place is contextually much more relevant. Also, it saves one click.

(Edit: added #5.)
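
To make suggestion 3 concrete, here is a minimal sketch of a per-item timeout with skip. It is not Zotero's code (Zotero itself is written in JavaScript); it is a Python illustration, and `retrieve_all`, `fake_lookup`, the file names, and the default of 3 seconds are all hypothetical, chosen only to mirror the suggestion above.

```python
import asyncio


async def retrieve_all(items, lookup, timeout_s=3.0):
    """Look up each item in turn; skip any lookup that exceeds timeout_s.

    Returns (done, skipped): done maps item -> metadata, skipped lists
    the items that timed out so they can be retried or reported later.
    """
    done, skipped = {}, []
    for item in items:
        try:
            # wait_for cancels the lookup and raises once the timeout passes
            done[item] = await asyncio.wait_for(lookup(item), timeout=timeout_s)
        except asyncio.TimeoutError:
            skipped.append(item)  # move on instead of hanging on this item
    return done, skipped


async def fake_lookup(name):
    """Hypothetical stand-in for a metadata lookup; 'slow.pdf' never finishes in time."""
    await asyncio.sleep(10 if name == "slow.pdf" else 0)
    return {"title": name}


if __name__ == "__main__":
    files = ["a.pdf", "slow.pdf", "b.pdf"]
    done, skipped = asyncio.run(retrieve_all(files, fake_lookup, timeout_s=0.05))
    print("retrieved:", list(done))
    print("skipped:", skipped)
```

With this shape, one stuck item costs at most `timeout_s` and the rest of the batch still completes; the skipped list is what the dialog could show to the user as feedback.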
  • These are excellent suggestions. I think this is exactly what the metadata search should look like.
    3 and 4 are obviously especially important.
  • ++1 on this, especially numbers 3 and 4. The metadata retrieval's tendency to hang indefinitely on a single doc completely kills any batch functionality, which is a major impediment to new people trying to import their stacks of PDFs into Zotero.

    I'd add a friendly amendment:

    6. Add an option to hook the PDF metadata retrieval into the magic wand menu, i.e., right-click on a PDF for a "create parent item from database identifier" option, which would allow insertion of a PMID, DOI, etc. This isn't a huge deal, obviously, since one could easily create the parent item from an identifier, attach the PDF, and rename, but it does cut a few steps.
  • This all sounds good. There are numerous threads that relate to this issue, for example here, and here, in addition to the link Mark has included. Perhaps we could tie them all together in one place?

    I'd suggest a slight revision of Mark's suggestion #1, in connection with amacom73's amended #6, that of greying out PDFs having no readable text. As per the discussions in the links, the 'retrieve metadata for PDF' option is currently also being used to 'create parent item' from a standalone note, PDF, or document. I'd like to put in a request that it also allow the creation of a parent for standalone links. So, instead of greying out PDFs without readable text, perhaps they would simply not be eligible for a metadata search: instead you receive notice that the PDF does not contain readable text, and you are prompted to 'create blank parent item' (or there would be no prompt, and Zotero would automatically set up a blank parent item if you've selected the appropriate setting in advance).
  • I know that here and elsewhere, the idea of creating a blank parent item for PDFs when metadata lookup fails was presented. Now (2.0 final) we can create a blank parent for all attachment files except PDF files. Is there a reason that this can't happen for PDFs?
  • Is there a reason that this can't happen for PDFs?
    Because it hasn't been implemented yet.
  • I am new to this, so I may be missing something. When I have been indexing PDFs, some of them have metadata in the file, but it is not found because it is not in Google Scholar. Can we have an option to import this?

    Also, I support the idea of a preview of the metadata retrieved, as sometimes it relates to a different file altogether.
  • Can we have an option to import this?
    This has been discussed, but it hasn't been coded yet. I imagine that this and a few other metadata proposals (EXIF for images, ID3 for MP3s) are on hold in part because Zotero needs a good, cross-platform way to access it.
  • Wondering if there have been any improvements or workarounds regarding incorrect metadata being imported with the PDF lookup function.

    Any further suggestions?
  • This hasn't been worked on for a while; the devs have been busy with lots of other stuff. I'm sure they'd be happy if someone wants to work on this. I believe it's a separate .js file, so that would be pretty straightforward.
  • Has this seen any work since last year? At the very least, creating a new item for a PDF for which metadata wasn't retrieved would make many people happy. And since the code is already there (for non-PDFs) it shouldn't be difficult.
  • Any news? Just hoping someone code-savvy will see this and pick up these suggestions, making everyone happy. As adamsmith notes above, it wouldn't be too difficult if you know JS.
  • I even made a patch to create an item when lookup fails, but the workflow wasn't very smooth. I may yet get back to that, but I've already promised a few other features that may take priority.
  • Thank you! As a shortcut you can at least try to convince the powers that be that the option to Create parent item for selected item should not have been disabled for PDF attachments.
  • Thank you, Mark, for your thoughtful suggestions. I love all of them. Additionally, I would suggest making the window close when the Esc key is pressed. It could close on Esc only when finished, or it could also close during scanning, so that it acts as a Cancel button. Thank you for listening!