PDF retrieve metadata: five simple suggestions for improvement

mark · July 27, 2009

1. Grey (disable) the option in the context menu for PDFs that do not contain OCRed text. Saves two clicks.

2. UI: put focus on 'Close' button after completion, and provide checkbox 'automatically close after completion'

3. Give a timeout and skip option (e.g. timeout and skip to next after 3 seconds). Right now Zotero can be stuck on a single item for an indefinite amount of time, without giving feedback on what's happening. It feels much better if the easiest items are done first. I find myself doing PDFs one by one because if I feed it a whole list it's certainly going to be stuck on the third or fifth item forever.

4. Preview metadata and allow approval/disapproval by user. As suggested here.

5. Perhaps place the button 'Retrieve metadata' in the right pane, next to the 'Indexed' field? I find myself looking for it there, probably because that place is contextually much more relevant. Also, it saves one click.

(Edit: added #5.)
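
To make suggestion 3 concrete, here is a minimal sketch of a per-item timeout with skip. It is written in Python rather than Zotero's own JavaScript, and every name in it (`retrieve_all`, `fake_lookup`, the 3-second default) is hypothetical and not part of Zotero's code. It only illustrates that one stuck item need not block the rest of a batch.

```python
import asyncio


async def retrieve_all(items, lookup, timeout=3.0):
    """Look up each item in turn; skip any item that takes longer than `timeout` seconds.

    `lookup` is an assumed async callable standing in for the real metadata search.
    Skipped items are recorded as None so the user can see what was left out.
    """
    results = {}
    for item in items:
        try:
            results[item] = await asyncio.wait_for(lookup(item), timeout)
        except asyncio.TimeoutError:
            # The lookup was cancelled after the timeout; move on to the next item.
            results[item] = None
    return results


async def fake_lookup(name):
    """Hypothetical stand-in for a metadata lookup; 'stuck.pdf' simulates a hang."""
    await asyncio.sleep(10 if name == "stuck.pdf" else 0)
    return {"title": name}


if __name__ == "__main__":
    batch = ["a.pdf", "stuck.pdf", "b.pdf"]
    outcome = asyncio.run(retrieve_all(batch, fake_lookup, timeout=0.05))
    for name, metadata in outcome.items():
        print(name, "skipped (timeout)" if metadata is None else metadata)
```

Recording skipped items instead of dropping them silently also gives the feedback asked for above: the user can see which PDFs timed out and retry them individually.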

adamsmith · July 27, 2009

These are excellent suggestions. I think this is exactly what the metadata search should look like.
3 and 4 are obviously especially important.

amacom73 · July 28, 2009

++1 on this, especially numbers 3 and 4. The metadata retrieval's tendency to hang indefinitely on a single doc completely kills any batch functionality, which is a major impediment to new people trying to import their stacks of PDFs into Zotero.

I'd add a friendly amendment:

6. Add an option to hook the PDF metadata retrieval into the magic wand menu, i.e., right-click on a PDF for a "create parent item from database identifier" option, which would allow insertion of a PMID, DOI, etc. This isn't a huge deal, obviously, since one could easily create the parent item from an identifier, attach the PDF, and rename it, but it does cut a few steps.

JonEP · July 28, 2009

This all sounds good. There are numerous threads that relate to this issue, for example here, and here, in addition to the link Mark has included. Perhaps we could tie them all together in one place?

I'd suggest a slight revision of Mark's suggestion #1 (greying out the option for PDFs that have no readable text), in connection with amacom73's amended #6. As per the discussions in the links, 'retrieve metadata for PDF' is currently also being used to 'create parent item' from a standalone note, PDF, or document. I'd like to put in a request that it also allow the creation of a parent for standalone links. So, instead of greying out PDFs without readable text, perhaps they would simply not be eligible for a metadata search. Instead, you would receive a notice that the PDF does not contain readable text and be prompted to 'create blank parent item' (or there would be no prompt, and Zotero would automatically set up a blank parent item if you've selected the appropriate setting in advance).

ajlyon · February 19, 2010

I know that here and elsewhere, the idea of creating a blank parent item for PDFs when metadata lookup fails was presented. Now (2.0 final) we can create a blank parent for all attachment files except PDF files. Is there a reason that this can't happen for PDFs?

dstillman · February 19, 2010

Is there a reason that this can't happen for PDFs?

Because it hasn't been implemented yet.

John Fletcher · February 28, 2010

I am new to this, so I may be missing something. When I have been indexing PDFs, some of them have metadata embedded in the file, but it is not retrieved because the item is not in Google Scholar. Can we have an option to import this?

Also, I support the idea of a preview of the metadata retrieved, as sometimes it relates to a different file altogether.

ajlyon · February 28, 2010

Can we have an option to import this?

This has been discussed, but it hasn't been coded yet. I imagine that this and a few other metadata proposals (EXIF for images, ID3 for MP3s) are on hold in part because Zotero needs a good, cross-platform way to access it.

jhandwerg · July 1, 2010

Wondering if there have been any improvements or workarounds for incorrect metadata being imported by the PDF lookup function?

Any further suggestions?

adamsmith · July 1, 2010

This hasn't been worked on for a while; the devs have been busy with lots of other stuff. I'm sure they'd be happy if someone wants to work on this. I believe it's a separate .js file, so that would be pretty straightforward.

mark · April 29, 2011

Has this seen any work since last year? At the very least, creating a new item for a PDF for which metadata wasn't retrieved would make many people happy. And since the code is already there (for non-PDFs), it shouldn't be difficult.

mark · July 7, 2011

Any news? Just hoping someone code-savvy will see this and pick up these suggestions, making everyone happy. As adamsmith notes above, it wouldn't be too difficult if you know JS.

ajlyon · July 8, 2011

I even made a patch to create an item when lookup fails, but the workflow wasn't very smooth. I may yet get back to that, but I've already promised a few other features that may take priority.

mark · July 8, 2011

Thank you! As a shortcut you can at least try to convince the powers that be that the option to Create parent item for selected item should not have been disabled for PDF attachments.

chritogjon · December 9, 2013

Thank you, mark, for your thoughtful suggestions. I love all of them. Additionally, I would suggest making the window close when the ESC key is pressed. It could close on ESC only once retrieval has finished, or it could also close during scanning, so that ESC acts as a Cancel button. Thank you for listening!