one-click update metadata for a zotero object

I use Zotero to manage a library of journal articles. These articles are often first published electronically, and don't get page numbers, etc., until they are published in print. I often find myself making one Zotero item when the article is published online, then when the print info is available, making another Zotero item, attaching a link to the PDF, putting in new HTML tags to italicize things correctly, adding a journal abbreviation if it is not already there, merging duplicates, etc. I don't think that there is a way to have Zotero update the metadata for an item automatically. It seems like it would be pretty easy to implement a one-click way to ask Zotero to retrieve any changes to the metadata associated with a certain article, similar to the functionality that lets you create a new object by DOI or PMID. It would be great if it could do this in such a way that it would not replace all fields, just empty ones, so things like HTML tags for italics in titles and user-added info like journal abbreviations wouldn't need to be entered manually again.
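A minimal sketch of the "fill only empty fields" idea, assuming items are plain Python dictionaries. The function name `fill_empty_fields` and the sample records are made up for illustration; this is not Zotero's API or its actual behavior.

```python
def fill_empty_fields(existing, fetched):
    """Return a copy of `existing` where only blank fields are filled from `fetched`."""
    merged = dict(existing)
    for field, new_value in fetched.items():
        current = merged.get(field)
        is_blank = current is None or str(current).strip() == ""
        # Never overwrite a value the user already has (e.g. a hand-edited title).
        if is_blank and new_value not in (None, ""):
            merged[field] = new_value
    return merged


# Hypothetical records: the online-first item was hand-edited by the user.
online_first = {
    "title": "A study of <i>Escherichia coli</i> growth",
    "journalAbbreviation": "J. Example Res.",
    "volume": "",
    "pages": "",
}
# Hypothetical metadata retrieved later, after print publication.
print_record = {
    "title": "A study of Escherichia coli growth",
    "volume": "12",
    "pages": "101-110",
}

updated = fill_empty_fields(online_first, print_record)
```

In this sketch the italicized title and the journal abbreviation survive, while the blank volume and pages are filled in.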
  • This is planned in general, but no one is currently working on it. I agree that the procedure would be fairly straightforward - someone just needs to do it.

    Regarding journal abbreviations, this is already somewhat automated (using a list of journal abbreviations from MEDLINE), so you do not need to worry about entering it as much.
  • +1 on this! Would also be very interested in this feature!
  • +1 from me as well! It would be great to have that feature! Please consider using PMID for lookup. Pubmed metadata is by far superior to Google Scholar!
  • We'll probably try to avoid Google Scholar as much as possible for item completion.
    GS has no alternative for retrieving metadata from PDFs because it's the only comprehensive full-text archive, but once we have a DOI, PMID, or even a title, we can work with those.
  • Sounds perfect! Thx for the fast reply, I appreciate it!
  • Could we maybe get a quick statement from the devs regarding the proposed "metadata update" feature? I could not find a ticket on GitHub regarding this matter, so I am not sure if this is actually planned. Please have a look into this when there is time. Currently, this is the only thing that keeps me from switching.

    THX a lot.
  • Not going to happen very soon unless someone provides a patch. Definitely still planned in general & listed here: http://www.zotero.org/support/requested_features Several major features don't have an issue - the issue tracker is for dev use, so most tickets are reminders for smaller things.
  • It is important to mention that the digital versions of some papers are not exactly the same as the printed ones. On the internet we can find pre-prints, post-prints, first versions, second versions... (http://en.wikipedia.org/wiki/Preprint)... and they are all different from the printed one.

    So we have to be careful not to mix all those up.
  • I feel as though this feature essentially already exists, but can't be used for this purpose just because nobody's made it so.

    The "retrieve PDF metadata" command would be enough to get this done, since it searches for the DOI anyway. The DOI never changes, even when an article goes from an ASAP to a published paper; thus if the same process were used for a parent item it should work.
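As a rough illustration of a DOI-based lookup, here is a sketch that uses the public Crossref REST API (`https://api.crossref.org/works/<DOI>`) and maps a few fields of its JSON `message` onto item-style field names. This is an assumption for illustration only: it is not how Zotero's own lookup code works, the helper names are hypothetical, and the sample message is hand-written with made-up values.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def crossref_url(doi):
    """Build the Crossref works URL for a DOI (slashes in the DOI are kept)."""
    return "https://api.crossref.org/works/" + quote(doi.strip())


def fetch_crossref_message(doi, timeout=10):
    """Fetch the 'message' object for a DOI. Needs network access; not called below."""
    with urlopen(crossref_url(doi), timeout=timeout) as response:
        return json.load(response)["message"]


def first(values):
    """Crossref returns titles as lists; take the first entry or an empty string."""
    return values[0] if values else ""


def extract_fields(message):
    """Map a Crossref message onto a flat dictionary of bibliographic fields."""
    return {
        "title": first(message.get("title", [])),
        "publicationTitle": first(message.get("container-title", [])),
        "volume": message.get("volume", ""),
        "issue": message.get("issue", ""),
        "pages": message.get("page", ""),
        "DOI": message.get("DOI", ""),
    }


# Hand-written sample in the shape of a Crossref message (values are made up).
sample_message = {
    "DOI": "10.1234/example.5678",
    "title": ["An example article"],
    "container-title": ["Journal of Examples"],
    "volume": "27",
    "page": "455-470",
}
fields = extract_fields(sample_message)
```

Fields missing from the response (here, `issue`) come back as empty strings, so the result can be compared against an existing item field by field.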
  • The problem has never been fetching the data -- as you said, the code for that has existed for a long time. The problem is implementing this smoothly so that it works in an intuitive way for users on a GUI level and does a good job merging the fetched data with existing data.
    Ideally, it'd also have a mechanism to work where a DOI is absent, though I'd say that's a lesser consideration.

    I'm pretty sure zotero's lead dev would accept patches if someone wants to have a go, though I'd recommend discussing the general approach on zotero-dev before diving into it.
  • I suppose what I find confusing is that while there's a right-click context menu that performs this exact function for a PDF, the same function does not exist for a parent item.

    The GUI is there, the code is there, the feature is already known by its exact name.

    I know pretty much nothing about programming or coding but I would definitely be willing to learn in this case. It just screams to be done!
  • it's the merging of old and new item that's the new code. That would likely look something like the duplicate merge dialog, so it's not like we have nothing similar in place, but still -- the details in the end always take up a lot of time.
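To make the merging step concrete, here is a hedged sketch of the kind of field-by-field comparison a merge dialog would need before showing anything to the user. It assumes items are dictionaries of strings; `plan_merge` and the sample records are hypothetical and not taken from Zotero's code.

```python
def plan_merge(existing, fetched):
    """Sort fields into: safe to fill, conflicting, or unchanged."""
    plan = {"fill": {}, "conflict": {}, "unchanged": []}
    for field in sorted(set(existing) | set(fetched)):
        old = (existing.get(field) or "").strip()
        new = (fetched.get(field) or "").strip()
        if not new or new == old:
            # Nothing new was fetched for this field, or it already matches.
            plan["unchanged"].append(field)
        elif not old:
            # The existing field is empty, so filling it loses nothing.
            plan["fill"][field] = new
        else:
            # Both sides have different values: the user has to choose.
            plan["conflict"][field] = (old, new)
    return plan


# Made-up example: an ahead-of-print item versus freshly fetched metadata.
existing_item = {
    "title": "A study of <i>E. coli</i>",
    "date": "2019",
    "volume": "",
    "pages": "1-30",
}
fetched_item = {
    "title": "A study of E. coli",
    "date": "2019",
    "volume": "27",
    "pages": "455-470",
}
plan = plan_merge(existing_item, fetched_item)
```

Only the `conflict` entries would need a dialog; the `fill` entries could be applied silently, which matches the "just empty ones" request in the original post.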
  • Yes, re-fetching metadata for an item would be very helpful!
  • +1 It's been years I've been hoping this functionality would be implemented...
  • +1 waiting for years
  • Mendeley has an option to mark an item as "Needs Review" and then you can click on Search in Details to re-scan the metadata from the web. Most of the time it fills empty fields correctly. In fact, that was the reason I chose to use Mendeley over Zotero a long time ago.
  • waiting for years
  • +1 this feature would save us a lot of time !
  • +1 Waiting for this feature. Sometimes the retrieved metadata are wrong, and some fields are not extracted at all.
  • Hellooooo- echoooooo----
  • edited January 22, 2019
    My understanding is it’s currently being worked on.
  • edited January 24, 2019
    I'm feeling compelled to comment about this with my own experience. This isn't a simple problem.

    I have 2 such update systems for my web-based (non-Zotero) bibliographic database. One system uses the PMID to requery the PubMed database 12 months after the online publication and (if the metadata hasn't been updated) every 2 months until the record is complete. My system inserts "ePub" for missing volume, issue, and pagination when these metadata items are missing (but I store the information for each field originally provided by PubMed -- more on this later).

    For the many journals that aren't included in PubMed we poll CrossRef using the DOI with the same 1 year delay and 2 month repeat.

    Some publishers send updates of their metadata but the update contains what I think is useless information. For example, Taylor and Francis Group journals' initial metadata (depending on the data source) might contain empty fields or "ePub" but a "0" for the pagination field. Sometimes TFG will update the "0" pagination metadata and provide the number of pages that the article will consume but still not provide any new volume, issue or page-range metadata. I don't want to store that number-of-pages value because it could later be confused with an article item number for electronic publications.

    Some publishers of online only journals will provide metadata that includes an article number before the article has been assigned to a volume/issue. Sometimes that article number will be unchanged when the article is assigned to a volume/issue. In those cases the article number is often the right-most characters of the DOI. Other times the article number _will_ change when the article has been assigned to a volume/issue.

    The question here is: do we want temporary/transitory information in metadata fields, or should ePub metadata not be updated until final and complete information is available? I don't want a zero in the pagination field or the number of pages in that field. I believe that kind of metadata is unwanted by journal editors or professors. I have chosen to not provide temporary metadata but to wait until it is complete. I use the publishers' original metadata and knowledge of each individual publisher's metadata-release patterns in an algorithm to determine whether the updated metadata is yet useful or not.

    edit
    Another problem is how to time the requests for updates so that the system that holds the metadata will not be adversely affected by my requests.
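The polling schedule described in this comment (a first check 12 months after online publication, then every 2 months until the record is complete) could be sketched roughly as follows. The function names, the choice of `volume` and `pages` as the "complete" test, and the sample dates are assumptions for illustration, not the commenter's actual system.

```python
import calendar
from datetime import date


def add_months(day, months):
    """Add calendar months, clamping to the last day of shorter months."""
    index = day.year * 12 + (day.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def next_check(published_online, checks_done, first_delay=12, repeat=2):
    """Date of the next metadata check, given how many checks already ran."""
    return add_months(published_online, first_delay + repeat * checks_done)


def is_complete(record, required=("volume", "pages")):
    """Assumed completeness test: every required field has a real value."""
    return all((record.get(field) or "").strip() not in ("", "ePub") for field in required)


def is_due(record, published_online, checks_done, today):
    """A record is polled only if it is still incomplete and its check date has passed."""
    return not is_complete(record) and today >= next_check(published_online, checks_done)


# Made-up example: an article first published online on 31 January 2018.
online = date(2018, 1, 31)
first_check = next_check(online, checks_done=0)
second_check = next_check(online, checks_done=1)
```

Spacing checks this widely also keeps the number of requests low, which speaks to the concern about not burdening the system that holds the metadata.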
  • edited January 26, 2019
    As an example of one of the problems I mentioned above, see: DOI 10.1123/japa.2018-0010

    Human Kinetics (late January 2019) for this journal article provides the pagination as 1-30 for this ahead-of-print publication. When assigned to an issue the actual page numbers will likely change but the publisher's pagination metadata isn't always updated. This is especially true when the prepublication article is assigned to an online only journal -- the number of pages masquerading as a page range often isn't updated to reflect the article number.

    By the way, have a look at this publisher's page source html for this article.

    Summary: Automatically updating article metadata will have limits to full accuracy and some amount of hand-editing will likely be needed.
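The pagination pitfalls described in the last two comments could be screened with a few heuristics like the ones below. This is only a sketch: the category names and rules are assumptions distilled from the examples given (a "0" or "ePub" placeholder, a page count masquerading as a range such as 1-30, and an article number that matches the end of the DOI), and real publisher data would need per-publisher tuning and hand review.

```python
import re


def classify_pages(pages, doi=""):
    """Guess what a 'pages' value really is; 'suspect' values need hand review."""
    value = (pages or "").strip()
    if value == "" or value == "0" or value.lower() == "epub":
        return "placeholder"
    match = re.fullmatch(r"(\d+)\s*[-\u2013]\s*(\d+)", value)
    if match:
        # A range starting at 1 may be a page count, as with ahead-of-print
        # articles, but it can also be a genuine first article in an issue.
        return "suspect" if int(match.group(1)) == 1 else "page-range"
    if doi and doi.lower().endswith(value.lower()):
        # Article numbers are often the right-most characters of the DOI.
        return "article-number"
    return "unknown"


# The ahead-of-print example from the thread, plus made-up values.
ahead_of_print = classify_pages("1-30", "10.1123/japa.2018-0010")
final_range = classify_pages("455-470")
article_number = classify_pages("12345", "10.1234/example.12345")
```

Short numeric values can match the end of a DOI by coincidence, so an "article-number" result is a hint rather than proof.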
  • Is this still being worked on? I have lots of items imported from a bibtex / Mendeley library that need updates / fixes. In the absence also of a find/replace mechanism, I was expecting to be able to mark/select entries and have Zotero re-fetch the journal, title, volume, index, doi, pages, etc metadata.

    Is there a timeframe for having this feature implemented/released?
  • Yes, it's still being worked on, but in the meantime you can just paste the DOI, PMID, ISBN, or arXiv ID into Add Item by Identifier to create a new item, select both items, and merge them. If no identifier, do the same via Save to Zotero from your browser. We know that takes a little more work, but it should still be quite fast, which is why this hasn't been a top priority.
  • Is there any progress for this issue?
  • +1 waiting for this lovely add on
  • this function is highly wanted!
  • I have 600+ items imported from EndNote with page information messed up.
    If updating a whole item is too complicated, it would be nice to have an add-on to update just one field.