Updating publication information for arXiv pre-prints


In my field, most papers are shared as pre-prints on arXiv first and then published officially in a journal or at a conference.
I use arXiv to find new papers and save them to Zotero.
However, when writing papers, I would like to cite them with their 'official publication' information.
Is there a way to automatically update arXiv pre-print items in Zotero with their official bibliographic information?

  • If the item already has a DOI on arXiv when you import it, Zotero will save the final published information.

    If not, there isn’t currently an automated way to update the metadata. Your fastest workflow would probably be to follow the item's URL to arXiv and check whether it has a published DOI. If it does, save the item again and then merge the duplicates.
  • Is it still the case that there is no automated way to update preprints (such as those from arXiv) that have been published and have acquired a DOI after being added to Zotero? That's too bad, because most preprint servers have good machine-readable metadata for this.
  • No updating of items of any kind has been implemented, correct. Upstream metadata is indeed not a significant blocker.
  • I'm curious what the blocker is? Just additional complexity?
  • I think it's a mix of a) lots of other stuff going on, b) getting the UI/UX right for something that actually does most of the updating people want isn't easy, and, related to that, c) deciding which databases get queried, and when, is not at all trivial.
  • Got it, thanks very much.
  • I also work with a lot of arXiv preprints and from time to time (mostly when I actually get to read the paper) I want to check if the paper has been published or updated.
    Back when I was using Mendeley my workflow was to just update the metadata based on the arXiv identifier which would then fetch the DOI (assuming the authors updated their arXiv entry of course).
    The workflow explained by bwiernik is not too bad, but I kind of miss my old one, where I never had to leave the reference manager.
    Is there any chance that this feature will ever be implemented?
  • Yes, it’s being actively worked on.

    You could try to see if the DOI Manager plugin will retrieve the DOI for you. Then, you can copy the DOI, use the magic wand tool to make a new copy from the DOI, and merge the two items.
  • A couple questions for the people who want this:

    1) If you save a preprint from arXiv and later update it to add publication info, do you care that the saved PDF is no longer the version being cited? Would you try to download the published PDF and review that (and add it to the same item in Zotero?), or would you just assume that it hasn't changed in meaningful ways from the final preprint version (which may or may not be the version you have)?

    2) Once you update to the published metadata, do you care about the arXiv ID?
  • My personal answers:

    (1) Optimally, I would want the PDF updated if I had *not* yet added any comments to it (i.e., if it was just the raw PDF from the arXiv), but I would keep the old PDF if it *did* have comments on it. However, this is a weak preference, and I'd be perfectly happy to just have the publication info updated and leave the PDF untouched. (I would be unhappy if it *always* updated the PDF, thereby losing any comments I had made.)

    (2) I would prefer to have the arXiv ID still noted somewhere. (And in fact, it would be nice to have an automated solution for fetching arXiv IDs for all the published things in my library.) But this is again a weak preference, and I would happily use a preprint->publication updater tool even if it meant erasing the arXiv information.

    Thanks for taking these preferences under consideration!
  • FWIW from my point of view as an indexer _both_ versions should be retained even if I added no comments to the original. Consider if you cite the arXiv version in a manuscript that is accepted and published. Later in a second manuscript that you write (after the arXiv article has been published in a journal) you will want to cite the journal version. I believe in citing precisely what I have read. I will also want to be able to refer back to the pdf of the originally cited arXiv version. I accept that some may disagree with me. Know also that for a few arXiv "articles" I have retained both the early and the revised versions if they are more than superficially different.
  • edited December 27, 2020
    Thanks for taking our comments seriously! My answers:

    1) For me the really important thing is updating the metadata. If Zotero were also able to download the new PDF, that would be nice, but it's by no means a fundamental feature.
    In principle I agree with the issues that DWL-SDCA mentions, but in my experience the differences between arXiv versions and the published version are minimal most of the time. Personally, I think I can handle the few cases where I need to cite a specific arXiv version "by hand" (by creating a separate, unmerged parent item, I guess).

    2) I agree with jessriedel, it's very useful to retain the arXiv ID.
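
For anyone who wants to script the manual check described earlier in the thread (look up the arXiv record, see whether it has acquired a DOI), here is a minimal sketch in Python. It is not how Zotero or any plugin works. It assumes the public arXiv API endpoint at `export.arxiv.org/api/query`, whose Atom response carries `arxiv:doi` and `arxiv:journal_ref` elements once the authors have added them. The `item` dictionary and its keys (`arxiv_id`, `extra`, `DOI`, `journal_ref`) are hypothetical stand-ins, not Zotero's data model. In line with the preferences expressed above, the sketch keeps the arXiv ID and never touches attachments.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces used in arXiv API Atom responses.
ATOM_NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def build_query_url(arxiv_id):
    """Build an arXiv API query URL for a single identifier."""
    query = urllib.parse.urlencode({"id_list": arxiv_id})
    return "https://export.arxiv.org/api/query?" + query


def _text(entry, tag):
    """Return the stripped text of a child element, or None if absent/empty."""
    value = entry.findtext(tag, default=None, namespaces=ATOM_NS)
    return value.strip() if value else None


def published_info(atom_xml):
    """Extract the DOI and journal reference from an Atom response.

    Both values are None when the record has no publication info yet.
    """
    root = ET.fromstring(atom_xml)
    entry = root.find("atom:entry", ATOM_NS)
    if entry is None:
        return {"doi": None, "journal_ref": None}
    return {
        "doi": _text(entry, "arxiv:doi"),
        "journal_ref": _text(entry, "arxiv:journal_ref"),
    }


def fetch_published_info(arxiv_id):
    """Query the arXiv API over the network and parse the result."""
    with urllib.request.urlopen(build_query_url(arxiv_id), timeout=30) as resp:
        return published_info(resp.read())


def apply_published_info(item, info):
    """Return an updated copy of a hypothetical item dict.

    The DOI and journal reference are added, the arXiv ID is kept in the
    'extra' field, and nothing else (e.g. attachments) is modified.
    """
    updated = dict(item)
    if not info["doi"]:
        return updated  # not published yet, or authors have not added a DOI
    updated["DOI"] = info["doi"]
    if info["journal_ref"]:
        updated["journal_ref"] = info["journal_ref"]
    note = "arXiv: " + item["arxiv_id"]
    extra = updated.get("extra", "")
    if note not in extra:
        updated["extra"] = (extra + "\n" + note).strip()
    return updated
```

Calling `apply_published_info(item, fetch_published_info(item["arxiv_id"]))` would then give an updated copy of the item. As noted above, this only finds a DOI if the authors have updated their arXiv entry, and the full published metadata would still need to be retrieved from the DOI afterwards.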
