Updating publication information for arXiv pre-prints

Hi,

In my field, most papers are shared as pre-prints on arXiv first, and then published officially in a journal or conference.
I use arXiv to find new papers and save them to Zotero.
However, when it comes to writing papers, I would like to cite them with their 'official publication' information.
Is there a way to automatically update the arXiv pre-print items in Zotero with their official bibliographic information?

Thanks
  • If the item already has a DOI on arXiv when you import, Zotero will save the final published information.

    If not, there isn’t an automated way currently to update metadata. Your fastest workflow would probably be to follow the item's URL to arXiv and see if it has a published DOI. If it does, save the item again and then merge the duplicates.
  • Is it still the case that there is no automated way to update preprints (like from the arXiv) that have been published and acquired a DOI after being added to Zotero? It's too bad, because most preprint servers have good machine-readable metadata for this.
  • No updating of items of any kind has been implemented, correct. Upstream metadata is indeed not a significant blocker.
  • I'm curious what the blocker is? Just additional complexity?
  • I think it's a mix of a) lots of other stuff going on, b) getting the UI/UX right for something that actually does a majority of the updating that people want isn't easy, and, related to that, c) deciding when to query which databases is not at all trivial.
  • Got it, thanks very much.
  • I also work with a lot of arXiv preprints and from time to time (mostly when I actually get to read the paper) I want to check if the paper has been published or updated.
    Back when I was using Mendeley my workflow was to just update the metadata based on the arXiv identifier which would then fetch the DOI (assuming the authors updated their arXiv entry of course).
    The workflow explained by bwiernik is not too bad, but I kind of miss my old one, where I never had to leave the reference manager.
    Is there any chance that this feature will ever be implemented?
  • Yes, it’s being actively worked on.

    You could try to see if the DOI Manager plugin will retrieve the DOI for you. Then, you can copy the DOI, use the magic wand tool to make a new copy from the DOI, and merge the two items.
  • A couple questions for the people who want this:

    1) If you save a preprint from arXiv and later update it to add publication info, do you care that the saved PDF is no longer the version being cited? Would you try to download the published PDF and review that (and add it to the same item in Zotero?), or would you just assume that it hasn't changed in meaningful ways from the final preprint version (which may or may not be the version you have)?

    2) Once you update to the published metadata, do you care about the arXiv ID?
  • My personal answers:

    (1) Optimally, I would want the PDF updated if I had *not* yet added any comments to it (i.e., if it was just the raw PDF from the arXiv), but I would keep the old PDF if it *did* have comments on it. However, this is a weak preference, and I'd be perfectly happy to just have the publication info updated and leave the PDF untouched. (I would be unhappy if it *always* updated the PDF, thereby losing any comments I had made.)

    (2) I would prefer to have the arXiv ID still noted somewhere. (And in fact, it would be nice to have an automated solution for fetching arXiv IDs for all the published things in my library.) But this is again a weak preference, and I would happily use a preprint->publication updater tool even if it meant erasing the arXiv information.

    Thanks for taking these preferences under consideration!
  • FWIW from my point of view as an indexer _both_ versions should be retained even if I added no comments to the original. Consider if you cite the arXiv version in a manuscript that is accepted and published. Later in a second manuscript that you write (after the arXiv article has been published in a journal) you will want to cite the journal version. I believe in citing precisely what I have read. I will also want to be able to refer back to the pdf of the originally cited arXiv version. I accept that some may disagree with me. Know also that for a few arXiv "articles" I have retained both the early and the revised versions if they are more than superficially different.
  • edited December 27, 2020
    Thanks for taking our comments seriously! My answers

    1) For me the really important thing is updating the metadata. If Zotero were also able to download the new PDF it would be nice, but it's by no means a fundamental feature.
    In principle I agree with the issues that DWL-SDCA is mentioning, but in my experience the differences between the various arXiv versions and the published version are usually minimal. Personally I think I can handle the few cases where I need to cite a specific arXiv version "by hand" (creating a separate un-merged parent item, I guess).

    2) I agree with jessriedel, it's very useful to retain the arXiv ID.
  • @dstillman

    Answering the questions from 2 years ago, not sure if still relevant (but I still would like this feature!)

    1) As fran_alba mentioned, my main wish is for the updating of metadata. Downloading the new pdf would be nice, but not fundamental, and I would NOT use this feature at all if it meant deleting my old pdf. I would indeed assume the pdf itself has not changed in meaningful ways (this is typically the case in my experience)


    2) It would be nice to retain the arxiv ID, but not fundamental.

    I think my answers generally line up with those of jessriedel
  • Sorry for bumping this old thread. I have created a plugin for Zotero 7 (beta) that makes it easier to update publication information for arXiv papers: https://github.com/AllanChain/zotero-arxiv-workflow

    It's currently in alpha stage and is by no means stable. It will search arxiv.org for the "Related DOI" field, which may be updated if the paper got published. If a published version is found, a new item will be created automatically and the published PDF will be downloaded. Then the preprint item and the newly created journal item will be merged.

    The current feature meets my needs:
    1) I would try to download the published PDF and review that, and keep the arXiv PDF unchanged because it may have some annotations.
    2) I don't care about the arXiv ID
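
The "Related DOI" lookup described in the post above can be sketched outside Zotero. This is a minimal illustration, not the plugin's actual code: it assumes the public arXiv API at export.arxiv.org returns an Atom feed in which an author-supplied DOI appears as an `arxiv:doi` element, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Union

# Assumption: the public arXiv API query endpoint.
ARXIV_API = "https://export.arxiv.org/api/query?id_list={arxiv_id}"
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def extract_published_doi(atom_xml: Union[str, bytes]) -> Optional[str]:
    """Return the DOI of the first entry in an arXiv Atom feed, or None."""
    root = ET.fromstring(atom_xml)
    entry = root.find("atom:entry", NS)
    if entry is None:
        return None
    doi = entry.findtext("arxiv:doi", default=None, namespaces=NS)
    if doi is None or not doi.strip():
        return None
    return doi.strip()


def fetch_published_doi(arxiv_id: str) -> Optional[str]:
    """Query arXiv for one identifier (hypothetical helper, needs network)."""
    url = ARXIV_API.format(arxiv_id=urllib.parse.quote(arxiv_id))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_published_doi(resp.read())
```

A result of `None` only means the authors have not added a DOI to the arXiv record; the paper may still have been published.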
  • There is also another plugin that already implements similar functionality: linter for zotero (https://github.com/northword/zotero-format-metadata).

    For preprints, it first checks whether arXiv lists an updated DOI and, if so, fetches the metadata based on that DOI. If arXiv doesn't have one, it searches Semantic Scholar to see if there is an official publication.

    Note: It only handles metadata, not PDFs.
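
The two-step lookup in the post above (arXiv first, then Semantic Scholar) comes down to a small fallback rule. The sketch below is an illustration under stated assumptions, not the linter plugin's code: it assumes the Semantic Scholar Graph API accepts `ARXIV:<id>` paper identifiers and returns an `externalIds` object with a `DOI` key, and that DOIs beginning with `10.48550/arXiv.` are arXiv's own preprint DOIs rather than journal DOIs. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Mapping, Optional

# Assumption: Semantic Scholar Graph API endpoint and field name.
S2_API = (
    "https://api.semanticscholar.org/graph/v1/paper/"
    "ARXIV:{arxiv_id}?fields=externalIds"
)
# Assumption: DOIs with this prefix point back to the arXiv preprint itself.
ARXIV_DOI_PREFIX = "10.48550/arxiv."


def is_preprint_doi(doi: str) -> bool:
    """True if the DOI looks like arXiv's own DOI for the preprint."""
    return doi.lower().startswith(ARXIV_DOI_PREFIX)


def choose_published_doi(
    arxiv_doi: Optional[str],
    s2_external_ids: Optional[Mapping[str, str]],
) -> Optional[str]:
    """Prefer the DOI listed on arXiv, then fall back to Semantic Scholar."""
    s2_doi = (s2_external_ids or {}).get("DOI")
    for candidate in (arxiv_doi, s2_doi):
        doi = (candidate or "").strip()
        if doi and not is_preprint_doi(doi):
            return doi
    return None


def fetch_s2_external_ids(arxiv_id: str) -> dict:
    """Fetch external IDs from Semantic Scholar (hypothetical, needs network)."""
    url = S2_API.format(arxiv_id=urllib.parse.quote(arxiv_id))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp).get("externalIds") or {}
```

Once a published DOI is chosen, the item's metadata can be fetched from that DOI in the usual way, for example with Zotero's add-by-identifier tool.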
  • 1) I almost never put notes directly on the PDF, so for me, just replacing it with the final version would be best. Since there are different opinions, wouldn't it be possible to have an option to choose what to do when we update preprints?

    By the way, there are websites such as NASA/ADS, for which the URL stays the same (unlike the DOI), so wouldn't it be possible to update it automatically from this?
    Such a feature would really be convenient instead of having to go again to the same website, import it, and merge it.

    Thank you for your efforts!