New Plugin: Zotero DOI Manager

bwiernik · November 29, 2017

shortDOIs (http://shortdoi.org/) are an official way to shorten DOI names. For example, this shortDOI:
10/aabbe

can be used instead of this long DOI:
10.1002/(SICI)1097-0258(19980815/30)17:15/16<1661::AID-SIM968>3.0.CO;2-2
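
For anyone who wants to tell the two forms apart in code, here is a minimal sketch. It assumes that a shortDOI is "10/" followed by letters and digits, and that a long DOI starts with "10." plus a numeric registrant code. `classifyDoi` is a hypothetical helper for illustration, not part of the plugin.

```javascript
// Hypothetical helper, not taken from the plugin.
// Assumption: shortDOIs look like "10/" + letters and digits, while long
// DOIs start with "10." + a numeric registrant code + "/".
function classifyDoi(value) {
  const doi = String(value).trim();
  if (/^10\/[a-z0-9]+$/i.test(doi)) return "short";
  if (/^10\.[0-9]{4,}\/\S+$/.test(doi)) return "long";
  return "unknown";
}

console.log(classifyDoi("10/aabbe")); // prints "short"
```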

Using shortDOIs can make your reference list shorter and easier to read (helpful if you are running up against page limits).

The Zotero shortDOI Lookup plugin can automatically look up shortDOIs for journal articles stored in Zotero and replace their DOI field with the shortDOI. It can also replace shortDOIs with the original DOIs if needed. Finally, you can use the plugin simply to check whether any of the DOIs stored in your Zotero database are invalid.
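
A rough sketch of what such a lookup might involve, under stated assumptions: that shortdoi.org answers a request of the form `https://shortdoi.org/<long DOI>?format=json`, and that the JSON reply carries the result in a `ShortDOI` field. Both function names are hypothetical, and this is not necessarily how the plugin does it.

```javascript
// Sketch only. The URL shape and the "ShortDOI" field name are assumptions
// about the shortdoi.org service, not something confirmed in this thread.
function shortDoiLookupUrl(longDoi) {
  // Encode each path segment but keep "/" literal between segments.
  const path = String(longDoi).trim().split("/").map(encodeURIComponent).join("/");
  return "https://shortdoi.org/" + path + "?format=json";
}

// Pull the shortDOI out of a JSON reply, or return null if it is absent.
function extractShortDoi(responseText) {
  const data = JSON.parse(responseText);
  return typeof data.ShortDOI === "string" ? data.ShortDOI : null;
}
```

The request itself is left out on purpose, since how it is sent depends on the environment (a Zotero plugin and a standalone script would do this differently).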

The plugin is available from here:
https://github.com/bwiernik/zotero-shortdoi/releases/latest

nickbart · November 30, 2017

Fantastic, thank you.

One suggestion, though: Could you add a menu item for converting shortDOIs back to long DOIs? This would make things easier for those who might need to switch between the two forms, e.g., due to publishers’ requirements. It would also make it easier to decide in favour of not having the plugin insert “Long DOI: …” into the Extra field, reducing clutter.

bwiernik · November 30, 2017

My first recommendation would be to fight the publisher on that front. shortDOIs are as official as the Long DOI, and publishers should not insist you don’t use them.

Unfortunately, without Long DOI being saved, there isn’t a way to look it up—the shortdoi.org API doesn’t provide a way to reverse lookup Long DOIs from shortDOIs.

The most the plugin could do (without resolving the DOI link and then grabbing the Long DOI from the article webpage using Zotero’s translator framework, but that’s beyond my skill level) would be to automate the step of swapping the Long DOI back from Extra.

I actually chose the format for Extra to make it relatively easy to switch based on publishers’ misunderstanding of how the DOI works. You can just delete the word Long before DOI: to activate citeproc-js’s Extra parsing function and override the value stored in the DOI field.

nickbart · November 30, 2017

Ok, but if publishers do insist, you won’t really have a choice. (All hypothetical so far, but I’ve seen similar things happening too frequently not to worry …)

Automating the step of swapping the Long DOI back from Extra would be useful, in particular since your recommendation of deleting the word Long before DOI: does not work (except for the various dates [EDIT: and type], citeproc-js does not let variables in the Extra field override those in regular fields).

FWIW, I have been using the following “shortdoi2doi” script successfully for quite some time now, maybe you could adopt something similar:

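#!/bin/sh
# Usage: shortdoi2doi 10/aabbe (prints the long DOI from the redirect header)
curl -D - -s -o /dev/null "https://doi.org/$1" | grep -i '^location:' | tr -d '\r' | cut -f 4- -d "/"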

Rintze · November 30, 2017

(It would be good to know whether there are publishers that recommend or even require the use of short DOIs over the long format. If there are, that would be an argument for adding a dedicated "doi-short" metadata field in CSL and, as an extension of that, a dedicated field in Zotero.)

bwiernik · November 30, 2017

That's a good point. I imagine that publishers with a strong preference for short articles/reference lists (Science, Springer-Nature, Annual Reviews) might have a preference for short DOIs.

DWL-SDCA · November 30, 2017

There is a larger question: Why do some publishers insist on absurdly long DOIs in the first place? I believe the prize for journals goes to PLoS Currents articles [for example:
10.1371/currents.outbreaks.8ed218c079fbded60c505f025ed45f67]

adamsmith · November 30, 2017

FWIW
curl -LH "Accept: application/vnd.citationstyles.csl+json" https://doi.org/10/aabbe gives you back CSL JSON that includes the long DOI.
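
The same request can be sketched in JavaScript. This assumes a runtime with a global `fetch()` (for example Node 18 or later), which follows redirects by default much like `curl -L`. `DOI` is the CSL JSON field name; the function names are hypothetical.

```javascript
// Read the long DOI out of a CSL JSON document, or return null if absent.
function longDoiFromCslJson(cslText) {
  const item = JSON.parse(cslText);
  return typeof item.DOI === "string" ? item.DOI : null;
}

// Sketch: request CSL JSON for a DOI or shortDOI via content negotiation.
// Assumes a global fetch(); returns null when the request does not succeed.
async function fetchLongDoi(doi) {
  const response = await fetch("https://doi.org/" + doi, {
    headers: { Accept: "application/vnd.citationstyles.csl+json" },
  });
  if (!response.ok) return null;
  return longDoiFromCslJson(await response.text());
}
```

Calling `fetchLongDoi("10/aabbe")` should then resolve to the long DOI, provided the registration agency behind that DOI supports CSL JSON content negotiation.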

The reason for overly long DOIs is the local provisions publishers make to prevent collisions between automatically generated DOIs; PLOS may be overdoing it by using a UUID. When they started using them (and PLOS was a pretty early adopter), the idea was also not that DOIs would be displayed as much as they are. The hope was for them to be used mainly between machines. Even today, link text such as "CrossRef" or "Article" is among the CrossRef-sanctioned ways of displaying DOIs.

bwiernik · November 30, 2017

Okay, I tried to add a Get Long DOI function to the plugin. Unfortunately, JavaScript's XMLHttpRequest() silently follows redirects, so there is no way to grab the initial redirect header and extract the long DOI.

Anyone know a way to get the initial redirect location or, alternatively, call a curl command from within javascript?

(I know this is a bit technical for the forums and I can move to zotero-dev if you like.)

Edit: Oh, now I get what @adamsmith meant. Thanks!

bwiernik · December 1, 2017

@nickbart Okay, there is now an option to Get Long DOIs, and the preference to store the long DOI in Extra is removed. Thanks for the help everyone.

nickbart · December 1, 2017

Great!

Just another observation:

“Verify DOIs” tries to connect to publishers’ websites (rather than to doi.org only) – this seems to be a time-consuming extra step I don’t think is really necessary.

Also – but that’s most likely a Zotero core rather than plugin issue – “Add item(s) by identifier” does not seem to work with shortDOIs. In order not to treat shortDOIs as second-class citizens it’d be great of course if this could be made to work.

Finally, I’m experiencing a strange issue when clicking on the “DOI” label (=“Go to this item online”). It works for long DOIs, but with a valid shortDOI like 10/bc799g I get, in Firefox:

“https://doi.org/10/bc799g” in the address field, but

“DOI Not Found
10/10/bc799g
This DOI cannot be found in the DOI System.
…”

on the page itself (note the “10/10/”!).

When pasting “https://doi.org/10/bc799g” into the address field myself, this works as expected (i.e., forwards to http://ieeexplore.ieee.org/document/771073/).

Any ideas what could be going on here?

bwiernik · December 1, 2017

Regarding Add by Identifier and clicking the DOI field: doi.org's resolver is currently broken when it gets a shortDOI with the / character encoded as %2F. They are aware of the issue, but there is no ETA on when it will be fixed.
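
To illustrate where the %2F comes from: `encodeURIComponent()` escapes the slash, whereas encoding each path segment separately keeps it literal. The second function is a hypothetical workaround sketched for illustration; it is not a description of any actual fix in Zotero.

```javascript
// encodeURIComponent() escapes "/", producing the form the resolver rejected.
function doiUrlEncodedSlash(doi) {
  return "https://doi.org/" + encodeURIComponent(doi);
}

// Hypothetical workaround: encode each path segment but keep "/" literal.
function doiUrlLiteralSlash(doi) {
  return "https://doi.org/" + doi.split("/").map(encodeURIComponent).join("/");
}

console.log(doiUrlEncodedSlash("10/bc799g")); // prints "https://doi.org/10%2Fbc799g"
console.log(doiUrlLiteralSlash("10/bc799g")); // prints "https://doi.org/10/bc799g"
```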

I have a PR waiting for review and merging to work around the doi.org problem in Zotero: https://github.com/zotero/zotero/pull/1364

Regarding Verify DOIs, I think I found a different method in doi.org's API to do that more smoothly.

nickbart · December 1, 2017

Ok, good to know. (Indeed there was a “%2F” involved, though it doesn’t show up in my post above in the view [as opposed to edit] mode.)

bwiernik · December 1, 2017

New version of the plugin is out, with streamlined longDOI and checkDOI functions that only use the DOI API.
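
One way such an API-only check could work, sketched under assumptions rather than taken from the plugin: the DOI proxy is assumed to answer at `https://doi.org/api/handles/<handle>` with JSON containing a `responseCode` (1 for "found"), the `handle`, and a `values` array. A shortDOI is assumed to carry its long DOI in a value of type `HS_ALIAS`. The function name is hypothetical.

```javascript
// Sketch only. Assumed response shape:
//   { responseCode, handle, values: [{ type, data: { value } }] }
// with responseCode 1 meaning the handle exists, and shortDOIs pointing to
// their long DOI through a value of type "HS_ALIAS".
function interpretHandleResponse(responseText) {
  const body = JSON.parse(responseText);
  if (body.responseCode !== 1) {
    return { valid: false, longDoi: null };
  }
  const values = Array.isArray(body.values) ? body.values : [];
  const alias = values.find((v) => v.type === "HS_ALIAS");
  // A long DOI has no alias value, so fall back to the handle itself.
  const longDoi = alias && alias.data ? alias.data.value : body.handle;
  return { valid: true, longDoi: longDoi };
}
```

Because only doi.org is contacted, no publisher website has to respond for a DOI to be verified.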

nickbart · December 2, 2017

The most recent version, Zotero DOI Manager 1.1.1, seems to show a lack of visible response or feedback in some cases:

valid long DOI, e.g., 10.1109/5.771073:
- Get shortDOIs → shortDOI appears in < 1 sec
- Verify and clean DOIs → no visible response or feedback

invalid long DOI, e.g., 10.1009/0000:
- Get shortDOIs → no visible response or feedback
- Verify and clean DOIs → no visible response or feedback

valid short DOI, e.g., 10/bc799g:
- Get long DOIs → long DOI appears in < 1 sec
- Verify and clean DOIs → no visible response or feedback

invalid short DOI, e.g., 10/bbbbbb:
- Get long DOIs → no visible response or feedback
- Verify and clean DOIs → no visible response or feedback

BTW, what is “clean” supposed to do?

“Automatically get shortDOIs on item import” does not seem to work either, whether via a translator or via “Add item(s) by identifier”.

karnesky · December 2, 2017

> BTW, what is “clean” supposed to do?

Strip leading/trailing junk.

bwiernik · December 3, 2017

@nickbart Thanks for the testing.

Version 1.1.2 is out now. It fixes the bug with the broken notification about invalid DOIs and also adds a progress window while a lookup is running.

Like Karnesky says, "and clean" refers to the function pulling out the DOI if there is extra text in the field (e.g., a doi.org prefix, "doi: ", a publisher website URL, etc.). It uses Zotero's built-in cleanDOI function. This is a good idea because CSL styles expect the field to contain a clean DOI without any prefix text (it also produces consistent behavior with the long and short DOI functions, which also just store the bare DOI names).

The one thing that trips cleanDOI up is if a publisher adds additional information at the end of the DOI (e.g., "/abstract.x" or "/fulltext"). Annoyingly, DOIs can include the forward slash / character, so we can't just treat "/" as a DOI boundary like whitespace characters. So, publisher URLs accidentally stored in the DOI field that have such suffixes will get marked as invalid.
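
A minimal sketch of this kind of cleaning, to make the limitation concrete. The pattern is illustrative and not necessarily the one Zotero's cleanDOI uses: it matches "10", an optional ".NNNN" registrant part (absent in shortDOIs), a slash, and then everything up to the next whitespace, minus a trailing period or comma. `cleanDoi` here is a hypothetical stand-in.

```javascript
// Illustrative pattern only; not necessarily what Zotero itself uses.
const DOI_PATTERN = /10(?:\.[0-9]{4,})?\/[^\s]*[^\s.,]/;

// Return the bare DOI found in a field value, or null if there is none.
function cleanDoi(fieldValue) {
  const match = String(fieldValue).match(DOI_PATTERN);
  return match ? match[0] : null;
}

console.log(cleanDoi("https://doi.org/10.1109/5.771073")); // prints "10.1109/5.771073"
console.log(cleanDoi("doi: 10/bc799g"));                   // prints "10/bc799g"
// A publisher suffix cannot be told apart from the DOI, so it is kept:
console.log(cleanDoi("10.1109/5.771073/fulltext"));        // prints "10.1109/5.771073/fulltext"
```

The last call shows why such a field ends up flagged as invalid: the suffix is syntactically a legal part of a DOI, so only a lookup reveals that the cleaned value does not exist.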

nickbart · December 5, 2017

Big improvements, again: thank you. Just a few more very minor suggestions and observations:

An attempt to retrieve or check an invalid DOI sets the “_Invalid DOI” tag. Conversely, it would seem useful to clear that tag after any successful retrieval or check, and also if there is no DOI or shortDOI in the DOI field at all (e.g., because a user removed a DOI that was found to be invalid).

The following are purely cosmetic, it seems:

- Running “Get long DOI” with a long DOI in the DOI field, or “Get short DOI” with a short DOI in the DOI field, brings up a popup saying “Long DOIs retrieved for 1 items” or “ShortDOIs retrieved for 1 items” (though nothing was actually retrieved here).
- When importing an item without a DOI, e.g., a book, there’s also a popup “shortDOIs retrieved for 0 items”.

Given that regular DOI fields are planned for all Zotero item types, it’s probably not worth bothering about DOI variables entered in the Extra field (“cheater syntax”), right?

As to cleaning, thanks for clarification. It testifies to the quality of Zotero’s translators that I’ve never actually seen a DOI that needed “cleaning” so far, so I wasn’t even aware of the problem.

bwiernik · December 5, 2017

1) Removing the tag is planned for the next time I get a chance to work on the plugin.

2) It does look up the DOIs to confirm they are valid, so it seemed appropriate to include them in the count in that case. When an item doesn’t have a DOI, it’s not included in the count for retrieved DOIs.

3) Yeah, managing the DOI in Extra is more fiddly, so I wasn’t planning on putting much effort into supporting that considering Zotero 5.1 will add DOI fields broadly.

4) I’ve also only ever encountered messy DOIs when importing from other programs.

bwiernik · December 5, 2017

Okay, removing the _Invalid DOI tag and better handling of the Progress Window are in the latest release.

Bruce Rusk · December 27, 2017

I'm trying the plugin out (version 1.2.0), and am finding that once a ShortDOI replaces a LongDOI (a) clicking on the link no longer opens the browser to the linked item and (b) the plugin reports all ShortDOIs as invalid and cannot convert them back to Long DOI (though pasting them in at shortdoi.org goes to the right place). Any idea why this might be?

bwiernik · December 28, 2017

(a) is a bug in the DOI.org resolver. There is an open pull request for Zotero to work around that bug.

I’ve not encountered (b) at all. Can you give a sample DOI/shortDOI pair?

Bruce Rusk · December 28, 2017

Here's a sample:
LongDOI: 10.1017/S1380203807002140 (Works, seen as valid)
ShortDoi: 10/djhg26 (plugin marks as invalid)

One wrinkle: I'm using Juris-M, not plain Zotero, so perhaps there's an incompatibility here.

nickbart · December 28, 2017

10/djhg26 (and 10.1017/S1380203807002140) work as expected for me (using plain Zotero).

bwiernik · December 28, 2017

Ah, I bet Juris-M does not have the updated function for cleaning DOIs that Zotero has. I will check that.

Bruce Rusk · December 28, 2017

That makes sense, we can check with @fbennett whether that is planned for Juris-M.

bwiernik · December 28, 2017

No need to check. I’ll make the pull request when I have the chance.

nickbart · January 3, 2018

Zotero DOI Manager, if enabled, seems to trigger the removal of zero-width space chars from author fields in Zotero (previously reported at https://github.com/bwiernik/zotero-shortdoi/issues/2). Disabling Zotero DOI Manager removes this effect.

Note that in some rare cases (e.g. “Lorenzo de’ Medici”), only the insertion of a ZWS at the end of the first name field leads to correct formatting, so there may be a good reason why a ZWS is there (and should not be touched).

Is there anything like a “cleaning” function in Zotero that might have been activated by Zotero DOI Manager?

bwiernik · January 3, 2018

Zotero DOI Manager only touches the DOI field, so I’m not sure how it could be messing with Creator fields at all. Does the problem occur if you disable the automatic shortDOI lookup preference?

Can you post BibTeX or Zotero RDF of an item with a ZWS to gist.github.com and link here so I can test?

nickbart · January 3, 2018

> Does the problem occur if you disable the automatic shortDOI lookup preference?

Yes. I also tried this with all other addons disabled (except for Zotero LibreOffice Integration), and the effect continues to occur when Zotero DOI Manager is enabled, and disappears when it is disabled.

Zotero RDF: https://gist.github.com/njbart/452ba994603ae151d25b1900c3199c17

bwiernik · January 3, 2018

When exactly are you seeing the ZWS being dropped? I can still copy it out of the author field after importing the item, editing and saving it, and running DOI Manager's short, long, and verify functions. So I'm not seeing the dropping behavior you describe at all.

Can you please give the exact steps to reproduce the error behavior?

As an aside, I don't really understand why you are using ZWS in this context. It doesn't print as a space, so I don't see how it fulfills your need to have the space after [de'] be retained. I'm assuming that citeproc-js drops other whitespace characters, such as non-breaking spaces or thin spaces? This would seem like it should be a fix made in citeproc-js, rather than trying to hack something with an unusual whitespace character that I don't think is widely supported by various publishing systems.