Feature request: translate hex encoding in DOI URLs so DOI lookup doesn't fail

cpsyctc · August 29, 2021

I love the DOI lookup function in Zotero: it has saved me hours (and Z has saved me months of my life: thanks!). This is something I hit quite often now when copying a DOI from a web page. It pastes into Z as the hex-encoded version, e.g.
https://doi.org/10.1177%2F25152459211035109, and that causes Z, not unreasonably, to say there is no such DOI. Quite right: it should be
https://doi.org/10.1177/25152459211035109, which is what I saw on the screen when I copied it. Would it be possible for Z to "de-hex" pasted magic wand lookups, or to try de-hexing them when they fail at first?
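A minimal sketch of the second option (try the DOI exactly as pasted, and only if that lookup fails, retry with percent-escapes such as %2F decoded). It is in Python purely for illustration and is not Zotero's code; `lookup` is a hypothetical stand-in for whatever resolver the magic wand calls, assumed to return a record or None. Because the decoded form is tried only after the literal form fails, a DOI that genuinely contains a "%" would still resolve on the first attempt.

```python
from typing import Callable, Optional
from urllib.parse import unquote

# Hypothetical resolver type: takes a DOI, returns a record (a dict here)
# or None when the DOI is not found.
Lookup = Callable[[str], Optional[dict]]

def lookup_with_fallback(doi: str, lookup: Lookup) -> Optional[dict]:
    """Try the DOI as pasted; on failure, retry once with %-escapes decoded."""
    record = lookup(doi)
    if record is not None:
        return record
    decoded = unquote(doi)  # e.g. "%2F" -> "/"
    if decoded != doi:
        return lookup(decoded)
    return None

# Usage with a toy in-memory "resolver" (dict.get returns None on a miss).
known = {"10.1177/25152459211035109": {"DOI": "10.1177/25152459211035109"}}
print(lookup_with_fallback("10.1177%2F25152459211035109", known.get))
```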

TIA,

Chris

dstillman · August 29, 2021

We can consider this, but I'd note that you're pasting a URL, not a DOI, so there shouldn't really be an expectation that this would work. If you select the DOI rather than copying the link, it will of course work.

Also, you always want to use the Save to Zotero button when you can. If you have a DOI link, you would normally just click it and save from the article page, not paste the DOI URL into Zotero. Add Item by Identifier shouldn't be the main way you get items into Zotero regardless, and certainly not with DOI URLs. So I'm not sure we want to do this.

cpsyctc · August 29, 2021

"If you select the DOI rather than copying the link, it will of course work."
I find that many of the ways I get sent journal listings only give the DOI in this way so it's not possible to select the DOI as it's hex encoded in the link. I guess that copying those links had seemed not that different from copying the DOIs and had mostly worked for me in recent years.

"Also, you always want to use the Save to Zotero button when you can." Why?! Very often I can only get as far as a page about the paper, as the paper is behind a paywall. Often I then store the reference and abstract so I can find it easily in Z later should I want to. I think that gives me a much cleaner library than saving the page to Z.

Tangentially: you say "you always want to" and "Add Item by Identifier shouldn't be the main way you get items into Zotero regardless, and certainly not with DOI URLs."
Why? Mostly it works really well for me, and until I started to hit this issue with hex-encoded URLs it was simply fantastic, as I've said. So why deprecate it?

Is there a place that sets out the logic behind deprecations and prescriptions like this? I have used Z since a 1.? version, but when the standalone version appeared I found it more to my liking. The save to Z button isn't something I use a lot: I use it almost entirely for web pages and haven't used it for formal papers with DOIs for years. I do sometimes read the Z update summaries, but they are (rightly) about the actual changes, not about changes in recommended and deprecated usage. Perhaps I am missing something that does summarise changes in those things?

TIA, Chris

adamsmith · August 29, 2021

No one is talking about deprecating add by identifier, not sure where you get that. On the contrary, it's gotten a number of recent improvements.
Using the browser button has been the preferred way to get data into Zotero from day one. It gets full text, it's more likely to get abstracts and it allows us to compensate for site specific data issues.

For the Q at hand, the problem really is that the strings in DOIs aren't restricted, so a % is perfectly valid in a DOI. One thing we could consider, though, is that I believe the prefix, i.e. the first part after 10., is more tightly prescribed. If I'm right about that, we could just assume that any DOI with a % before the first / is already URI escaped.
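A rough Python sketch of that heuristic, with hypothetical helper names (this is not Zotero's implementation): if a "%" appears before the first "/", treat the whole string as URI-escaped and decode it; otherwise leave it untouched, since "%" is legal in the suffix.

```python
from urllib.parse import unquote

def looks_uri_escaped(doi: str) -> bool:
    """True if '%' occurs before the first '/' (or anywhere, if there is no '/')."""
    slash = doi.find("/")
    prefix = doi if slash == -1 else doi[:slash]
    return "%" in prefix

def normalize_doi(doi: str) -> str:
    """Decode the DOI only when the heuristic says it was URI-escaped."""
    return unquote(doi) if looks_uri_escaped(doi) else doi

print(normalize_doi("10.1177%2F25152459211035109"))  # escaped: gets decoded
print(normalize_doi("10.1000/abc%25def"))            # '%' only in suffix: kept
```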

bwiernik · August 29, 2021

Yes, I think that check will work. Publisher IDs in the DOI system are restricted to numeric characters, I believe.

DWL-SDCA · August 29, 2021

For me, this problem appears when the "save to Zotero button" for an _article_ shows a folder rather than a journal article icon. Sometimes the journal article itself is not among the choices, although the article DOI is displayed on the screen. When I carefully copy and paste only the DOI from the webpage, the entire URL is copied -- even if the URL isn't fully displayed in the on-screen rendering of the DOI. The '%2F' needs to be replaced with a "/" when I delete the http part. (Other encoded characters after the initial slash seem to be okay to leave as-is.) For the longest time I assumed this was something I just had to live with, and I fully blamed the publisher's web design.*

If the scripting power behind the magic wand could recognize the encoded slash or automatically remove the address-part that precedes the actual DOI (and if I was young enough) I'd do cartwheels to express my joy.

*Sometimes, with these websites, copying the article abstract is recognized by the webpage and promotional text for the journal and publisher is appended to the abstract text that I thought I copied. (The promotional stuff is not displayed anywhere on the screen.)
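The manual workaround described above (delete the address part, turn %2F back into "/") could be automated roughly like this. It is a sketch under assumptions: the set of resolver hosts is illustrative, not exhaustive, and the whole URL path is decoded rather than only %2F, on the assumption that a DOI inside a URL is percent-encoded the way URL paths normally are. Bare text that is not a URL is returned unchanged, since "%" is legal in a DOI.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

# Assumed set of DOI resolver hosts for this sketch.
DOI_HOSTS = {"doi.org", "dx.doi.org", "www.doi.org"}

def doi_from_pasted(text: str) -> Optional[str]:
    """Strip a doi.org URL down to the bare DOI, decoding escapes such as %2F."""
    text = text.strip()
    parts = urlsplit(text)
    if parts.scheme in ("http", "https"):
        if parts.netloc.lower() not in DOI_HOSTS:
            return None  # some other site: not handled here
        # URL paths are percent-encoded, so decode the path back to the DOI.
        candidate = unquote(parts.path.lstrip("/"))
    else:
        candidate = text  # bare text: leave any '%' alone
    return candidate if candidate.startswith("10.") else None

print(doi_from_pasted("https://doi.org/10.1177%2F25152459211035109"))
```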

adamsmith · August 29, 2021

Yeah, so technically other characters are allowed in the prefix (in particular, a period is permitted), but at least for CrossRef there's not a single example (see https://www.crossref.org/blog/dois-and-matching-regular-expressions/ ), so I think if we treat 10\.\d+%2F.+ as a) the beginning of a valid DOI and b) an indicator that the DOI is already URI escaped, that'll work nicely. I agree that there are enough cases of URI-encoded DOIs in the wild to make this quite useful.
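One way that pattern might be applied to whatever text gets pasted, sketched in Python. Two assumptions go beyond the pattern quoted above: the hex digits are matched case-insensitively (so %2f also counts), and `.+` is narrowed to `\S+` so a match stops at the first whitespace instead of swallowing the rest of the line. Trailing punctuation such as a final period is not trimmed, and `find_escaped_dois` is a hypothetical name.

```python
import re
from typing import List
from urllib.parse import unquote

# "10." + numeric registrant code + "%2F" (an encoded "/") + suffix.
# Assumed tweaks: case-insensitive hex, suffix ends at whitespace.
ESCAPED_DOI = re.compile(r"10\.\d+%2F\S+", re.IGNORECASE)

def find_escaped_dois(text: str) -> List[str]:
    """Return every escaped-looking DOI in the text, percent-decoded."""
    return [unquote(m.group(0)) for m in ESCAPED_DOI.finditer(text)]

print(find_escaped_dois("See https://doi.org/10.1177%2F25152459211035109 for details"))
```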

dstillman · August 29, 2021

I find that many of the ways I get sent journal listings only give the DOI in this way so it's not possible to select the DOI as it's hex encoded in the link.

@cpsyctc: Give them what way? I'm talking about the difference between visible page text and underlying HTML code. You're saying you're frequently encountering cases where there's not a visible DOI — as part of a link or otherwise — and only some other text with a doi.org URL as the underlying link? And even then, again, you would just click the link and use the save button. Zotero can save data from most paywalled pages just fine.

@DWL-SDCA: Can you provide an example of what you're describing?

dstillman · August 29, 2021

@cpsyctc: And what adamsmith said for the rest. I don't know where you got "deprecate", but it sounds like you have a misconception of what the save button does. It's always been the recommended way to save to Zotero, and the standalone version of Zotero didn't change that — that's the point of the Zotero Connector. As for "prescriptions", see the long documentation page I linked to.

DWL-SDCA · August 29, 2021

I apologize: some of the problems I mentioned may have been resolved by new, improved translators. Some of this may be a holdover from the days when translators were not as good as many are today. That said, on Friday I encountered the "folder problem" of not finding the article's own DOI. I developed my own ways of working with certain publishers' websites to get their articles into Zotero -- my team and I just turn into automatons when we encounter problems with translators. Emerald journals used to present a problem, and until today I hadn't tried the newer translator, which works perfectly (thanks). I also see that there is now a PKP Catalog systems translator that works well (again, thanks). Would it be possible to post a list of translators that have been updated, similar to the log of changes in Zotero program updates?

When I encounter each of the problems I mentioned in my earlier post again, I will make a note of the details. The problem of journals not having a translator and needing to enter by hand + cut and paste is fairly common for obscure East Asian, Eastern European, and central African journals. Quite a few of these journal articles present something they claim is a DOI, but I can find no service that will resolve them. Many times the publisher presents these as a PDF with no HTML version. Zotero will import the PDF file but cannot identify metadata. Often these PDFs are protected and I cannot copy and paste from them.

@dstillman If there is a way to select/copy from Zotero items in the trash-bin (without restoring the binned record to my library) I could probably give examples in a few minutes. Even better would be if I could sort my binned articles to find records that are missing a PMID in the Extra field. That would make it easier to find the records of problem items I encountered in the past couple of days. As it is I'll scroll through the binned records sorted by publisher and see if that can trigger my memory.

From memory, this used to be a problem with SAE journals (now MOBILUS), but now I see that the Mobilus journal translator works with DOI. [Off topic: Unfortunately, the SAE is slow to apply for and post DOIs, so articles such as https://saemobilus.sae.org/content/02-14-03-0023/ without an assigned DOI cannot be imported except as a webpage with essentially no metadata for the article at all.]

dstillman · August 29, 2021

This doesn't really have much to do with translators. This is about copying visible page text — whether or not it's a URL — vs. the underlying encoded DOI link.

DWL-SDCA · August 29, 2021

Yes, I know. My comments referred to several problems. I will provide links the next time I find a page without a translator where, when I copy the text of a normal-looking DOI (with no visible URL), the full URL (sometimes with encoded characters) is copied instead. I'll probably find one in a couple of days. This is not especially unusual in my work, as my team and I bring 1000+ items into Zotero every day. Remember, I primarily use Zotero only as an importer, editor, and MODS converter/exporter, and then delete the converted records.