Non-alphanumeric characters in URL

edited June 13, 2021
This is a two-part question. The first is technical, about Zotero. The second is more about citation conventions, so I want to be clear in advance that I don't necessarily expect an answer, let alone a definitive one.

1. Is there any way to avoid having non-alphanumeric URL characters automatically saved as a ludicrously long string of machine code?

Example:

Citation URL:
https://osakana.suisankai.or.jp/wp/wp-content/uploads/2020/12/ 1997%E5%B9%B4%E3%80%80%E5%85%A8%E5%9B%BD%E9%AD%9A%E9%A3%9F%E6%99%AE%E5%8F%8A%E6%8B%85%E5%BD%93%E8%80%85%E8%82%B2%E6%88%90%E6%A4%9C%E8%A8%8E%E4%BC%9A%E3%80%80%E5%A0%B1%E5%91%8A%E6%9B%B8.pdf
(Had to put a space after the ...12/ to avoid the URL rendering back to kanji here. It remains that long string of nonsense when output into Word.)

Actual URL:
https://osakana.suisankai.or.jp/wp/wp-content/uploads/2020/12/1997年 全国魚食普及担当者育成検討会 報告書.pdf

I was able to manually cut and paste the URL into Zotero and get it to output the original, kanji intact.
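For anyone who would rather not cut and paste by hand: the long string is percent-encoding, where each kanji is written out as its three UTF-8 bytes in %XX form, so it converts back losslessly. Below is a minimal sketch in Python, assuming a URL with no query string or fragment. The names `decode_url` and `encode_url` are made up for illustration and are not anything Zotero provides.

```python
from urllib.parse import quote, unquote


def decode_url(url: str) -> str:
    """Turn %XX escapes back into the characters they stand for (UTF-8)."""
    return unquote(url)


def encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters, leaving ':' and '/' untouched.

    Assumption: the URL has no query string or fragment; characters such
    as '?', '&', '=' and '#' would need to be added to `safe` otherwise.
    """
    return quote(url, safe=":/")


# The percent-encoded URL from the example above, split for readability.
ENCODED = (
    "https://osakana.suisankai.or.jp/wp/wp-content/uploads/2020/12/"
    "1997%E5%B9%B4%E3%80%80"
    "%E5%85%A8%E5%9B%BD%E9%AD%9A%E9%A3%9F%E6%99%AE%E5%8F%8A"
    "%E6%8B%85%E5%BD%93%E8%80%85%E8%82%B2%E6%88%90%E6%A4%9C%E8%A8%8E%E4%BC%9A"
    "%E3%80%80%E5%A0%B1%E5%91%8A%E6%9B%B8.pdf"
)

if __name__ == "__main__":
    readable = decode_url(ENCODED)
    print(readable)                         # the kanji form of the URL
    print(encode_url(readable) == ENCODED)  # checks the round trip
```

Both forms point at the same file, so which one ends up in the citation is a matter of presentation rather than accuracy.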

2. But that's actually not a great solution either, since most English-language publications/publishers won't print the non-alphanumeric characters.

Does anyone know what the conventions are regarding shortened URLs?

It seems like the easiest solution would be to use a URL like:
https://is.gd/3Z7zKc

But it also seems to me that this would be frowned upon. Anyone have any insights?

Thanks as always!
  • We'll look into saving URLs with non-ASCII characters in decoded form where possible.
    "since most English-language publications/publishers won't print the non-alphanumeric characters"
    That's not the case. Unicode has been around for over 30 years and is pretty universal these days.

    Short URLs are generally not a good idea for citation, as they simply create an additional dependency that may disappear before the original resource.
  • Technically, APA allows link shorteners (9.36) and Chicago Manual categorically disallows them (14.10). As usual, Chicago is right about this (in addition to permanence, URLs also frequently contain useful information).

    APA has a weird vacillating blogpost on this (that also contains some errors) where they kind of row back on the link shorteners and say they're mainly for student papers. https://apastyle.apa.org/blog/shortened-urls
  • Thanks to both of you for your comments. This was helpful as always.

    @dstillman said:
    "That's not the case. Unicode has been around for over 30 years and is pretty universal these days."
    With respect, the question is not whether Unicode is available. As with all things, it's a question of implementation, which means house rules and conventions (some of which are unchanged from decades before the internet), ownership of fonts with full Unicode support, etc. Things are definitely easier for online publications, but for print there are still a lot of hurdles that I encounter on a semi-regular basis -- and would on a regular basis if I were more productive, I suppose.

    Thanks to @adamsmith for the Chicago reference. I had missed that. The impermanence defense is strange to me given that third-party links do, in fact, change―that's why we all know what 404s and web.archive.org are―but I will stick to the manual.

    I think the passage is worth quoting in part for posterity:
    14.10 Short forms for URLs. A very long URL―one that runs to as much as a line or more of text, especially if it contains a lot of punctuation or other syntax readable mainly by computers―can often be shortened simply by finding a better version of the link.... On the other hand, shortened versions of a URL provided by third-party services (and intended primarily for use with social media) should never be used.
    While this is lovely in principle, it does not solve the fundamental issue of having URLs that are more than 3 lines long.

    Elsewhere, Chicago makes an effort to shorten nearly everything, including headers, footers, titles, captions, etc. URLs are an interesting exception, technical justifications notwithstanding.
  • I believe you that you've run into trouble over the years. I'm just objecting to the idea that, in 2021, "most" publishers "won't print the non-alphanumeric characters" — which would include mathematical and scientific symbols, Greek letters, etc. And it would be technical malpractice where that was the case. (Here we're also talking about a hypothetical situation where a URL contains Japanese characters but the title, author, publisher, and other fields don't.)

    Anyhow, this is not something I would worry about, and if a publisher had a problem with it, I would object.
    "The impermanence defense is strange to me given that third-party links do, in fact, change―that's why we all know what 404s and web.archive.org are"
    That's part of the point, though. Leaving aside that a shortened URL is a second dependency — that was my main point — the original URL is an identifier, so it provides value in the citation even if it goes offline, for example by being findable in the Wayback Machine. If a shortened URL goes offline, it's worthless, because the metadata pointing to the original URL is stored on a server that no longer exists. Using the original URL also puts the onus for preservation/redirection on the publisher, which is more appropriate than, say, some company that made a deal with the Grenadian government.
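To make the "identifier" point concrete: the original URL alone is enough to build a Wayback Machine lookup, with no other metadata needed. A rough sketch in Python follows, assuming the Internet Archive's capture-listing page and its public availability endpoint keep their current address patterns. The function names are hypothetical, and nothing here contacts the network.

```python
from urllib.parse import urlencode


def wayback_captures_page(original_url: str) -> str:
    """Address of the browser page that lists archived captures of a URL.

    Assumption: the 'web/*/' wildcard pattern stays in place.
    """
    return "https://web.archive.org/web/*/" + original_url


def wayback_availability_query(original_url: str) -> str:
    """Address of a JSON query asking whether a capture of the URL exists.

    Assumption: the availability endpoint keeps taking the original URL
    in a 'url' query parameter.
    """
    return "https://archive.org/wayback/available?" + urlencode({"url": original_url})


if __name__ == "__main__":
    cited = "https://example.org/report.pdf"  # placeholder for a cited URL
    print(wayback_captures_page(cited))
    print(wayback_availability_query(cited))
```

A shortened URL cannot be fed into either function in any useful way once the shortener is gone, since the mapping back to the original lived only on the shortener's server.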