Abbreviation issues with diacritics

Hi,

I'm using the Word plugin with MEDLINE abbreviations. However, I noticed that the journal "Archiv für mikroskopische Anatomie" has been abbreviated to "Arch. Für Mikrosk. Anat." (note the surplus "Für").

This sent me on a journey through the source code, where I found this function:
https://github.com/zotero/zotero/blob/master/chrome/content/zotero/xpcom/cite.js#L482-L487

Unfortunately, that function doesn't normalize "für" to "fur". I tested this by adding a custom abbreviations.json to my Zotero data folder: the journal name was only abbreviated properly when the key contained "für".

Of course, a custom abbreviations.json works for now. But is this the intended behavior, or should the normalizeKey function also handle diacritics?

I found this post which may be helpful for normalizing keys: https://stackoverflow.com/questions/990904/remove-accents-diacritics-in-a-string-in-javascript/37511463
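For reference, the technique in that answer is Unicode canonical decomposition (NFD), which splits a character such as "ü" into a base letter plus a combining mark (here U+0308), followed by removing the combining marks in the U+0300 to U+036F range. Below is a minimal TypeScript sketch of a key normalizer built on it. `stripDiacritics` and `normalizeKeySketch` are hypothetical names, and the lowercasing and whitespace handling are assumptions; this is not the actual `normalizeKey` from cite.js.

```typescript
// Hypothetical sketch, NOT Zotero's normalizeKey from cite.js.
// NFD splits accented letters into base letter + combining mark;
// the regex then drops marks in the Combining Diacritical Marks
// block (U+0300 to U+036F).
function stripDiacritics(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Assumed key normalization: strip diacritics, lowercase,
// collapse runs of whitespace, trim the ends.
function normalizeKeySketch(key: string): string {
  return stripDiacritics(key).toLowerCase().replace(/\s+/g, " ").trim();
}

console.log(normalizeKeySketch("Archiv für mikroskopische Anatomie"));
// → archiv fur mikroskopische anatomie
console.log(normalizeKeySketch("Bulletin épidémiologique hebdomadaire"));
// → bulletin epidemiologique hebdomadaire
```

NFD only helps for letters that have a canonical decomposition: "ß" and "ø", for example, pass through unchanged, so a lookup table would still need its own entries for those.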

Thanks!

  • edited April 6, 2020
    Beyond your diacritics question, fur/für shouldn't be included in the abbreviation. I suspect that is what you meant by "surplus". Rarely, if ever, are prepositions included in journal title abbreviations -- Medline, LTWA, ISO, or otherwise.
  • Yes, exactly. Maybe I wasn't very clear. In abbreviations.json, there is a mapping to remove the preposition "fur". However, because the normalization did not convert "für" to "fur", the preposition remained. That's what I meant by surplus.

    So the question is perhaps better phrased as: should normalizeKey remove diacritics, or should there also be an entry to remove "für" in abbreviations.json?

  • It would be the latter, but I will let @adamsmith comment on this issue.

  • That'd require a look into abbreviations.json -- if that already contains a multi-lingual list of prepositions with diacritics removed, then it'd make more sense to just have this handled with normalization.
  • edited April 6, 2020
    I had another look at abbreviations.json and it seems that many keys have diacritics, so normalization may indeed not be the best way to approach this. Would it be possible to add "für" to abbreviations.json?
  • Yeah, so adding this makes sense. I'm wondering, though, if we could at least get a fairly comprehensive list of German prepositions and conjunctions to remove, so that we don't need a PR for every single one we discover.
  • @zuphilip might know if something like this exists?
  • German prepositions: https://wortwuchs.net/grammatik/praeposition/liste/
    German conjunctions: https://wortwuchs.net/grammatik/konjunktion/liste/

    I could put those together if it helps. Does it make sense to add all of them?
  • I'd like to come back to the issue of diacritics. The title "Bulletin épidémiologique hebdomadaire" must be entered as "Bulletin epidemiologique hebdomadaire" for the abbreviation to come out correctly, i.e. as "Bull Epidemiol Hebd" rather than "Bull Epidémiologique Hebd". Could this be corrected? Thank you for your help.
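To make the two options discussed above concrete (folding diacritics before lookup vs. adding an accented entry such as "für"), here is a toy TypeScript sketch of a word-by-word abbreviator. The word table is invented for illustration and does not reflect the real structure or contents of abbreviations.json; treating an empty string as "drop this word" and capitalizing unmatched words are assumptions chosen to reproduce the surplus "Für" symptom.

```typescript
// Toy word-by-word abbreviator. The word table is HYPOTHETICAL and
// does not mirror the real abbreviations.json. An empty string means
// "drop this word"; unmatched words are kept and capitalized.
const WORDS = new Map<string, string>([
  ["archiv", "Arch"],
  ["fur", ""], // only the unaccented preposition, as in the thread
  ["mikroskopische", "Mikrosk"],
  ["anatomie", "Anat"],
  ["bulletin", "Bull"],
  ["epidemiologique", "Epidemiol"],
  ["hebdomadaire", "Hebd"],
]);

// NFD-based folding, as in the Stack Overflow thread linked in the first post.
function stripDiacritics(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

function abbreviate(
  table: Map<string, string>,
  title: string,
  foldDiacritics: boolean,
): string {
  const out: string[] = [];
  for (const word of title.split(/\s+/).filter((w) => w.length > 0)) {
    const lower = word.toLowerCase();
    const key = foldDiacritics ? stripDiacritics(lower) : lower;
    const abbr = table.get(key);
    if (abbr === undefined) {
      // No entry: keep the word, capitalized.
      out.push(word.charAt(0).toUpperCase() + word.slice(1));
    } else if (abbr !== "") {
      out.push(abbr);
    }
  }
  return out.join(" ");
}

// Without folding, "für" has no entry and survives.
console.log(abbreviate(WORDS, "Archiv für mikroskopische Anatomie", false));
// → Arch Für Mikrosk Anat

// Option 1: fold diacritics before lookup.
console.log(abbreviate(WORDS, "Archiv für mikroskopische Anatomie", true));
// → Arch Mikrosk Anat
console.log(abbreviate(WORDS, "Bulletin épidémiologique hebdomadaire", true));
// → Bull Epidemiol Hebd

// Option 2: add the accented form as its own entry, no folding.
const withFuer = new Map(WORDS).set("für", "");
console.log(abbreviate(withFuer, "Archiv für mikroskopische Anatomie", false));
// → Arch Mikrosk Anat
```

Either fix removes the stray preposition, but only folding also handles accented content words like "épidémiologique" without a separate entry for each accented form.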