Translators: three universal requests for your consideration

DWL-SDCA · March 31, 2020

Here is an example with each of the problems:

10.5194/nhess-2019-423

1)
It would be really helpful if translators recognized when publishers preface the DOI string with "http..." and stripped everything before the first "10".

2)
Several publishers fill the Language field with either the full word (in the above case, English) or (as is the case with PubMed and some publishers) the three-character abbreviation, rather than the two-character ISO code.

3)
Please strip out the word "abstract" or "summary" as the first word in the article abstract.

It may be that it isn't feasible to test for these problems universally but as translators are revised please consider making these corrections to the Language and DOI fields.

If needed or helpful I will append posts to this thread and include DOIs or URLs to identify specific publishers with these metadata problems.

adamsmith · April 1, 2020

1) we can definitely do

2) Isn't really a problem, as the language codes don't get used for anything. It's not even clear to me how this will/should look if the field ever gets standardized. It could well be that it actually displays the human-readable language, possibly even localized, and the ISO codes are stored under the hood. For now, we won't touch this, though.

3) Happy to do this for individual translators where this happens (as you know, we've done this in the past for several), but I'd probably stay away from this in the Embedded Metadata one (i.e. the one used for the DOI you posted) given that there are too many moving parts. I'd be open to reconsidering this, though, if others favor stripping out at least Abstract (Summary definitely seems tricky, as that can also be a sub-category of the abstract).

DWL-SDCA · April 1, 2020

2) Are you saying that "under the hood" a language field label is morphed into the two-character ISO codes so that Zotero can correctly apply styles that have language-specific differences by reference item? If so, the behind-the-scenes business is really keen. I misunderstood your and other Zotero experts' past recommendations to take care when entering data in the language field because styles use that information when formatting things such as titles. [My own system has a parser that converts the Zotero MODS language field (language word or 639-2 abbreviation) to the two-character ISO 639-1 standard upon import into my web database, and I thought that might be useful to everyone. My PubMed XML parser converts the PubMed three-character ISO 639-2 codes to two-character 639-1 codes.]
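The kind of conversion described here could look roughly like the following. This is only a minimal sketch, not the poster's actual parser: the function name `normalize_language` and the lookup table `TO_ISO_639_1` are hypothetical, and the table covers just four languages for illustration.

```python
# Deliberately tiny lookup table (an assumption for illustration);
# a real parser would need the full ISO 639 code lists.
TO_ISO_639_1 = {
    "english": "en", "eng": "en",
    "french": "fr", "fre": "fr", "fra": "fr",
    "german": "de", "ger": "de", "deu": "de",
    "spanish": "es", "spa": "es",
}


def normalize_language(value: str) -> str:
    """Map a language word or ISO 639-2 code to its ISO 639-1 code."""
    cleaned = value.strip()
    # Unknown values are returned unchanged rather than guessed.
    return TO_ISO_639_1.get(cleaned.lower(), cleaned)


print(normalize_language("English"))  # en
print(normalize_language("eng"))      # en
```

Values that are not in the table pass through untouched, so an entry that is already a two-character code is not damaged.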

3) I'm really only concerned if the first word of the abstract is "Abstract" or "Summary" (or, as in my example journal article, is the first word once the HTML formatting tags before it are ignored). Again, my own system does this when parsing the Zotero MODS export. This isn't essential to me but it might be helpful for others.
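A check like this can be sketched with a single regular expression. The sketch below is an assumption about how such a step might be written, not the code of any translator or of the parser mentioned above; `strip_abstract_label` and `LEADING_LABEL` are hypothetical names.

```python
import re

# Optional leading HTML tags (kept), then the word "Abstract" or "Summary",
# then an optional ":", "." or "-" and any following whitespace.
LEADING_LABEL = re.compile(
    r"^\s*((?:<[^>]+>\s*)*)(?:abstract|summary)\b\s*[:.\-]?\s*",
    re.IGNORECASE,
)


def strip_abstract_label(abstract: str) -> str:
    """Remove a leading "Abstract"/"Summary" label, keeping leading tags."""
    return LEADING_LABEL.sub(r"\1", abstract, count=1)


print(strip_abstract_label("Abstract. We study floods."))
# We study floods.
print(strip_abstract_label("<p>Abstract: We study floods.</p>"))
# <p>We study floods.</p>
```

Note that the pattern cannot tell a label from a real first word: an abstract that begins "Summary statistics show ..." would lose its first word, which is one reason to apply this per translator rather than universally.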

re 1) I haven't set up my parser to strip the preliminary web stuff, so this would be really useful to me and, I expect, to everyone else.
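For reference, the stripping asked for in request 1 can be sketched in a few lines. This is a hedged illustration only: `strip_doi_prefix` and `DOI_START` are hypothetical names, the `https://doi.org/` prefix in the usage lines is an assumed example of what a publisher might prepend, and the pattern assumes a DOI begins with "10.", a registrant code of 4 to 9 digits, and a slash.

```python
import re

# Assumption: a DOI starts with "10.", then 4 to 9 digits, then "/".
DOI_START = re.compile(r"10\.\d{4,9}/")


def strip_doi_prefix(raw: str) -> str:
    """Drop everything before the first DOI-like "10.NNNN/" sequence."""
    text = raw.strip()
    match = DOI_START.search(text)
    # Leave the value untouched when nothing DOI-like is found.
    return text[match.start():] if match else text


print(strip_doi_prefix("https://doi.org/10.5194/nhess-2019-423"))
# 10.5194/nhess-2019-423
print(strip_doi_prefix("10.5194/nhess-2019-423"))
# 10.5194/nhess-2019-423
```

Searching for the DOI pattern, rather than removing a fixed list of prefixes, means the same function handles any leading URL or label without knowing it in advance.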

Thanks

edit typos and omitted word

bwiernik · April 1, 2020

2. No transformation is happening, but the only current use of the Language field is controlling whether text casing rules are applied or not. They are applied to any language field value starting with “en” (not case sensitive).
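The rule as stated can be written as a one-line check. This is an illustrative sketch of the description above, not Zotero's actual code; `casing_rules_apply` is a hypothetical name.

```python
def casing_rules_apply(language: str) -> bool:
    """True when the Language value starts with "en", ignoring case."""
    # An empty field is not covered by the rule as stated in the thread,
    # so its real behavior is not modeled here.
    return language.lower().startswith("en")


for value in ["en", "en-US", "English", "eng", "fr", "German"]:
    print(value, casing_rules_apply(value))
# en True
# en-US True
# English True
# eng True
# fr False
# German False
```

Under this rule the full word "English" and the three-character "eng" behave the same as "en", which is why the variation in publisher metadata has no effect on casing for English items.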

DWL-SDCA · April 2, 2020

Another example of a publisher that has metadata with the DOI in an unusual presentation: 10.5038/1911-9933.13.3.1673

Thanks