Translators: three universal requests for your consideration
Here is an example with each of the problems:
10.5194/nhess-2019-423
1) It would be really helpful for translators to recognize when publishers preface the DOI string with "http..." and strip everything before the first "10".
2) Several publishers fill the Language field with either the full word (in the above case, English) or (as is the case with PubMed and some publishers) the three-character abbreviation.
3) Please strip out the word "abstract" or "summary" when it appears as the first word in the article abstract.
It may not be feasible to test for these problems universally, but as translators are revised, please consider making these corrections to the Language and DOI fields.
If needed or helpful, I will append posts to this thread with DOIs or URLs that identify specific publishers with these metadata problems.
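To make request 2 concrete, here is a minimal sketch of what normalizing the Language field could look like. It assumes the target is the two-letter ISO 639-1 code, which the post above does not actually specify. The function name and the lookup table are hypothetical, and only a few languages are listed for illustration.

```python
# Hypothetical lookup table (not from any translator); a real one would need
# many more entries. The target format (ISO 639-1) is an assumption.
_TO_ISO_639_1 = {
    "english": "en", "eng": "en", "en": "en",
    "german": "de", "deu": "de", "ger": "de", "de": "de",
    "french": "fr", "fra": "fr", "fre": "fr", "fr": "fr",
    "spanish": "es", "spa": "es", "es": "es",
}


def normalize_language(raw: str) -> str:
    """Map a full language name or three-letter code to a two-letter code.

    Values that are not in the table are returned unchanged (trimmed), so
    nothing is lost when a publisher uses a language not listed here.
    """
    cleaned = raw.strip()
    return _TO_ISO_639_1.get(cleaned.lower(), cleaned)
```

Unknown values pass through untouched on purpose, so a partial table never destroys what the publisher supplied.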
2) This isn't really a problem, as the language codes don't get used for anything. It's not even clear to me how this will/should look if the field ever gets standardized. It could well be that it actually displays the human-readable language, possibly even localized, with the ISO codes stored under the hood. For now, we won't touch this, though.
3) Happy to do this for individual translators where it happens (as you know, we've done this in the past for several), but I'd probably stay away from it in the Embedded Metadata translator (i.e. the one used for the DOI you posted), given that there are too many moving parts. I'd be open to reconsidering this, though, if others favor stripping out at least "Abstract" ("Summary" definitely seems tricky, as it can also be a sub-category of the abstract).
3) I'm really only concerned with cases where the first word of the abstract is "Abstract" or "Summary" (in my example journal article, that means ignoring the HTML formatting tags before the first word). Again, my own system does this when parsing the Zotero MODS export. This isn't essential to me, but it might be helpful for others.
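For illustration, the check described here could be sketched roughly as follows. This is not the code from my parser or from any translator. The function name and the pattern are hypothetical, and it assumes the label is at most followed by a single ":", "." or "-".

```python
import re

# Leading whitespace and HTML tags are captured so they can be kept;
# only the label word and one optional punctuation mark are dropped.
_LABEL = re.compile(
    r"^(\s*(?:<[^>]+>\s*)*)(?:abstract|summary)\b\s*[:.\-]?\s*",
    re.IGNORECASE,
)


def strip_abstract_label(abstract: str) -> str:
    """Remove a leading 'Abstract' or 'Summary' label, if there is one."""
    return _LABEL.sub(r"\1", abstract)
```

Keeping the leading tags means the markup stays balanced, at the cost of possibly leaving an empty element behind. A sentence that genuinely starts with one of these words (for example "Abstract concepts are ...") would also lose its first word, which is part of why doing this universally is tricky.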
Re 1): I haven't set up my parser to strip the leading URL part, so this would be really useful to me and, I expect, to everyone else.
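A minimal sketch of the stripping asked for in 1), in case it helps the discussion. The helper name is hypothetical, and it assumes a DOI starts at the first "10." followed by a 4 to 9 digit registrant prefix and a slash, which covers the example DOI in this thread.

```python
import re

# Assumed shape of the start of a DOI: "10.", 4-9 digits, then "/".
_DOI_START = re.compile(r"10\.\d{4,9}/")


def strip_doi_prefix(raw: str) -> str:
    """Drop anything (resolver URL, 'doi:' label) before the DOI itself."""
    text = raw.strip()
    match = _DOI_START.search(text)
    if match is None:
        # Nothing DOI-like was found, so leave the value alone.
        return text
    return text[match.start():]
```

With this, both "https://doi.org/10.5194/nhess-2019-423" and "doi:10.5194/nhess-2019-423" come out as "10.5194/nhess-2019-423", and a value that is already a bare DOI is returned unchanged.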
Thanks