Sentence casing improvement request

As the documentation says, "it's not possible to reliably automate conversion to sentence case." That's true. However, I see three algorithmically identifiable cases in which letters should not be decapitalized:

1. Acronyms. If a word contains more than one capital letter, it should not be decapitalized. Likewise, letters should not be decapitalized within sequences in which each letter is followed by a period (such as "U.S." or the "E." in "E. coli").
2. Symbols of chemical elements. These 118 one- and two-letter symbols are easy to recognize algorithmically, and most of them do not coincide with English words (the few that do, such as He, In, As, At, Be, and No, would need special handling).
3. A period may mark either the end of a sentence or an abbreviation. To the best of my knowledge, abbreviations other than those mentioned in item 1 are rare in titles. I would suggest treating a period (other than within A.B.C.D.-type acronyms) as the end of a sentence and not decapitalizing the first letter of the following word.
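
A minimal Python sketch of how the three rules might fit together is below. It assumes that a title is split on whitespace and that surrounding punctuation such as quotes, parentheses, and commas is ignored when a word is checked. All names (`to_sentence_case`, `keep_as_is`, `ends_sentence`, `AMBIGUOUS`) are hypothetical and do not come from Zotero's code. The `AMBIGUOUS` set, which lowercases element symbols that double as common English words, is only my assumption about how such collisions could be handled.

```python
import re

# All 118 currently named chemical element symbols (rule 2).
ELEMENT_SYMBOLS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni
Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe
Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au
Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf
Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split())

# Hypothetical: symbols that are also common title-case English words;
# this sketch lowercases them like ordinary words.
AMBIGUOUS = {"He", "In", "As", "At", "Be", "No", "Am"}

# Punctuation ignored when inspecting a word (the period is kept on purpose).
PUNCT = "\"'()[]{},;:!?"

# A.B.C.-type sequences: one capital letter followed by a period, repeated.
DOTTED = re.compile(r"(?:[A-Z]\.)+")


def keep_as_is(core: str) -> bool:
    """Rules 1 and 2: should this word keep its capitals?"""
    if sum(ch.isupper() for ch in core) > 1:  # DNA, mRNA, U.S.
        return True
    if DOTTED.fullmatch(core):  # single initials such as "E." in "E. coli"
        return True
    symbol = core.rstrip(".")
    return symbol in ELEMENT_SYMBOLS and symbol not in AMBIGUOUS


def ends_sentence(core: str) -> bool:
    """Rule 3: a trailing period outside an A.B.C.-type acronym ends a sentence."""
    return core.endswith(".") and not DOTTED.fullmatch(core)


def to_sentence_case(title: str) -> str:
    parts = re.split(r"(\s+)", title.strip())
    out = []
    preserve_next = True  # the first word of a title keeps its capital
    for part in parts:
        if not part or part.isspace():
            out.append(part)  # keep whitespace exactly as it was
            continue
        core = part.strip(PUNCT)
        if preserve_next or keep_as_is(core):
            out.append(part)
        else:
            out.append(part.lower())
        preserve_next = ends_sentence(core)
    return "".join(out)
```

For example, `to_sentence_case("Detection Of E. Coli In U.S. Water Samples Using DNA Probes. A Sensor For Fe And Cu Ions")` should give "Detection of E. coli in U.S. water samples using DNA probes. A sensor for Fe and Cu ions". Because rule 1 is applied literally, a hyphenated word with two capitals, such as "Fe-Based", is also left unchanged and would still need a manual fix.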

I think these improvements to the sentence-casing algorithm could reduce the number of manual corrections by as much as a factor of 10.
Of course, I realize that I cannot account for everything, so the suggestions above need to be discussed. But I hope that, in one form or another, they will be implemented in the Zotero code.