feature request: Acronym "Safe List"

PNDPermitting · August 15, 2022

I find the "Title Case" and "Sentence case" tools extremely useful for reports that lack citation metadata. However, my work involves acronym soup. It would be handy to be able to curate a list of acronyms that do not get converted by the case change.

While I'm wishing... I'd love to be able to use this on a few fields other than the title, including Authors, Abstract, Report Type, Place, and Institution.

Thanks for considering!

bwiernik · August 15, 2022

@dstillman: @emilianoeheyns has implemented a sentence case conversion in BBT that is a little smarter than the one native to Zotero: it doesn’t change words that are mixedCase or that are all-uppercase in an otherwise mixed-case string. Would you be interested in a PR adding those improvements to Zotero itself?
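For readers who want to see the rule concretely, here is a minimal sketch in Python. It is not BBT's actual code; the function name `sentence_case` and the exact heuristics (a word is kept if it has an uppercase letter after its first character, unless the whole title is uppercase) are assumptions made for illustration.

```python
import re


def sentence_case(title: str) -> str:
    """Lowercase a title, keeping words that look deliberately cased."""
    # In an all-caps title, capitalization carries no signal worth keeping.
    all_caps = title == title.upper()
    first = re.search(r"\w+", title)
    first_start = first.start() if first else -1

    def convert(match):
        word = match.group(0)
        # An uppercase letter after the first character covers both
        # mixedCase words (BibTeX, iPhone) and all-uppercase words (EPA).
        if not all_caps and any(c.isupper() for c in word[1:]):
            return word
        # The first word of the sentence keeps a leading capital.
        if match.start() == first_start:
            return word.capitalize()
        return word.lower()

    # Only runs of word characters are rewritten; spacing and
    # punctuation pass through untouched.
    return re.sub(r"\w+", convert, title)


print(sentence_case("Noise Levels Near the EPA Office"))
print(sentence_case("Using mixedCase Identifiers in BibTeX"))
```

Under these assumptions a fully uppercase title is lowercased throughout, acronyms included, which is the gap a curated list would need to fill.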

dstillman · August 15, 2022

Sure, that'd be great.

emilianoeheyns · August 15, 2022

Yeah, gladly.

PNDPermitting · August 15, 2022

My most frequent use cases are:
1) Going from all caps to Title Case
2) Going from a Sentence case title to Title Case when the item is not a peer-reviewed publication (in which case I think the 'all-uppercase in a mixed string' would be helpful)
3) Going from a Title Case title to Sentence case (in which case the 'all-uppercase in a mixed string' would also be helpful)

I'm not sure I've run across mixedCase in a string I wanted to convert, but I can see how it might be helpful. I suppose it probably wouldn't be much additional effort to add once you've added the other mixed-case scenario?

The most annoying are all of the government agencies I have to reference, though. The best way I can think of to handle those is either a fully-curated list or a default one I could add to.

Thanks!

PNDPermitting · August 15, 2022

Oh, and depending on how you manage the exclusion, the ability to list mixed-case acronyms would also be useful, like dB or Leq.
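To make the request concrete, here is a rough Python sketch of a title-case conversion driven by a curated list. Everything in it is hypothetical: `SAFE_LIST`, `SMALL_WORDS`, and `title_case` are illustrative names, and neither Zotero nor BBT is known to work this way. Entries are stored in their desired final form and matched case-insensitively, so the list also works when the input is all caps.

```python
import re

# Hypothetical user-curated list; each entry is stored exactly as it
# should appear in the output, so mixed-case entries like dB just work.
SAFE_LIST = ["EPA", "NOAA", "dB", "Leq"]

# Small words usually left lowercase in title case (illustrative only).
SMALL_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "or", "the", "to"}

# A word stem plus any apostrophe suffixes, e.g. EPA + 's.
WORD = re.compile(r"(\w+)((?:['’]\w+)*)")


def title_case(text, safe_list=SAFE_LIST):
    canonical = {entry.lower(): entry for entry in safe_list}
    first = WORD.search(text)
    first_start = first.start() if first else -1

    def convert(match):
        stem, suffix = match.group(1), match.group(2).lower()
        key = stem.lower()
        if key in canonical:
            # Safe-list entries win over every other rule.
            return canonical[key] + suffix
        if key in SMALL_WORDS and match.start() != first_start:
            return key + suffix
        return stem.capitalize() + suffix

    return WORD.sub(convert, text)


print(title_case("NOISE LEVELS IN DB NEAR THE EPA OFFICE"))
```

One trade-off of case-insensitive matching: an acronym that is also an ordinary word (WHO, for instance) would be forced to its listed form everywhere, so such entries might need a stricter match.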

PNDPermitting · August 15, 2022

One other thing I just noticed - using the case change on tags like italics or bold changes some of them to uppercase, rendering them nonfunctional. It seems like an exclusion might be an easy way to address that?
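A sketch of what such an exclusion could look like, in Python: split the title on anything that looks like a markup tag and apply the case conversion only to the text in between. The names `TAG` and `convert_outside_tags` are made up for this example, and the tag pattern is a simplification rather than a full HTML parser.

```python
import re

# One capturing group, so re.split keeps the tags in its result.
TAG = re.compile(r"(</?[A-Za-z][^>]*>)")


def convert_outside_tags(text, convert):
    """Apply convert() to the text between tags, leaving tags untouched."""
    parts = TAG.split(text)
    # With a single capturing group, plain text sits at even indices
    # and the captured tags at odd indices.
    return "".join(
        part if index % 2 else convert(part)
        for index, part in enumerate(parts)
    )


title = "effects of <i>homo sapiens</i> on noise"
print(title.upper())                           # tags are uppercased too
print(convert_outside_tags(title, str.upper))  # tags survive
```

Note that the converter sees each text segment separately, so a converter that capitalizes the first word of its input would need extra care at segment boundaries.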

emilianoeheyns · August 15, 2022

As tends to happen with BBT -- the sentence-caser has outgrown the humble few patches I made on the existing Zotero sentence-caser; it's not immense (about 100 lines of code), but it relies on xregexp. Two questions then:

1) Given its size and the xregexp dependency, it seems to me it'd be better to host it in zotero-utilities/utilities.js rather than inlining it in the itembox.xbl, agreed?
2) The sentence caser lives in the BBT bibtex parser (which is where it started -- the itembox menu entry was just easy reuse), which is a separate npm module that BBT imports. I'm OK with copying the code over to utilities.js, but it seems a shame to me that the two copies would be maintained separately. I'd also be OK with splitting it off into a dependency-free npm module that brings in xregexp through dependency injection, if that is an acceptable route, and it wouldn't really matter to me whether that module is maintained by me or by Zotero if you prefer to have more control. Which of these options do you prefer?

edit: the dependency on xregexp can be removed by replacing things like [\p{Lu}] with [A-Z], but you lose the ability to recognize "words" that have accents etc. I don't want to lose that myself, but in the copy-over case it'd still yield a dependency-free ~100-line sentence caser.
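The trade-off, and the dependency-injection idea from the second question, can be illustrated with a small Python sketch. This is only an analogy: the real code is JavaScript built on xregexp, Python's `str.isupper()` is only roughly comparable to `\p{Lu}`, and `make_acronym_detector` and the two predicates are hypothetical names.

```python
import re
from typing import Callable

ASCII_UPPER = re.compile(r"[A-Z]")


def ascii_is_upper(ch: str) -> bool:
    # Dependency-free stand-in for [A-Z]: misses accented capitals.
    return ASCII_UPPER.fullmatch(ch) is not None


def unicode_is_upper(ch: str) -> bool:
    # Unicode-aware stand-in for the role \p{Lu} plays with xregexp.
    return ch.isupper()


def make_acronym_detector(is_upper: Callable[[str], bool]) -> Callable[[str], bool]:
    """Build a detector around an injected character-class test.

    The core logic has no dependency of its own; the caller decides
    how uppercase letters are recognized.
    """
    def is_acronym(word: str) -> bool:
        return len(word) > 1 and all(is_upper(ch) for ch in word)

    return is_acronym


ascii_detector = make_acronym_detector(ascii_is_upper)
unicode_detector = make_acronym_detector(unicode_is_upper)

for word in ["EPA", "ÖBB", "Epa"]:
    print(word, ascii_detector(word), unicode_detector(word))
```

With the ASCII predicate, an acronym such as ÖBB is not recognized as all-uppercase, which is the loss described in the edit above.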

emilianoeheyns · August 15, 2022

"One other thing I just noticed - using the case change on tags like italics or bold changes some of them to uppercase, rendering them nonfunctional. It seems like an exclusion might be an easy way to address that?"

The BBT sentence caser doesn't have this problem.