change case automatically

  • Rudy - I thought the translators did this by now, but for some reason that's not true for that site - doesn't work for me either.
  • Adam, I think that we then found the root of the problem. Should I report it as a bug? Or do you think the posts here are enough? Rudy
  • I was going to fix this, but since the translator just calls the RIS translator, it's not quite trivial, although still quite doable. I don't really want to make such a change in the RIS translator, so a patch would have to walk through the captured authors and shift the case of all-caps ones.
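
The patch described in the post above could look roughly like the following. This is only a sketch in Python rather than translator code; the `firstName`/`lastName` keys and both function names are assumptions made for illustration, and the rule "skip anything containing a period" is a crude way of leaving initials alone.

```python
def fix_all_caps_name(name: str) -> str:
    """Return the name with normal capitalization if it is entirely upper case."""
    # Leave single letters, initials such as "B.N.", and mixed-case names alone.
    if len(name) < 2 or "." in name or not name.isupper():
        return name
    # Capitalize each space- or hyphen-separated part: "HELE-SHAW" -> "Hele-Shaw".
    return "-".join(
        " ".join(word.capitalize() for word in part.split(" "))
        for part in name.split("-")
    )


def fix_authors(authors: list[dict]) -> list[dict]:
    """Walk through captured authors and shift the case of all-caps ones."""
    return [
        {
            **author,
            "lastName": fix_all_caps_name(author["lastName"]),
            "firstName": fix_all_caps_name(author["firstName"]),
        }
        for author in authors
    ]


if __name__ == "__main__":
    captured = [
        {"lastName": "Li", "firstName": "B.N."},
        {"lastName": "DONG", "firstName": "M.C."},
    ]
    print(fix_authors(captured))
```

Particles such as "VAN DER" would come out as "Van Der", so the result may still need a manual check.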
  • I am thinking about a few issues, sorted by the order of potential implementation. Maybe some of the ideas could be useful.

    Thx
    Rudy

    1. There should be a "Transform Text" menu for author names.

    2. There should be a way to mark parts of the text to prevent case conversions, just like in BibTeX. That would allow reliable case conversions to be included in the CSL style files. From the BibTeX docs: "To protect the capital letters in say Hele-Shaw you need to put curly braces around the capital letter, e.g. {H}ele-{S}haw."

    3. There should be an option in "Document Preferences" to select if I want case conversions, and for what. A similar one for the drag/drop export of bibliography.

    4. There should be a tool to detect possible inconsistencies in the case, that would offer to correct or protect the case using the format in point 2.

    5. Unrelated --- there should be a pane to preview the selected item(s) in real-time, like in BibDesk (See http://bibdesk.sourceforge.net/manual/BibDesk%20Help_30.html#SEC70 )

    6. Unrelated --- import/export for BibTeX files should use the fields Bdsk-File-1, Bdsk-File-2 that contain links to PDF and other files or so (BibDesk only). It's some binary format, but BibDesk is opensource...
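
To make point 2 of the list above concrete, here is a minimal sketch of a sentence-case conversion that respects BibTeX-style brace protection. The function name is hypothetical, only non-nested braces are handled, and nothing here reflects how Zotero or BibTeX actually implement this.

```python
import re

# Matches one innermost {protected} span, as in BibTeX's {H}ele-{S}haw.
BRACED = re.compile(r"\{([^{}]*)\}")


def sentence_case_with_protection(title: str) -> str:
    """Lower-case a title but copy {braced} spans through unchanged."""
    pieces = []
    pos = 0
    for match in BRACED.finditer(title):
        pieces.append(title[pos:match.start()].lower())
        pieces.append(match.group(1))  # protected text, braces dropped
        pos = match.end()
    pieces.append(title[pos:].lower())
    result = "".join(pieces)
    # Capitalize the first character unless the title opens with a protected span.
    if not title.startswith("{"):
        result = result[:1].upper() + result[1:]
    return result


if __name__ == "__main__":
    print(sentence_case_with_protection("Flow in a {H}ele-{S}haw Cell"))
```

The same marking works in the other direction: a title-case conversion would simply skip the braced spans instead of capitalizing them.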
  • 1. in general yes - although I'd really prefer that not to be necessary and having translators deal with this automatically - you've just been unlucky, I haven't seen an all caps author in my last several hundred imports using a whole bunch of different translators - so this is really the exception and should become less and less common.

    2. no, no - please don't start with these BibTeX things in Zotero - making users put weird, non-standardized (to other software) stuff in their data doesn't seem like a good idea to me - _and_ it would mean users would have to manually edit entries in those cases - the ideal scenario is that this is hardly ever necessary.


    3. maybe - this already exists for titles (there is a hidden preference extensions.zotero.capitalizeTitles that should be able to do that - I don't know why it doesn't for the one in question) - authors I think have one standard way how they should import, so no option necessary there. For output, I really think this should remain in the style, so no option in the preference either, no.

    4. that seems very hard - finding "inconsistencies" in the use of cases would seem pretty much impossible to do automatically.

    5. that might be interesting - maybe add that to the "show editor" function in the plugin?

    6. It might be worthwhile to create an additional output format, but not all BibTeX users use BibDesk so no, BibTeX output should remain in BibTeX. Same as for RIS output, though, Zotero should include file links for BibTeX.
  • edited April 30, 2010
    2.a. If you do not add exceptions for special situations then you will never be able to have a reliable case conversion algorithm. Even artificial intelligence, or even a human assistant will make an error. And needs a way to record instructions like: "Dude, don't touch this thing! This is a word in CAPITALS!". Exceptions or training for exceptions is a must. I understand you don't like it, but there is no way around it. So, it is just a question of how it is implemented, or stored.

    2.b. They do have to edit manually EVERY title now. In my proposed solution they only manually manage exceptions, or manually train the algorithm. A huge time saver, IMHO.

    2.c. Not manually edit. See point 4. Manually only edit whatever they notice is wrong and is missed by the automatic tools. And "manually" means using a special GUI.

    2.d. I would hope there should be a way to extend the XML file definition to allow for such exceptions. Maybe tricky, but there should be a way to do it cleanly. Maybe outside of the field could be considered clean. I.e. no funny characters in the text. One simple way is to have two versions of a title that has exceptions. One for sentence and one for title case, automatically synchronized. Or references to parts of text in the text field. Again automatically synchronized. In both cases flagged as a warning if edited by some other software --- for example by comparing a time stamp of the record and of the exception. Yes, maybe tricky, but something like this should be possible.

    4.a. I should have said "assisted", not automatic. Just like a spell-checker. You NEVER let it correct spelling automatically. Just find strange things. For example, the following references are clearly wrong, and this will be easy to detect:

    Li, B.N., DONG, M.C. & Vai, M.I., 2010. On an automatic Delineator for arterial blood pressure waveforms. Biomedical Signal Processing and control, 5(1), 76-81.
    SEBER, G. & Lee, A., 1977. Linear Regression analysis, New York: Wiley.

    4.b. In the first example above, the tool will work like a SMART spell-checker and ask about the case. It will ask what to do about:
    i) DONG: if sentence case, convert and go to iv. Otherwise ask if all author names are to be capitalized. If so, capitalize and protect all names, and go to iv. Otherwise protect "DONG" only.
    ii) Li: should it be capital too? If so, convert and protect. If not, protect "i" or "Li" (ask user)
    iii) Vai: should it be capital too? If so, convert and protect. If not, protect "ai" or "Vai" (ask user)
    iv) Delineator: Protect capital "D" or protect "Delineator" or convert to "delineator" (ask user)
    v) control: Protect small "c" or small "control" or convert to "Control" (ask user)

    4.c. Notice that the tool works in a conditional manner --- depending on
    - the field type (author, title, journal) (see i, iv, and v)
    - previous answer (see i-iii)
    - the other content of the field (see iv, v)

    4.d. All such exceptions should be viewable and editable --- in a panel, for the selection in the library browser.

    The main point being that one can never make a completely automatic system, but a system that will assist the user and speed up the user's work. See the points 2.b and 4.d again. This is really serious. Editing everything manually is like a typewriter with changeable storage, but limited brains. Formatting of bibliography in Zotero is excellent, but management is not. Please take it as constructive comments.

    The comparison to a spell-checker really is appropriate. Just like a spell-checker does not usually detect that "electric filed" should read "electric field", this tool would also not be perfect. But it would be smart and it would remember. And IMHO really save time, which is the goal.

    Thanks for listening
    Rudy
  • edited April 30, 2010
    On point (2), the new processor will have the ability to do this, using rich text markup.
  • On preserving case, I think some of it could be automatically handled, and the rest can be done with a simple HTML micro-language behind the scenes, which in the Zotero UI would be invoked with some simple mechanism (like highlight --> contextual-menu --> "preserve case").
  • edited May 12, 2010
    Just want to add to 4, to clarify that detection of problems is actually very simple. A non-standard case can be defined by the implemented conversion algorithm. Standard title should be either in sentence or title case. So, a title that differs from both is non-standard.

    1. Convert the stored title to title case and compare the two.
    2. Convert the stored title to sentence case and compare the two.
    3. If the stored title differs from both above conversions, then it is non-standard.

    This allows detection of the words that are non-standard, so the management of exceptions can quickly focus on the problematic details.
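
The three steps above can be sketched as follows. The two converters are deliberately naive stand-ins (the small-word list is an assumption, and they are not Zotero's algorithm); the point is only that detection reduces to two string comparisons against whatever converter is in use.

```python
# Hypothetical list of words that stay lower case in title case.
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}


def to_title_case(title: str) -> str:
    """Naive title case: capitalize every word except non-initial small words."""
    out = []
    for i, word in enumerate(title.split()):
        lower = word.lower()
        if i > 0 and lower in SMALL_WORDS:
            out.append(lower)
        else:
            out.append(lower[:1].upper() + lower[1:])
    return " ".join(out)


def to_sentence_case(title: str) -> str:
    """Naive sentence case: lower-case everything, then capitalize the first letter."""
    lowered = " ".join(title.split()).lower()
    return lowered[:1].upper() + lowered[1:]


def is_nonstandard(title: str) -> bool:
    """A stored title is non-standard if it matches neither conversion."""
    stored = " ".join(title.split())  # ignore differences in spacing
    return stored != to_title_case(stored) and stored != to_sentence_case(stored)


if __name__ == "__main__":
    for t in ("Linear Regression Analysis", "Linear Regression analysis"):
        print(t, "->", "non-standard" if is_nonstandard(t) else "standard")
```

With these converters, "Linear Regression Analysis" and "Linear regression analysis" both pass, while "Linear Regression analysis" is flagged.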

    In case of author names, the procedure can be made similarly simple by comparing a single string with all author names at once...

    Anyway, I hope that someone may find these thoughts useful...

    Best regards to all
    Rudy
  • Apart from the coding complexity that would be involved, some users might find the automated queries in category 4 to be excessively intrusive, if implemented in Zotero proper. It might be best to think of implementing such functionality in a third-party support plugin designed for "cleaning" database entries.
  • "This allows detection of the words that are non-standard, so the management of exceptions can quickly focus on the problematic details."
    What sort of cases do you see this algorithm catching? And does it only work if the raw data is entered British style (sentence case, rather than title)?
  • edited May 12, 2010
    fbennett:

    A. I like the rich text markup that you mentioned. I am looking forward to it. Thanks.

    B. The idea was for the tool in 4 to work in a similar fashion as a spellchecker. I.e. only when the user explicitly requests to check the case integrity for the items that are selected in the library. Not automated as in "bug people all the time". The only automatic feature could be to mark items and/or fields and/or words with non-standard case, and only those that were not resolved. Again similarly to a spellchecker.

    Third similarity to a spellchecker --- I personally feel that managing the case is a core function that should be included in the base Zotero system. That includes both 2 and 4. I am glad that at least the point 2 --- storing the case exceptions --- will be a core functionality, as you mentioned.

    Thx
    Rudy
  • edited May 12, 2010
    bdarcus: Excellent question, thanks. It's briefly touched in 4.c ("the other content of the field"). Here is how this might be done:

    i) First convert the whole title to both cases as described. If it differs from both title and sentence case, as defined by the conversion algorithm, then there is a problem. This is the easy part.

    ii) If there is a word whose case differs from both the sentence and title case, then that word definitely has a problem. This is also easy. Ask how to handle these words, and if there are no other problems, we are done.

    iii) Otherwise count how many remaining words differ from title case and how many from sentence case. If it is 50:50, then this is difficult. You would then need to ask the user to choose which is the case. Otherwise we can make a guess. As an example, assume that the title has 10 words, that 3 words differ from sentence case, and 5 different words differ from title case. Then sentence case is likely the desired stored format, and those 3 words are exceptions.

    Notes:
    a) There will be some words that are correct in both cases, so the total number of exceptions may not add up to the total number of words (e.g. "a", "the"...).
    b) There will be no remaining words that differ from both cases --- see (ii)


    iv) In (iii) we could always include a radio button in the GUI to show what we think the desired case was. The user could change that with a single click. That single click would change all the suggested exceptions: if sentence case is checked, then we show 3 suggested exceptions. Click and change the preference to title case, and we instead show the other 5 suggested exceptions. So, the GUI would show all the suggested exceptions for that given title. The default choice would depend on the count of exceptions in both cases --- suggest the one with the lower count of exceptions. Clicking the radio button would clear previously confirmed exceptions to avoid conflicts.

    v) If in (iii) the count is 50:50, then the default choice in (iv) could be skewed by the field. Maybe a journal title uses the title case more often than the article title....

    vi) For each exception, the GUI would allow the user to either confirm the exception, or to correct the spelling to remove the exception.

    Again, this is supposed to be "assisted", not automatic exception handling.
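
Steps (i) to (iv) might be sketched like this. The converters are naive placeholders, and the small-word list, function names, and returned dictionary layout are all assumptions for illustration; a tie in the counts is reported as `None` so that a GUI could ask the user, as in (iii).

```python
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}


def to_title_case(title: str) -> str:
    words = [w.lower() for w in title.split()]
    return " ".join(
        w if i > 0 and w in SMALL_WORDS else w[:1].upper() + w[1:]
        for i, w in enumerate(words)
    )


def to_sentence_case(title: str) -> str:
    lowered = " ".join(title.split()).lower()
    return lowered[:1].upper() + lowered[1:]


def analyze(title: str) -> dict:
    """Classify each stored word against both conversions and guess the intended case."""
    stored = title.split()
    as_title = to_title_case(title).split()
    as_sentence = to_sentence_case(title).split()
    definite, vs_title, vs_sentence = [], [], []
    for word, t, s in zip(stored, as_title, as_sentence):
        if word != t and word != s:
            definite.append(word)      # step (ii): wrong under either reading
        elif word != t:
            vs_title.append(word)      # exception if title case was intended
        elif word != s:
            vs_sentence.append(word)   # exception if sentence case was intended
    # Step (iii): prefer the reading that needs fewer exceptions; None means a tie.
    if len(vs_title) < len(vs_sentence):
        guess = "title"
    elif len(vs_sentence) < len(vs_title):
        guess = "sentence"
    else:
        guess = None
    return {"definite": definite, "guess": guess,
            "if_title": vs_title, "if_sentence": vs_sentence}


if __name__ == "__main__":
    print(analyze("On an automatic Delineator for arterial blood pressure waveforms."))
```

Keeping both candidate exception lists in the result is what would let the radio button in (iv) switch between them with one click.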


    Thx
    Rudy
  • @blazek: thanks for the explanation, but that's not what I asked. I was trying to figure out what case examples it would catch. My assumption is that if the answer to my second question is no, then it can only work for acronyms.
  • edited May 15, 2010
    bdarcus, I thought that I tried to answer both questions. Here are your questions:

    1. What sort of cases do you see this algorithm catching?
    2. And does it only work if the raw data is entered British style (sentence case, rather than title)?

    Answer 1: It shall catch inconsistencies. Much more than acronyms, but not all words that require exceptions.
    Answer 2: No, it shall work for both sentence and title case raw data. But it would be more effective with the sentence case --- see below.

    The goal is to speed up exception management, but there are two different tasks:

    A. Find inconsistencies: "The title is not standard, is that intentional?" This part is what we are after.

    B. Find conversion algorithm shortcomings that do not manifest themselves in the raw data: This is harder and has to be done by hand. Any software development should be shared for both improving the conversion algorithm and detecting exceptions. This falls into the algorithm training category, using e.g. a Bayesian approach. Hard. Not addressed here.

    Ultimately we want to allow Zotero CSL files to do automated conversions of titles, which most styles require. Currently almost no CSL files convert titles automatically because the algorithm is not perfect and there is no exception management in Zotero. The proposed tool aims to speed up exception management by detecting titles that are not consistent with the existing title conversion algorithm, no matter how smart or simplistic the algorithm is.

    The algorithm will handle much more than acronyms, but it will not be perfect. Some titles will need the user to detect exceptions by hand, just as a spellchecker does not usually detect that "electric filed" should read "electric field".

    But it should save time.

    Best of luck
    Rudy


    Details for Answer 2:
    It would catch problems with raw data in both sentence and title case. Acronyms that you mentioned fall under (ii) above, and are easy to handle. The more complicated scenarios fall under (iii) and (iv) above. See this old example from Apr 29:

    Li, B.N., Dong, M.C. & Vai, M.I., 2010. On an automatic Delineator for arterial blood pressure waveforms. Biomedical Signal Processing and control, 5(1), 76-81.

    a) The article title will be considered to be likely in sentence case with a typo or exception in "Delineator".

    b) The periodical title will be considered to be likely in title case with a typo or exception in "control".


    More Examples:

    You can see that the algorithm should detect INCONSISTENCIES for both sentence and title case in the raw data. Not just acronym detection. Consider a few more examples.

    Example 1:
    The tool will not detect a wrong conversion in the following title:
    Raw data in the database: The Heritage of the Smith Family.
    Automated conversion to title case: The Heritage of the Smith Family.
    Automated conversion to sentence case: The heritage of the smith family.

    The sentence case conversion is wrong, but the title in the database is consistent with the title case conversion, so the problem has to be detected by the user manually.

    Example 2:
    The tool will detect the following non-standard title:
    Title in the database: The heritage of the Smith family.
    Automated conversion to title case: The Heritage of the Smith Family.
    Automated conversion to sentence case: The heritage of the smith family.

    The sentence case conversion is wrong as before, but this time the problem will be detected since the raw data in the database is not consistent with any of the conversions. So "Smith" will be detected and protected by an exception. The result is that we will be able to fix the sentence case conversion.

    Example 3:
    The tool will detect the following non-standard title:
    Title in the database: Internal Representation of struct Variables
    Automated conversion to title case: Internal Representation of Struct Variables
    Automated conversion to sentence case: Internal representation of struct variables.

    The title case conversion is wrong. The problem will be detected since the raw data in the database is not consistent with any of the conversions. So "struct" will be detected and protected by an exception. The result is that we will be able to fix the title case conversion.
  • edited May 13, 2010
    In the wild there are a number of systems that store names and titles in all caps. In that case, the two-way conversion will turn up inconsistencies for most of the words in the string, which would make for lots of click selecting to get the entry into the DB. In deployment, you might want to pre-screen for this (i.e. for all caps + the presence of inconsistencies, to avoid false positives with entries in Chinese), and for such entries opt for a forced conversion to sentence case (for titles) or title case (for names), and prompt the user with an invitation to tidy up manually. The algorithm could then be run over the doctored string to pick up the bits that need to be protected.
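
A pre-screen along the lines suggested above could be as small as this sketch. The function name and the `"title"`/`"name"` field labels are hypothetical. It relies on Python's `str.isupper()`, which is true only when the string contains at least one cased character and none of them is lower case, so text in scripts without case (such as Chinese) is never treated as all caps.

```python
def prescreen(value: str, field: str) -> tuple[str, bool]:
    """Return (value, needs_tidying); force-convert the value only if it is all caps."""
    if not value.isupper():
        # Mixed case, lower case, or an uncased script: leave it untouched.
        return value, False
    if field == "title":
        lowered = value.lower()
        converted = lowered[:1].upper() + lowered[1:]  # forced sentence case
    else:
        converted = value.title()                      # forced title case for names
    # The flag tells the caller to invite the user to tidy up manually.
    return converted, True


if __name__ == "__main__":
    print(prescreen("LINEAR REGRESSION ANALYSIS", "title"))
    print(prescreen("SEBER", "name"))
```

The converted string can then be handed to the two-way comparison so that only the words needing protection are raised with the user.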
  • edited May 15, 2010
    I originally assumed that the conversion from the all-caps titles should be handled by the translator. I saw the process as three stages:

    (I) Import. Conversion of cases done by translator, via internal Zotero algorithm.
    (II) Management. Detect inconsistencies and manage exceptions using the proposed tool (see 4 in Apr 30th posts)
    (III) Bibliography. A CSL file converts the cases using internal Zotero algorithm.

    My thinking was that in (I) and (III) the proposed tool would not get involved, and exceptions would be noted and handled manually. But from your suggestion I realize that the inconsistency detection could be brought in at an earlier stage --- during the import in (I), so that any original inconsistencies do not get lost in case conversions. Or even if the user decides to convert the case using the menu in Zotero. That is important, thanks.

    If I correctly understand your note, some translators may not convert cases at all, and we might end up with a lot of caps-only titles. I think that your suggestion to treat the all-caps titles separately is very good. Asian languages are a completely different animal, I did not think about it yet. Worse yet, some European languages use different rules for the title case, so I worry about that also.

    Ultimately I would like to use the stored exceptions (either detected by the tool or entered manually) to improve both the internal Zotero conversion algorithm and the inconsistency detection tool. So, it's a good idea to plan for that...
  • 6 years ago someone asked this:
    "I'm new to Zotero and have just discovered the text transformation which is great. At my uni they are quite specific about always using title case but 3/4 of my references have come through in lowercase. Is there a way to bulk convert to title case or even better, do it automatically?"

    I'm not sure I've seen an answer. I just got this question from someone at my institution today. Is there a way to bulk convert titles to title case, or to do it automatically on import?
  • I guess the question is why you'd want to: the reason we recommend sentence case for all references is that citation styles can automate title case (cf. e.g. all the Chicago Manual ones). Or am I missing something?

    It's technically possible to automate this on import, but we wouldn't want to.
  • To expand on what adamsmith said:

    Ideally each title record should be entered or edited to be in sentence case. When you work on a paper or manuscript you can achieve title case by using the appropriate style. No additional editing (or at most very little editing) of the final document is required.

    Any reasonably good reference manager can convert sentence case titles to title case. No reference management software can reliably convert title case to sentence case. The conversion process cannot handle proper nouns, names, etc.

    While you may not require sentence case titles in your reference list now, it is almost certain that you will need that option in the future.

    Journal publishers (with a few exceptions) tend to publish their articles in title case. However, many bibliographic databases, Medline/PubMed for example convert titles to sentence case during the process of indexing. SafetyLit, the database I'm responsible for, does the same. That additional labor is worth the trouble to users -- every now and then someone writes to express their appreciation.
  • One small issue I've been having is using Title Case transform on titles containing roman numerals, for instance titles containing 'World War II' or in cases where a volume number is a part of the title.
  • In what situations are you using title case transform? The citations styles using title case get World War II right and otherwise you really shouldn't be transforming to title case. You want all of your titles in sentence case as DWL explains above.
  • Perhaps camfish was referring to the annoyance of having to correct titles in Zotero when using the Transform Case function to convert from title to sentence case? Not sure there would be a way to reliably exclude Roman numerals and similar from the conversion.
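
One heuristic for the Roman numeral problem, offered only as a sketch and not as what Zotero does: during sentence-case conversion, leave alone any word that is already upper case and parses as a well-formed Roman numeral. The function names are made up for this example. The heuristic is not reliable in general, since an upper-case word such as "MIX" or "DC" also parses as a numeral and would be kept.

```python
import re

# Well-formed Roman numerals up to 3999, upper case only.
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def looks_like_roman(word: str) -> bool:
    """True for upper-case words that parse as Roman numerals."""
    if len(word) == 1:
        # Only keep the single letters most likely to be numerals (or the pronoun "I").
        return word in {"I", "V", "X"}
    return ROMAN.fullmatch(word) is not None


def sentence_case_keep_roman(title: str) -> str:
    """Lower-case each run of letters unless it looks like a Roman numeral."""
    def convert(match: re.Match) -> str:
        word = match.group(0)
        return word if looks_like_roman(word) else word.lower()

    result = re.sub(r"[^\W\d_]+", convert, title)
    return result[:1].upper() + result[1:]


if __name__ == "__main__":
    print(sentence_case_keep_roman("A History of World War II"))
```

Because the check requires the stored word to be upper case already, a title-case word such as "Mix" is still lower-cased normally.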
  • For those using a mac, you might use this free software from the Apple App Store:
    https://itunes.apple.com/us/app/wordservice/id899972312?mt=12
  • I too like Zotero a lot and use it a lot too. I work on a Mac, so when I control-click on the title it only gives me:
    paste
    select all

    the others are greyed out and only include:
    cut
    copy
    delete

    In one of the comments above it said we could change the case, but I do not seem to get that option :-( How come ?
  • edited October 18, 2023
    By the way, now with Zotero 7 beta the Roman numeral issue mentioned above by @camfish has been fixed. The conversion to sentence case keeps acronyms properly in upper case and also abbreviations such as UK, U.S., LiDAR, NHTSA, WHO, CoVID, etc. My thanks to the developers who implemented this incredible time / effort saver.

    One minor thing: titles that begin with quote marks, and that would normally begin with an upper case character, now begin with a lower case character after conversion to sentence case. See:
    10.1080/1057610X.2020.1793457
    10.1080/1057610X.2020.1777714
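
The leading-quote case reported above amounts to capitalizing the first character after any opening punctuation rather than the first character of the string. A minimal sketch, assuming a hypothetical helper and an illustrative (not exhaustive) set of opening characters:

```python
# Straight and curly quotes, guillemets, opening brackets, and spaces.
LEADING_PUNCT = "\"'\u201c\u201d\u2018\u2019\u00ab\u00bb([ "


def capitalize_after_leading_quotes(title: str) -> str:
    """Upper-case the first character that follows any leading quote marks."""
    i = 0
    while i < len(title) and title[i] in LEADING_PUNCT:
        i += 1
    return title[:i] + title[i:i + 1].upper() + title[i + 1:]


if __name__ == "__main__":
    # Made-up title, not one of the articles cited above.
    print(capitalize_after_leading_quotes("\u201cwe are many\u201d: a case study"))
```

Digits are deliberately not skipped, so a title starting with "3d" is left as it is rather than turned into "3D".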
  • @HeatherDawn: Right-click on the field in display mode, not while editing text.
  • Better BibTeX is good for this for Title and Publication fields at least. Zotero has an option for Title Case or Sentence case, while BBT adds another option that maintains (some) abbreviations, etc. (like DNA or words in deliberately mixed case).

    https://retorque.re/zotero-better-bibtex/
  • The behavior of stock Zotero and BBT Sentence case conversion is the same now I believe
  • It's very close, but not 100% the same.