Rich Text in Titles

Tjowens · August 22, 2008

We have had several threads and several discussions about rich text in titles, but I am hopeful that we can come to a straightforward solution for at least the most prevalent use cases.

As previously discussed, there are several known situations where italics in titles are important:

Genus/species names
Names of genes
Foreign words
Names of ships
Titles inside titles

It was also suggested that Smallcaps, Non-numeric subscript, and Non-numeric superscript are necessary inside titles.

Are there other kinds of text that we need inside titles, or for that matter any of the data fields? Does the markup for these things actually change from style to style much, or could we simply allow some basic markup like <b> and <i> inside title fields?

Rintze · August 22, 2008

In the exact sciences, often only part of a word has special markup, so I don't know if that counts as a 'kind of text'. The uses of special markup that I'm aware of (note that the markup of most of this stuff doesn't change from style to style):

Smallcaps are commonly used to indicate the structure of certain chemicals, e.g. the "L" in "L-malic acid".

Numeric superscripts are very common to indicate isotopes (e.g. "13" in 13C-analysis), while numeric subscripts are used to indicate the position of certain atoms within molecules (e.g. "2" in "C2-malate"). In addition to numeric sub and superscript, non-numeric subscript and superscript are often used in abbreviations/symbols (e.g. "vmax" with "max" in subscript, the "+" in superscript in the abbreviated compound name "NAD+").

This kind of markup is also very common in abstracts, but support for title markup is infinitely higher on my wishlist.

bdarcus · August 22, 2008

Does the markup for these things actually change from style to style much, or could we simply allow some basic markup like <b> and <i> inside title fields?

As I've tried to communicate many times, I don't see markup in titles as distinct from other kinds of content markup, and so you need to do this in a forward-looking way. I don't think simply allowing b and i is that.

WRT to your question, I would expect that titles may in some cases be rendered as italic, but in other cases as normal, or bold, or small caps. I have no data on how common those variations are though.

Also, WRT to this notion of being forward-looking, anything you do needs to be compatible with import/export data formats. I gave an example or two in the thread you link to that I think covers both of these issues, though admit it would suggest UI support that is more complex than dumb bold and italic support.

noksagt · August 22, 2008

Smallcaps are commonly used to indicate the structure of certain chemicals

Also sometimes used for names of software.

Numeric superscripts are very common to indicate isotopes

Note that numeric subscripts are already possible (via unicode input).
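As a rough illustration of what "via Unicode input" amounts to, here is a minimal Python sketch that maps ASCII digits to the corresponding Unicode subscript and superscript characters. The helper names are made up for this example and are not part of Zotero.

```python
# Hypothetical helpers (not part of Zotero): map ASCII digits to the
# Unicode subscript/superscript digit characters.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def to_subscript(digits: str) -> str:
    """Replace every ASCII digit with its Unicode subscript form."""
    return digits.translate(SUBSCRIPT_DIGITS)


def to_superscript(digits: str) -> str:
    """Replace every ASCII digit with its Unicode superscript form."""
    return digits.translate(SUPERSCRIPT_DIGITS)


# Example titles typed without any markup support in the field itself.
malate = "C" + to_subscript("2") + "-malate"
isotope = to_superscript("13") + "C-analysis"
```

This only covers digits; letters such as the "max" in "vmax" have no complete set of Unicode subscript forms, which is one reason real markup is still needed.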

WRT to your question, I would expect that titles may in some cases be rendered as italic, but in other cases as normal, or bold, or small caps. I have no data on how common those variations are though.

Underline as well. And sometimes the text style may be used as-is in all citation formats. In other cases, it might need to be changed in some formats. I've seen the same article (a review, errata, or some other article that references another work by title) have either italics or underline in a part of the title, depending on the referencing style.

dstillman · August 22, 2008

Just to clarify/summarize, one of the main questions here is if, for formatting titles, a fixed semantic list would be sufficient—in which case CSL would presumably need to be extended with the same list—or whether that would be overly limiting and a more general (and inherently less semantic) solution is required. Or, for that matter, whether we do the former with a fallback to the latter.

Smallcaps are commonly used to indicate the structure of certain chemicals, e.g. the "L" in "L-malic acid".

Would there be a generic class name for this sort of thing?

Also sometimes used for names of software.

What's an example of this?

noksagt · August 22, 2008

http://www.google.com/search?q=filetype%3Abib+\textsc

gives several examples of smallcaps in software titles and in chemical names.

bdarcus · August 22, 2008

Although, noksagt, those examples are all raw bibtex. It'd be nice to see finished references, or style guides.

noksagt · August 22, 2008

I don't know of style guides that mandate the use of smallcaps in software titles & I'm more accustomed to encountering the markup in the abstract or body of an article (where one is more likely to encounter software names).

One interesting example is:
http://dx.doi.org/10.1016/j.pmcj.2005.08.003
That actually mixes smallcaps and normal (title case) text for program names in both the article title and in the bibliography (where the well-used program "Lime" is always set in smallcaps, but additions/frontends/etc. are mixed, as in "Tiny_LIME_").

Rintze · August 23, 2008

The style guide of Applied and Environmental Microbiology discusses quite a few different uses for superscripts, subscripts, smallcaps and italics. The markup will be the same in the whole paper, so these rules are also used for titles. I do think however that already for this guide, using specific semantic classes would be overly complicated:

http://aem.asm.org/cgi/content/full/74/1/1#NOMENCLATURE

Just some excerpts:

"Wild-type characteristics can be designated with a superscript plus (Pol⁺), and, when necessary for clarity, negative superscripts (Pol–) can be used to designate mutant characteristics. Lowercase superscript letters may be used to further delineate phenotypes (e.g., Str^r for streptomycin resistance)."

"Wild-type alleles are indicated with a superscript plus (ara⁺ his⁺)."

"Subscripts may be used in two situations. Subscripts may be used to distinguish between genes (having the same name) from different organisms or strains; e.g., his_{E. coli} or his_K-12 for the his gene of E. coli or strain K-12, respectively, may be used to distinguish this gene from the his gene in another species or strain. An abbreviation may also be used if it is explained. Similarly, a subscript is also used to distinguish between genetic elements that have the same name. For example, the promoters of the gln operon can be designated gln_Ap1 and gln_Ap2."

"L-[methyl-¹⁴C]methionine"

So here superscripts are used for either designating characteristics, the presence of genes or for isotopes. Subscripts are used to indicate gene origin or to give more information about genetic elements (these extend beyond genes). Italics are also not only used for genes, but also for other genetic elements, and also within names of chemicals (I don't know the rules about the latter). The use of smallcaps seems limited here to indicate chiral information (so you could make a class "Chemistry > D/L chirality"). You would have to distinguish it from another way to mark up chirality, namely the R/S system, which uses italics: http://en.wikipedia.org/wiki/Chirality_(chemistry)#Naming_conventions

asplundj · September 24, 2008

Any news on this issue?

Rintze · March 13, 2009

Maybe this will be of use to people struggling with the same issue as me: I decided some time ago to use Zotero for my future papers, but the (current) lack of support for rich text in titles had me a bit worried (working in the life sciences, over half of the papers in my library require rich text markup). To avoid having to correct all my bibliographies by hand, I've settled on HTML tags to indicate title markup upon item entry, so I now have titles in my library like:

<sc>L</sc>-malic acid production using immobilized Saccharomyces cerevisiae

Once I've completed my manuscript, I use the find & replace function of Word to remove the tags and add the required formatting. The set of tags I have devised, together with the search phrase and shortcuts to get the right layout:

Italics, <i>text</i>
Search phrase: [\<](i[\>])(*)[\<]/\1
Replace by \2, use the shortcut ctrl+i

Bold, <b>text</b>
Search phrase: [\<](b[\>])(*)[\<]/\1
Replace by \2, use the shortcut ctrl+b

Superscript, <sup></sup>
Search phrase: [\<](sup[\>])(*)[\<]/\1
Replace by \2, use the shortcut ctrl+shift++

Subscript, <sub></sub>
Search phrase: [\<](sub[\>])(*)[\<]/\1
Replace by \2, use the shortcut ctrl+=

Smallcaps, <sc></sc>
Search phrase: [\<](sc[\>])(*)[\<]/\1
Replace by \2, use the shortcut ctrl+shift+k

For instance, to convert all the <i>text</i> elements in the manuscript to 'text' (in italics), I open the find & replace window, enable the option "Use wildcards", paste the search phrase "[\<](i[\>])(*)[\<]/\1" in the find text box, and put "\2" in the replace text box. Then I press the shortcut ctrl+i while the cursor is still in the replace box to get italic output and select the "Replace All"-button. Rinse and repeat for all the different kinds of markup you have used, and you're done. The only thing I don't know is whether I'll be able to transfer this HTML-tag-based markup to any future rich text support implemented by Zotero, but at least it solves my problem in the meantime.
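For anyone who wants to see the same idea outside Word, here is a minimal Python sketch of the tag-matching step. It is only an illustration under a few assumptions: the style names are invented for the example, nested tags are not handled, and it relies on the same kind of backreference that "\1" provides in the Word search phrase.

```python
import re

# The five tags from the workaround; the style names are made up for this sketch.
TAG_STYLES = {
    "i": "italic",
    "b": "bold",
    "sup": "superscript",
    "sub": "subscript",
    "sc": "smallcaps",
}

# <tag>content</tag>; the backreference \1 forces the closing tag to match
# the opening one. Nested tags are not handled.
TAG_PATTERN = re.compile(r"<(i|b|sup|sub|sc)>(.*?)</\1>")


def split_runs(title):
    """Split a tagged title into (text, style) runs; style is None for plain text."""
    runs = []
    pos = 0
    for match in TAG_PATTERN.finditer(title):
        if match.start() > pos:
            # Plain text between the previous tag and this one.
            runs.append((title[pos:match.start()], None))
        runs.append((match.group(2), TAG_STYLES[match.group(1)]))
        pos = match.end()
    if pos < len(title):
        # Trailing plain text after the last tag.
        runs.append((title[pos:], None))
    return runs
```

Each run could then be handed to whatever does the actual formatting, which is the step that the ctrl+i (etc.) shortcut performs in the Word replace box.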

P.S. I found this documentation very useful in dealing with Word wildcards:
http://word.mvps.org/FAQs/General/UsingWildcards.htm

[edit: see also my later comment a few posts below for a Word macro, which automates these find & replace operations]

philgons · March 21, 2009

Thanks, Rintze. Very helpful. Although, I do hope that a better solution is forthcoming soon.

asplundj · March 22, 2009

And to make Rintze's suggestion easier, you can create a macro that makes all the replacements and assign a shortcut key to that macro.

Rintze · March 22, 2009

Macro available at:
https://docs.google.com/uc?export=download&id=0B4KgWUjfrk4_NGMwOGFjMTYtMWY4Mi00OTdlLTk5OWEtNjE4ZThjNWMzNmM5

I don't have a lot of experience writing Word macros, so the macro doesn't have a shortcut or button. Just select the macro from the macro menu and run it, and it will convert all the HTML-tag-enclosed text to its specified formatting. The .dot file is probably best installed in the Startup folder of Word, see:
http://www.zotero.org/support/windows_word_plugin_manual_installation_instructions

JonEP · June 26, 2009

Can anyone provide a bit of an update as to whether the issues described in this discussion are being dealt with? In another space (http://forums.zotero.org/discussion/7554/format-text-within-title-and-book-title/#Item_3), fbennett suggests that a solution is coming that will allow flip-flopping of formatted and non-formatted text. Is this relatively far along?

Rintze's markup-and-macro solution is a good workaround for now, but not ideal given that many titles, etc. will have to be re-entered when a fix is finally settled upon.

Thanks.

fbennett · June 26, 2009

JonEP,

At the citation processor level, the weather report says "pretty far along". There is now code in the processor to handle markup, which works on some sample data; but it's not quite robust, and there are some things that it won't handle correctly, so my initial code needs to be rewritten. This part will be done by the end of the summer. After that, deployment will depend on the work burden and priorities of Team Zotero.

pjaschke · August 24, 2009

Just started dabbling with Zotero recently but because I'm constantly needing gene names and species names in italics, until the journal title rich-formatting is sorted out I would never attempt to use it for a paper. Too bad.

Rintze · August 24, 2009

Rintze's markup-and-macro solution is a good workaround for now, but not ideal given that many titles, etc. will have to be re-entered when a fix is finally settled upon.

The 'fix' in the upcoming updated CSL processor will be based on the same set of tags, so there won't be a need to re-enter data. Eventually Zotero might provide some elegant way to apply these tags, but initially they will have to be added by hand, just as with my work-around.

pjaschke · August 24, 2009

Since the titles of most journal articles are not formatted correctly in the PubMed or ISI Web of Science databases, I usually have to go through the titles in EndNote anyway and add italics. It won't be a huge problem to go through and add the tags.

JonEP · August 28, 2009

The 'fix' in the upcoming updated CSL processor will be based on the same set of tags, so there won't be a need to re-enter data. Eventually Zotero might provide some elegant way to apply these tags, but initially they will have to be added by hand, just as with my work-around.

It would be ideal if, along the top row of the zotero interface there were an icon for a drop-down rich-text editor--the same features that are available in the "Note" entry window (ie., when you are in the note tab and click on "Add", you then have the ability to enter text in italics, bold, etc.). If this sort of interface were available for entering titles, secondary titles, short titles, etc., it would really help.

As a bit of an aside, I suppose it is obvious, but the use of typed tags and other programming code (as in creating block quotes in the Vanilla forum, for instance!) is second-nature to many of those involved in the zotero project/community, but it is quite offputting to average users. It is a bit like asking them to go back to Wordperfect, circa 1991 (do you remember the "reveal codes" panel, where users saw all of those [tab] and [indent] markers?). Well, it will be good to have a way to put italics in titles, in any event, and I'm sure that the clunk will fall out of the mix in due time...

bdarcus · August 28, 2009

But adding that to such a tight interface is not, I think, such an obvious thing. There's just not much space.

As for your other comments, the conventions are much simpler than adding HTML tags. I recognize that this may not be the ideal solution for many, but it's still, as you say, better than nothing.

Rintze · August 29, 2009

But adding that to such a tight interface is not, I think, such an obvious thing. There's just not much space.

Maybe shortcuts could be used? E.g. ctrl-i would italicize selected text in the metadata panel? Or perhaps a customized menu could show up if you right-clicked selected text?

What I personally would like to see is rendering of HTML tags (at least in the metadata-column) in Zotero, perhaps with a toggle to show the tags. Tags could be stripped from the fields displayed in the middle column. That would prevent items from being sorted by tags instead of by real field content.
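To make the sorting concern concrete, here is a minimal Python sketch, assuming the five tags used in the workaround above. The function names are hypothetical and this is not how Zotero itself sorts; it only shows how stripping tags before comparison keeps items from being ordered by the "<" of a leading tag.

```python
import re

# Matches opening and closing forms of the five workaround tags (an assumption
# for this sketch; other tags would pass through untouched).
TAG_RE = re.compile(r"</?(?:i|b|sup|sub|sc)>")


def strip_tags(field):
    """Return the field text with the markup tags removed."""
    return TAG_RE.sub("", field)


def sort_titles(titles):
    """Sort by visible text, case-insensitively, while keeping the stored tags."""
    return sorted(titles, key=lambda title: strip_tags(title).casefold())


library = [
    "<sc>L</sc>-malic acid production",
    "Acetate metabolism",
    "<i>Zymomonas</i> genetics",
]
by_content = sort_titles(library)
```

With a plain sort, both tagged titles would come first because "<" sorts before letters; with the stripped key they land under L and Z.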

the use of typed tags ... is quite offputting to average users.

Note that this isn't done out of malice or ignorance. It's important to acknowledge that the new CSL processor written by Frank Bennett and Zotero are separate projects. For the CSL processor, the decision was made to support a number of HTML-tags as means of indicating rich text formatting. How programs like Zotero offer support for applying those tags is entirely beyond the CSL processor's scope. It would of course be possible to synchronize development of the CSL processor and Zotero (i.e. not announcing or including support for rich text markup in the CSL processor until Zotero has introduced a nicer way to apply tags), but a) the CSL processor is likely to be used in other projects, so its deployment shouldn't be delayed for Zotero, and b) although a bit rough, rich text support in the proposed form would already be very valuable for many users (including myself). And early adopters might be able to generate valuable feedback, which would benefit the 'average users' in the long run.

komrade · August 29, 2009

Any input into the knowledge base article on this topic would be good. At the moment I've linked it back to Rintze's nice workaround on this page, but someone might be able to summarise nicely at the KB page.

JonEP · September 3, 2009

I like the idea of shortcuts like ctrl-i, as per Rintze's suggestion.

Another option would be a floating menu bar (like those found in Adobe Photoshop, Acrobat, etc.), which would allow the user to clog or unclog the interface as necessary. It could pop up only during data entry, or perhaps only when you are entering data into a field that accommodates rich text.

Then again, while agreeing that the interface is a bit tight, one more icon along the top row, for turning on or turning off the rich-text entry tools, would not be so bad?

Geoffrey Kantaris · September 6, 2009

Surely there's a real problem with the rich-text solutions: all of these lock down a particular semantic item to a typographical style which is meant to vary according to the publisher's house rules/stylesheet. So, while I might want to signal a book title within an article as italics according to the Chicago style, it would have to be underlined in the MLA style (for example). E.g. "A New Reading of <i>Moby Dick</i>" in one style, and "A New Reading of <u>Moby Dick</u>" in another. With HTML-style or rich text markup, these transformations are not going to be possible. Similarly, there is the issue of italics inversion in a book title. If, in a particular style, an entire book title is italicized, then the italics need to be turned off for a book-title-within-the-title, thus: <i>Critical Readings of</i> Moby Dick <i>by Herman Melville</i>. Surely a much better solution, then, is to use the existing semantic markup for titles, but to allow nesting of some kind. Then, in time, the transformation stylesheets can be updated to take account of the nesting.

fbennett · September 6, 2009

Geoffrey,

Thanks for your thoughtful input. The solution that is currently coded into the new CSL processor addresses your second issue, relating to nesting. Double-quotes inside a title will become single quotes if themselves enclosed in double-quotes supplied by the style. Italics will become normal typeface if themselves enclosed in italics. Same for boldface. So that one is covered.
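To illustrate the flip-flop behaviour described above, here is a minimal Python sketch. It is not the CSL processor's actual code: the function names are invented, only `<i>` tags and plain ASCII quotes are handled, and the output is a list of (text, is_italic) runs rather than formatted text.

```python
import re


def flip_flop_italics(field, outer_italic):
    """Return (text, is_italic) runs for a field containing <i>...</i> tags.

    outer_italic says whether the style itself renders the whole field in italics.
    """
    runs = []
    italic = outer_italic
    for token in re.split(r"(</?i>)", field):
        if token in ("<i>", "</i>"):
            # Each tag boundary flips the current state, so <i> inside an
            # italic field switches back to normal type.
            italic = not italic
        elif token:
            runs.append((token, italic))
    return runs


def flip_flop_quotes(field, outer_quotes):
    """Demote inner double quotes to single quotes when the style adds outer double quotes."""
    if outer_quotes:
        return '"' + field.replace('"', "'") + '"'
    return field


title = "Critical Readings of <i>Moby Dick</i> by Herman Melville"
in_italic_style = flip_flop_italics(title, outer_italic=True)
in_plain_style = flip_flop_italics(title, outer_italic=False)
```

In a style that italicizes book titles, the inner title comes out in normal type; in a style that leaves titles plain, it comes out in italics. The same stored field serves both.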

There was extensive discussion of semantic markup while the current implementation was in the cooker, and titles were the primary example of where it might be useful. An elaborate scheme was proposed involving a full semantic layer, and a clean separation of semantics and presentation. After looking at the shape of that plan, however, everyone involved in the discussion, including its original proposer (me), concluded that it was excessively complicated, and likely to cause more pain than it was worth.

The largest consumer of in-field markup, by volume, will be the life sciences, especially for gene and chemical names. Because the formatting conventions in this case are uniform across the entire field, there is no practical need for semantic markup in this important use case. We therefore decided that the solution should at least permit presentational markup.

Titles may be a case where semantic markup is required, but we're still not quite sure how strong the demand will be. In any case, though, if it is to be implemented, we'll need to have a more clear and concrete idea of the use cases than we do just now. Once in-field markup is out there and available, users who need a little more flexibility to fully cover their needs will provide that feedback, and we'll be able to build something that targets known immediate requirements. There's a lot we don't know about actual practice. Re the MLA rule you cite on underlining ... my understanding is that it is affirmatively discouraged in the current edition. Re style-driven formatting for inner titles generally, some journals apparently display titles in the formatting used in the original publication, treating the title as a literal formatted string ... but we don't yet know how common that practice is.

As the requirements are clarified, though, we'll be able to tune things to meet them. The main thing at the moment is to get the existing solution out there so we can get some feedback.

Geoffrey Kantaris · September 6, 2009

Thanks, fbennett, for the very full and helpful explanation. I think most of my points would be addressed by an implementation in Zotero of nesting for typographical markup. I'm also sure that underlining for titles is going the way of the dinosaur, and if future styles do require such transformations, then I guess they can be dealt with at the time.

zroutik · September 9, 2009

IMO, whatever the styling and markup of the title&abstract is going to be, it should allow for a simplified search: e.g. "H2O" should be able to find "H₂O". Cheers, M
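One way such a simplified search could work, sketched in Python under the assumption that markup is stored either as Unicode sub/superscript characters or as the HTML-style tags discussed in this thread. The function names are hypothetical; the Unicode folding uses the standard library's `unicodedata.normalize` with the NFKC form, which maps characters like "₂" to "2".

```python
import re
import unicodedata

# Tags assumed for this sketch: the five discussed earlier in the thread.
TAG_RE = re.compile(r"</?(?:i|b|sup|sub|sc)>")


def search_key(text):
    """Reduce a field or query to a plain, case-insensitive form for matching."""
    no_tags = TAG_RE.sub("", text)
    # NFKC replaces compatibility characters (subscript and superscript
    # digits, among others) with their plain equivalents.
    folded = unicodedata.normalize("NFKC", no_tags)
    return folded.casefold()


def matches(query, field):
    """True if the simplified query occurs in the simplified field."""
    return search_key(query) in search_key(field)
```

With this, a query of "H2O" finds a title stored as "H₂O" as well as one stored as "H<sub>2</sub>O".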

andrasgpa · November 18, 2009

hello all,

I'm not sure how far off the update is. In the meantime I have found it convenient to adopt Rintze's solution.

With some editing of Zotero's macros, the HTMLtagconversion macro can be run automatically.

This is accomplished by opening zotero.dot, going to the Visual Basic editor, pasting in the HTMLtagconversion macro code, and adding two lines to fnUpdate(). The code I show below starts at line 461, and the added lines are marked.

If (mResultArray(0) <> "!") Then
    For i = 0 To (j - 1)
        Call subInsertFormattedText(mMarks(mBibliographies(i)), (mResultArray(0)), bUpdateTemp)
        subSelect (mMarks(mBibliographies(i)))    ' added: select the bibliography that was just inserted
        HTMLtagconversion                         ' added: convert the HTML tags in the selection
    Next i

Andras

asplundj · November 19, 2009

I can't find those lines in zotero.dot. Are you using winword integration 3.0a3?

andrasgpa · November 19, 2009

No, my version appears to be 1.0b3.

I took a quick look at the code in 3.0a3 and it is quite different. You can still paste in the HTMLtagconversion code in the ZoteroRefresh subroutine, but you will need to make sure that the bibliography is selected for the code to work. There is a lot of code in 1.0b3 that allows that to happen easily, and it all seems to be gone from the macro code in 3.0a3.

Anyhow, if you need to manually select the bibliography, then it's easy enough to assign a button to the HTMLtagconversion macro.