Use semantic, non-presentational markup for HTML citations
The markup used in HTML citations is too presentational for my taste. I suggest pruning it, both for general elegance and to allow easier customization of layout through CSS. This is what it looks like currently:
<div style="line-height:1.1em;margin-left:0.5in;text-indent:-0.5in;">
<p style="margin:0">Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <span style="font-style:italic;">Ethnomusicology</span> 32, no. 1:75-105.</div>
The actual citation text is embedded in a <p> which in turn is embedded in a <div>. Both elements make little sense from a structural point of view. Both elements are styled via the style attribute; this introduces an awful lot of redundancy in a list of citations, and makes it difficult to customize the layout via CSS.
Then, italicizing is done via the style attribute of a span element, again making little sense semantically and structurally. From a structural point of view, we may think of the italic text in a citation as 'emphasized'; if we have an HTML element for just that, why not use it? In major browsers, the <em> element is rendered as italic by default so that would just contribute to general elegance without losing the desired default behavior. If I want to style my <em> differently, I still have that possibility.
For the core citation, the following HTML should be enough from a structural point of view:
Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <em>Ethnomusicology</em> 32, no. 1:75-105.
However, citations usually come in groups. Well, we happen to have list elements in HTML, so I'd say: use them. In sensible and lean markup, a list of references would look like this:
<ol>
<li>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <em>Ethnomusicology</em> 32, no. 1:75-105.</li>
<li>Alexandre, Pierre. 1966. Préliminaire à une présentation des idéophones Bulu. In <em>Neue Afrikanische Studien, Hamburger Beiträge zur Afrika-Kunde</em>, 9-28, Hamburg: Deutsches Institut für Afrika-Forschung.</li>
<li>Boas, Franz, ed. 1927. <em>Festschrift Meinhof. Sprachwissenschaftliche und andere Studien</em>. Hamburg: L. Friederichsen & Co.</li>
</ol>
Thus, when dragging an individual citation into an edit window (or putting it on the clipboard), it would make sense to embed it in a <li> element. When exporting a group, it might be good to include the <ol> element as well (i.e. to enclose all individual list items within an <ol> element). One complication is that Zotero has no way of knowing whether the wanted citation is the first or only one (in which case it needs to be embedded in an <ol>) or whether the user is adding a citation to an already existing list (in which case s/he would want just a <li> element, not the extraneous <ol>). A sensible solution would be to provide a setting where the user can choose whether the list items are exported embedded in an <ol> or without it.
(For clarity, I have left out the COinS <span> in the above examples. That <span> would be included just before the closing </li> of the citation.)
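For anyone wondering where the COinS <span> would sit, here is a rough sketch of one list item with it included. The class Z3988 and the ctx_ver/rft_val_fmt keys come from the COinS convention; the particular rft.* fields and their encoding are my own illustration and not necessarily what Zotero emits.

```html
<ol>
  <li>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu.
    <em>Ethnomusicology</em> 32, no. 1:75-105.
    <!-- Illustrative COinS span: empty element, metadata carried in the title attribute -->
    <span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=Music+in+the+funeral+traditions+of+the+Akpafu&amp;rft.jtitle=Ethnomusicology&amp;rft.date=1988&amp;rft.volume=32&amp;rft.issue=1&amp;rft.spage=75&amp;rft.epage=105&amp;rft.aulast=Agawu&amp;rft.aufirst=Kofi"></span></li>
</ol>
```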
In short, please give us structural and semantic markup in HTML citations. It works better, looks better, is easier to maintain and customize, and just makes plain sense.
[edit: changed ul to ol as per Bruce's suggestion]
But you're right that as actual HTML it leaves much to be desired, so we should be able to at least clean up that output. Ticket created.
However:
First, the specific markup Mark proposes seems a bit off. The list is ordered, so it should be ol. Also, using em tags for titles is just as presentational as the existing approach. I'd say cite is somewhat more appropriate.
[aside: I'm really NOT a fan of COinS; I'd rather see some kind of embedded RDF/RDFa]
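To make that concrete, a minimal sketch of the same entry with cite in place of em. As far as I know, browsers also render cite in italics by default, so the default appearance should not change.

```html
<ol>
  <!-- cite marks the title of the containing work, instead of generic emphasis -->
  <li>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <cite>Ethnomusicology</cite> 32, no. 1:75-105.</li>
</ol>
```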
Second, the bigger problem is that processing becomes more complicated. If you don't embed the styling information (say, for margin-handling), then you need some other way to apply it. So if you're dealing with, say, HTML output, you need to be able to set or update the relevant CSS definitions. For example, your style's reference list has numbers; fine, that's default rendering for ol. But my style has neither numbers, nor bullets, so the styling needs to be set to not display them. This all becomes more complicated to process if you're not using presentational styling.
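As a sketch of what would have to travel along with the markup in that case: the rules below move the hanging indent and line height from the current inline styles into a stylesheet and switch off the numbering. The class name "references" is made up for illustration; nothing in Zotero produces it.

```html
<style>
  /* Unnumbered style: suppress the default ol numbering */
  ol.references {
    list-style: none;
    margin: 0;
    padding: 0;
    line-height: 1.1em;
  }
  /* Hanging indent, same values as the existing inline style attribute */
  ol.references li {
    margin-left: 0.5in;
    text-indent: -0.5in;
  }
</style>

<ol class="references">
  <li>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <em>Ethnomusicology</em> 32, no. 1:75-105.</li>
  <li>Boas, Franz, ed. 1927. <em>Festschrift Meinhof. Sprachwissenschaftliche und andere Studien</em>. Hamburg: L. Friederichsen &amp; Co.</li>
</ol>
```

A numbered style would simply drop the list-style rule. The point stands either way: these rules have to be delivered to, or already present in, whatever consumes the HTML.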
It is worth noting, though, that at least Word (not sure about OOo actually) is fairly smart about importing CSS, and will map style-tagged content to its internal styles. So there may be room here to improve things fairly substantially.
I'd look into whether it's possible for Zotero to load both tagged HTML (say an em element with a class attribute) AND its associated CSS definition.
For everything else, and by default, Zotero would just use the existing presentational markup.
The rationale for the markup I proposed above (changing ul into ol as per Bruce's suggestion) was that it is reasonably simple and elegant, while at the same time the default styling of browsers will be sufficient for those who do not know or do not care about CSS. That way, you avoid the complexities that come with letting Zotero export the CSS.
The way I see it, Zotero shouldn't mess with the presentation at all; it should just provide sensible and lean markup. But I do recognize that not all users think about it this way, therefore I think Dan's idea to make this a separate option is a good one.
The same logic doesn't necessarily hold when you're publishing stuff on the web, of course, where you can just attach your own CSS rules.
Dan's solution is a good start, but it doesn't solve the problem of including semantics in the desktop application. There's a danger in trying to separate these use cases too much; I already really dislike that I can't set a style in my word-processing template and have that overrule whatever Zotero provides.
BTW, I just saw this in my blog archive, about using CSS for formatting.
Re: your blog posting. Yes, a lot of bibliographic formatting problems could be overcome by the power of CSS. The biggest problem here is that some of the most widely used browsers do not support such vital pseudo-selectors as :before and :after. This may change in the future, but until then, CSS can't be called upon to do the work that it would be so good at. It's a shame really.
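To illustrate the kind of thing I mean, here is a rough sketch in which the punctuation between fields is generated by CSS rather than stored in the markup. The class names are hypothetical and this is not something Zotero outputs; in a browser without :after support the separators would simply be missing.

```html
<style>
  /* Each field gets its trailing period and space from the stylesheet */
  .ref .author:after,
  .ref .year:after,
  .ref .article-title:after {
    content: ". ";
  }
</style>

<ol>
  <li class="ref"><span class="author">Agawu, Kofi</span><span class="year">1988</span><span class="article-title">Music in the funeral traditions of the Akpafu</span><em>Ethnomusicology</em> 32, no. 1:75-105.</li>
</ol>
```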
The more structurally sensible markup suggested in my starting post is interpreted alright by all browsers; moreover, the layout is preserved in rich text copying (at least when pasting into MS Word it is). Can we please get something like this at least as an option?