Use semantic, non-presentational markup for HTML citations

The markup used in HTML citations is too presentational for my taste. I suggest pruning it, both for general elegance and to allow easier customization of layout through CSS. This is what it currently looks like:

<div style="line-height:1.1em;margin-left:0.5in;text-indent:-0.5in;">
<p style="margin:0">Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <span style="font-style:italic;">Ethnomusicology</span> 32, no. 1:75-105.</div>

The actual citation text is embedded in a <p> which in turn is embedded in a <div>. Both elements make little sense from a structural point of view. Both elements are styled via the style attribute; this introduces an awful lot of redundancy in a list of citations, and makes it difficult to customize the layout via CSS.
Then, italicizing is done via the style attribute of a span element, again making little sense semantically and structurally. From a structural point of view, we may think of the italic text in a citation as 'emphasized'; if we have an HTML element for just that, why not use it? In major browsers, the <em> element is rendered as italic by default so that would just contribute to general elegance without losing the desired default behavior. If I want to style my <em> differently, I still have that possibility.
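
To make the customization problem concrete, here is a minimal sketch. The `references` class and the upright-titles preference are made up for illustration and are not anything Zotero emits. The stylesheet asks for upright titles, but the inline declaration on the span takes precedence over the normal stylesheet rule, so only the `<em>` version picks up the change:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Inline style versus stylesheet rule</title>
<style>
  /* Hypothetical site preference: journal titles upright instead of italic. */
  .references span,
  .references em {
    font-style: normal;
  }
</style>
</head>
<body>
<div class="references">
  <!-- Current kind of output: the inline font-style wins over the rule above, so the title stays italic. -->
  <p>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <span style="font-style:italic;">Ethnomusicology</span> 32, no. 1:75-105.</p>
  <!-- Proposed kind of output: no inline style, so the stylesheet rule applies and the title is upright. -->
  <p>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <em>Ethnomusicology</em> 32, no. 1:75-105.</p>
</div>
</body>
</html>
```

A stylesheet declaration marked `!important` would override the inline style, but needing that for every property is the kind of workaround that lean markup avoids.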

For the core citation, the following HTML should be enough from a structural point of view:

Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <em>Ethnomusicology</em> 32, no. 1:75-105.

However, citations usually come in groups. Well, we happen to have list elements in HTML, so I'd say: use them. In sensible and lean markup, a list of references would look like this:

<ol>
<li>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <em>Ethnomusicology</em> 32, no. 1:75-105.</li>
<li>Alexandre, Pierre. 1966. Préliminaire à une présentation des idéophones Bulu. In <em>Neue Afrikanische Studien, Hamburger Beiträge zur Afrika-Kunde</em>, 9-28, Hamburg: Deutsches Institut für Afrika-Forschung.</li>
<li>Boas, Franz, ed. 1927. <em>Festschrift Meinhof. Sprachwissenschaftliche und andere Studien</em>. Hamburg: L. Friederichsen &amp; Co.</li>
</ol>
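
As an illustration of the customization this would allow, a stylesheet along the following lines could turn such a list into an unnumbered bibliography with a hanging indent. The `references` class is an assumption for the sake of the example (a site author could add it, or target the list some other way), and the measurements simply mirror those in the current inline styles:

```html
<style>
  /* Hypothetical class name; any selector that reaches the list would do. */
  ol.references {
    list-style: none;     /* no numbers or bullets */
    margin: 0;
    padding: 0;
    line-height: 1.1em;
  }
  ol.references li {
    margin-left: 0.5in;   /* hanging indent: every line starts 0.5in in... */
    text-indent: -0.5in;  /* ...except the first, which is pulled back out */
  }
</style>

<ol class="references">
<li>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <em>Ethnomusicology</em> 32, no. 1:75-105.</li>
<li>Boas, Franz, ed. 1927. <em>Festschrift Meinhof. Sprachwissenschaftliche und andere Studien</em>. Hamburg: L. Friederichsen &amp; Co.</li>
</ol>
```

Without the `<style>` block, the same markup falls back to the browser defaults: a numbered list with italic titles.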


Thus, when dragging an individual citation into an edit window (or putting it on the clipboard), it would make sense to embed it in a <li> element. When exporting a group, it might be good to include the <ol> element as well (i.e. to enclose all individual list items within an <ol> element). One complication is that Zotero has no way to know whether the wanted citation is the first or only one (in which case it needs to be embedded in an <ol>) or whether the user is adding a citation to an already existing list (in which case s/he would want just a <li> element, not the extraneous <ol>). A sensible solution would be to provide a setting where the user can choose whether the list items are exported embedded in an <ol> or without it.

(For clarity, I have left out the COinS <span> in the above examples. That <span> would be included just before the closing </li> of the citation.)
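
For reference, a list item with the COinS span in place might look roughly like this. The `Z3988` class and the empty span carrying an OpenURL ContextObject in its `title` attribute follow the COinS convention; the particular keys and values below are filled in by hand for illustration and are not copied from Zotero's actual output:

```html
<li>Agawu, Kofi. 1988. Music in the funeral traditions of the Akpafu. <em>Ethnomusicology</em> 32, no. 1:75-105.
<!-- Illustrative COinS span: empty element, metadata lives in the title attribute. -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=Music%20in%20the%20funeral%20traditions%20of%20the%20Akpafu&amp;rft.jtitle=Ethnomusicology&amp;rft.aulast=Agawu&amp;rft.aufirst=Kofi&amp;rft.date=1988&amp;rft.volume=32&amp;rft.issue=1&amp;rft.spage=75&amp;rft.epage=105"></span></li>
```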

In short, please give us structural and semantic markup in HTML citations. It works better, looks better, is easier to maintain and customize, and just makes plain sense.

[edit: changed ul to ol as per Bruce's suggestion]
  • I didn't code this, but I wouldn't be surprised if the presentational markup was used in part for compatibility reasons, since the same HTML is used to do rich-text copying to the clipboard in Windows, and it's possible the more semantic elements don't show up as well when pasted into apps like Word.

    But you're right that as actual HTML it leaves much to be desired, so we should be able to at least clean up that output. Ticket created.
  • edited February 8, 2008
    I just tried copy-pasting the proposed markup into Word from my blog (where I always do the references like this), and it works beautifully. <li> elements are converted to a list, and <em> becomes italic.
  • I'm something of a markup fascist myself and so in general agree with mark's desire to have cleaner output. Also, Dan, this is the same exact issue I've mentioned before about the lack of style information (e.g. tagging) in word processors.

    However:

    First, the specific markup mark proposes seems a bit off. The list is ordered, so it should be ol. Also, using em tags for titles is just as presentational as the existing approach. I'd say cite is somewhat more appropriate.

    [aside: I'm really NOT a fan of COinS; I'd rather see some kind of embedded RDF/RDFa]

    Second, the bigger problem is that processing becomes more complicated. If you don't embed the styling information (say, for margin-handling), then you need some other way to apply it. So if you're dealing with, say, HTML output, you need to be able to set or update the relevant CSS definitions. For example, your style's reference list has numbers; fine, that's default rendering for ol. But my style has neither numbers, nor bullets, so the styling needs to be set to not display them. This all becomes more complicated to process if you're not using presentational styling.

    It is worth noting, though, that at least Word (not sure about OOo actually) is fairly smart about importing CSS, and will map style-tagged content to its internal styles. So there may be room here to improve things fairly substantially.

    I'd look into whether it's possible for Zotero to load both tagged HTML (say an em element with a class attribute) AND its associated CSS definition.
  • I'd look into whether it's possible for Zotero to load both tagged HTML (say an em element with a class attribute) AND its associated CSS definition.
    My experience of exporting Textile with additional CSS rules to HTML and then exporting to Word suggests to me that Word's CSS implementation is pretty poor.
  • One possibility is to add another Quick Copy option, "Use semantic markup" or the like, to go along with the new "Copy As HTML" option. It too would be a site-specific option, so if semantic markup on your blog was important to you, you'd enter your blog domain and enable that option, and Zotero would use semantic markup when you did a copy or drag with that site loaded. There'd probably need to be a separate keystroke for copying the requisite CSS rules, but you wouldn't have to use it each time if you had already embedded the rules for your bib style of choice into your site's default CSS.

    For everything else, and by default, Zotero would just use the existing presentational markup.
  • bdarcus: agreed, it is an ordered list. Also, I agree, the <em> case for italic text is borderline presentational. Cite might be better, though not perfectly semantic either. Perhaps a span with a sensible class attribute would be best. Although that removes the convenient default styling <em> and <cite> provide.
    If you don't embed the styling information (say, for margin-handling), then you need some other way to apply it. So if you're dealing with, say, HTML output, you need to be able to set or update the relevant CSS definitions. For example, your style's reference list has numbers; fine, that's default rendering for ol. But my style has neither numbers, nor bullets, so the styling needs to be set to not display them.
    But in this example, why does Zotero need to set the styling at all? If the point is that your style has neither numbers nor bullets, shouldn't you be worrying about that style rather than Zotero?

    The rationale for the markup I proposed above (changing ul into ol as per Bruce's suggestion) was that it is reasonably simple and elegant, while at the same time the default styling of browsers will be sufficient for those who do not know or do not care about CSS. That way, you avoid the complexities that come with letting Zotero export the CSS.

    The way I see it, Zotero shouldn't mess with the presentation at all; it should just provide sensible and lean markup. But I do recognize that not all users think about it this way, therefore I think Dan's idea to make this a separate option is a good one.
  • edited February 11, 2008
    mark:
    But in this example, why does Zotero need to set the styling at all? If the point is that your style has neither numbers nor bullets, shouldn't you be worrying about that style rather than Zotero?
    This is more about integration with editors, where when a user chooses some CSL style, they expect the processor to take care of all the details.

    The same logic doesn't necessarily hold when you're publishing stuff on the web, of course, where you can just attach your own CSS rules.

    Dan's solution is a good start, but it doesn't solve the problem of including semantics in the desktop application. There's a danger in trying to separate these use cases too much; I already really dislike that I can't set a style in my word-processing template and have that overrule whatever Zotero provides.

    BTW, I just saw this in my blog archive, about using CSS for formatting.
  • Dan's solution is a good start, but it doesn't solve the problem of including semantics in the desktop application. There's a danger in trying to separate these use cases too much; I already really dislike that I can't set a style in my word-processing template and have that overrule whatever Zotero provides.
    That is exactly why I raised the issue in the first place: the Zotero export function currently includes inline CSS (which is tag-specific). Since inline CSS is the most specific, it has priority over internal CSS (document-specific) and external CSS (the most flexible option) in the cascading model of CSS. Short of marking every stylesheet declaration !important, this makes it impossible to override these tag-specific declarations, and it effectively gets rid of the advantages of using CSS in the first place.

    Re your blog posting: yes, a lot of bibliographic formatting problems could be overcome by the power of CSS. The biggest problem here is that some of the most widely used browsers do not support such vital pseudo-elements as :before and :after. This may change in the future, but until then, CSS can't be called upon to do the work that it would be so good at. It's a shame really.
  • edited September 7, 2009
    I'm bumping this old topic to ask what the current status is. HTML export still produces presentational markup. The biggest problem is not just the superfluous div, but that the styling is set using inline CSS, which a normal stylesheet rule cannot override, even though you should expect users to want to style their reference list.

    The more structurally sensible markup suggested in my starting post is interpreted alright by all browsers; moreover, the layout is preserved in rich text copying (at least when pasting into MS Word it is). Can we please get something like this at least as an option?
  • Just an update: in my Python CSL engine, I'm using an HTML + RDFa internal model, so the structured semantics always accompany the formatted content. I've not yet figured out how I'm going to deal with the formatting attributes and CSS hooks, but I'll probably just add a class to the root node, and do most of the actual formatting inline. It would be nice for someone to experiment with how, say, Word deals with different mixes of classes and inline formatting on import.
  • mark: citeproc-js will make an effort to separate styling information from markup. Here is a sample page that I produced for discussion purposes while working out the formatting of bibliographies -- the actual implementation differs in some small ways, but it shows the general idea. Some samples of actual output are available in the processor tests, in the "class" and "magic" sections.