Output citation with literal HTML codes (e.g. italics) for webpage?

djross3 · January 2, 2022

Is there a convenient way to keep the <i> (etc.) formatting that a style produces? I'm copying many citations to a webpage, and it would take a lot of time to reformat everything manually. Even search/replace would work if there were some way to export some kind of marker in the text. Would this require a variant of the style? Can CSL force literal HTML? Or is there another convenient workaround?

(In my particular case, I think I primarily need only italics, but a few other things like superscript come up in a few item titles at least, so copying everything over would be preferred.)

adamsmith · January 2, 2022

Do you know that you can set Quick Copy to copy as HTML in the preferences?

djross3 · January 3, 2022

Ah, that's helpful, thank you.

But the output includes a lot of extra Zotero-specific markup. Is there some way to strip that out automatically when I'm just trying to copy the text of the reference itself (along with italics, links, etc.) into an HTML document, rather than as a <div>, etc.? If not, this will work, and I'll just find a way to strip it out automatically.
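A minimal sketch of that stripping step in Python, using only the standard library's `html.parser`. It assumes the markup described in this thread: a `csl-bib-body` div containing one `div.csl-entry` per reference, plus COinS `span.Z3988` elements. The names `EntryExtractor` and `extract_entries` are hypothetical. It returns the inner HTML of each entry, so inline tags such as `<i>`, `<sup>` and `<a>` survive, while the wrapper div, its inline style and the COinS spans are dropped.

```python
from html import escape
from html.parser import HTMLParser


class EntryExtractor(HTMLParser):
    """Collect the inner HTML of each <div class="csl-entry">, skipping COinS spans."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.entries = []      # finished entries, as inner-HTML strings
        self._buf = None       # fragments of the entry being read, or None
        self._div_depth = 0    # open <div>s inside the current entry
        self._skip_depth = 0   # > 0 while inside a Z3988 (COinS) span

    @staticmethod
    def _classes(attrs):
        return (dict(attrs).get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        if self._skip_depth:
            if tag == "span":
                self._skip_depth += 1
            return
        if tag == "span" and "Z3988" in self._classes(attrs):
            self._skip_depth = 1
            return
        if self._buf is None:
            # Outside an entry: only a csl-entry div starts collecting.
            if tag == "div" and "csl-entry" in self._classes(attrs):
                self._buf, self._div_depth = [], 1
            return
        if tag == "div":
            self._div_depth += 1
        self._buf.append(self.get_starttag_text())  # keep <i>, <sup>, <a href=...> as written

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep as written, with no separate end tag.
        if self._buf is not None and not self._skip_depth and "Z3988" not in self._classes(attrs):
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._skip_depth:
            if tag == "span":
                self._skip_depth -= 1
            return
        if self._buf is None:
            return
        if tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:  # this is the closing tag of the csl-entry itself
                self.entries.append("".join(self._buf).strip())
                self._buf = None
                return
        self._buf.append(f"</{tag}>")

    def handle_data(self, data):
        if self._buf is not None and not self._skip_depth:
            self._buf.append(escape(data, quote=False))  # re-escape &, <, >


def extract_entries(html: str) -> list[str]:
    """Return the inner HTML of every csl-entry in a Zotero HTML bibliography."""
    parser = EntryExtractor()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Each returned string can then be wrapped in whatever element the page uses, e.g. `f'<p class="reference">{entry}</p>'`. Note that this discards the hanging indent and line spacing the style specifies, so the page's own CSS has to supply them.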

dstillman · January 3, 2022

There's not really "extra" content. The parent csl-bib-body div is necessary for proper formatting (text indent, line spacing) in most styles, and the Z3988 span is COinS that makes the data available for reimporting back into Zotero and other tools.

References are part of a bibliography, and you wouldn't generally copy them individually — you copy the bibliography all at once, and Zotero applies proper formatting to the entire div (in addition to doing proper sorting).

adamsmith · January 3, 2022

(so just to make that explicit: no way to strip any of this, no)

djross3 · January 4, 2022

Thank you for the clarifications. That all makes sense now, and this will work with some adjustments.

Regarding the HTML content, I've looked into this some more, and here are some thoughts, including why I'd rather remove it:
1. Your point about COinS is important, but the data actually saved there is not very useful. Book editors, for example, are treated as "authors", and a lot of other information is lost. The metadata isn't of high enough quality for me to recommend that others use it, so in my opinion it's not worth including. I will consider adding better metadata to the webpage later, because that does seem useful, but via some other method than COinS.
2. The HTML output uses explicit formatting, rather than allowing general formatting via CSS. It is also odd in its design, most obviously because it lacks paragraph elements (and from what I understand this is problematic for accessibility software like screen readers that expect p elements inside divs). In short, I think the purpose of Zotero is to generate content (references) to be included in a webpage, not to generate the formatting of that webpage.
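Regarding point 1: a COinS span stores its metadata in its `title` attribute as an OpenURL ContextObject in key/encoded-value (KEV) form, so one way to check what a given reference actually exposes (for instance, which names end up under `rft.au`, the KEV author key) is to decode that attribute. A minimal standard-library sketch; `decode_coins` is a hypothetical helper, and the exact keys Zotero writes for each item type are not assumed here.

```python
from html import unescape
from urllib.parse import parse_qs


def decode_coins(title: str) -> dict[str, list[str]]:
    """Decode the title attribute of a COinS <span class="Z3988"> (hypothetical helper).

    The attribute is an OpenURL ContextObject in KEV form ("key=value&key=value"),
    so repeated keys such as rft.au come back as lists.
    """
    # unescape() turns "&amp;" back into "&" when the string was copied from raw
    # HTML source; parse_qs then splits on "&" and percent-decodes each value.
    return parse_qs(unescape(title), keep_blank_values=True)
```

For example, `decode_coins(span_title).get("rft.au", [])` lists the names that consumers of the COinS data will read as authors.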

dstillman · January 4, 2022

I'll let others comment on the COinS output.

For the HTML, Zotero generates exactly what it needs to generate to produce a correct bibliography according to the selected citation style that can be inserted into a larger document, and that implies using inline style. You can restyle with CSS from the containing document if you want to do something different, but it would be incorrect for it to not apply indents or spacing.

Zotero uses div elements for the same reason — because the goal is to produce a correctly formatted bibliography, and divs generally won't have browser-default or other styling interfering. divs are block-level elements. There's no requirement for p elements and no reason screen readers should have any problem with it — they just need to read the text within each block.

adamsmith · January 4, 2022

COinS is just the least bad option to embed item-level metadata with citations, or at least it was. It absolutely is a very limited format given its origins (which are openURL requests). Pretty sure rtf.editor just doesn't exist, which is why we're not using it.

Not sure if there's now a better alternative -- JSON-LD at the item level? RDFa?

dstillman · January 4, 2022

Yeah, JSON-LD might make sense, but only once Zotero supports reimporting it, and we still haven't figured out exactly how we want to support JSON-LD in a general way, including for multiple results. (Getting off-topic for this thread, but this comment has links to discussions about it where we can try to resolve this.)
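For reference, a minimal sketch of what item-level JSON-LD might look like for the book-chapter case discussed in this thread, using schema.org's `Chapter` and `Book` types so that editors (`editor` on the containing `Book`) stay separate from chapter authors. This is an illustration only: as noted above, Zotero does not currently reimport JSON-LD, and the helper names (`chapter_jsonld`, `as_script_tag`) are hypothetical.

```python
import json


def _people(names):
    """Turn a list of display names into schema.org Person objects."""
    return [{"@type": "Person", "name": name} for name in names]


def chapter_jsonld(title, authors, book_title, editors, isbn=None, page_start=None, page_end=None):
    """Build a schema.org description of a book chapter (hypothetical helper).

    Chapter authors go on the Chapter; editors go on the containing Book,
    so the two roles are never merged.
    """
    book = {"@type": "Book", "name": book_title, "editor": _people(editors)}
    if isbn:
        book["isbn"] = isbn
    data = {
        "@context": "https://schema.org",
        "@type": "Chapter",
        "name": title,
        "author": _people(authors),
        "isPartOf": book,
    }
    if page_start is not None:
        data["pageStart"] = str(page_start)
    if page_end is not None:
        data["pageEnd"] = str(page_end)
    return data


def as_script_tag(data):
    """Serialize for a <script type="application/ld+json"> block in the page.

    "</" is written as "<\\/" (a valid JSON escape) so a value cannot close the script early.
    """
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Because authors and editors are separate arguments, an edited volume entered by ISBN would not have its editors reported as chapter authors. Whether a particular consumer reads `Chapter`/`isPartOf` this way is worth checking before relying on it.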

djross3 · January 4, 2022

In general I understand and appreciate the intentions. For my purposes this works now (after stripping the extra formatting).

> For the HTML, Zotero generates exactly what it needs to generate to produce a correct bibliography ... it would be incorrect for it to not apply indents or spacing.

In principle, HTML should separate content from formatting (consider the recommended "em" and "strong" elements instead of "i" and "b" for emphasized/strong text, setting aside, of course, literal italics etc. in a citation). It's true that the references should be displayed with indents, etc., but (1) that should be handled at the document/formatting level (in one way or another) rather than forced within the content, and (2) there's no particular reason for the spacing to be forced like that (different web designers might prefer more or less spacing, or, for that matter, more or less indent). A CSS class on each element would be enough to let any web designer customize the formatting easily.

> For the HTML, Zotero generates exactly what it needs to generate to produce a correct bibliography according to the selected citation style ... You can restyle with CSS from the containing document if you want to do something different...

Is this controlled by the specific CSL style? Not something I'll try to tackle now, but could be useful later.

> Zotero uses div elements for the same reason — because the goal is to produce a correctly formatted bibliography, and divs generally won't have browser-default or other styling interfering. divs are block-level elements. There's no requirement for p elements and no reason screen readers should have any problem with it — they just need to read the text within each block.

It's functional, but it's weird HTML from a (content-oriented) design perspective, and I have better organized CSS + HTML formatting with paragraphs for these references. Of course it can't be so easily automated in Zotero (at least not without extensive inline formatting), so I understand why it's not the default, but I do prefer to strip that formatting and use something more general in the document.

As for COinS and alternatives, the current version just doesn't seem close enough to being useful to be worth including. I understand it was the best option at the time, but it's more misleading than clarifying, at least for many of my references. Those are often book chapters, and because Zotero records often list the editors first if you start by entering a book by ISBN, it is particularly confusing when the editors appear as first author in COinS. If there's some better way to do this later I'd be interested to hear about it, but that's probably beyond the scope of my current project, or at least the current stage of it.

Thanks for your replies.

dstillman · January 4, 2022

Yes, we understand the difference between content and formatting. Zotero is a general-purpose tool that produces HTML that someone can paste into an HTML document and get correct citations according to the style they selected without hiring a web developer, and it does it the only way that makes sense. If you want to post-process the HTML or restyle it for your purposes, there are classes on the elements to facilitate that.
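The kind of post-processing mentioned here can be sketched with Python's standard-library `html.parser`: re-serialize the copied bibliography without inline `style` attributes and without the COinS `span.Z3988` elements, while keeping every tag and `class` attribute (`csl-bib-body`, `csl-entry`, etc.) so a site stylesheet can supply the indent and spacing instead. `StyleStripper` and `strip_inline_styles` are hypothetical names, and this assumes a reasonably well-formed fragment; it is not a general HTML sanitizer.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-serialize HTML without inline style attributes or COinS (Z3988) spans."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0  # > 0 while inside a Z3988 span

    @staticmethod
    def _is_coins(attrs):
        return "Z3988" in (dict(attrs).get("class") or "").split()

    @staticmethod
    def _render(tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            if name == "style":
                continue  # drop inline formatting; class and other attributes stay
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        if self._skip_depth:
            if tag == "span":
                self._skip_depth += 1
            return
        if tag == "span" and self._is_coins(attrs):
            self._skip_depth = 1
            return
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if not self._skip_depth and not self._is_coins(attrs):
            self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if self._skip_depth:
            if tag == "span":
                self._skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def strip_inline_styles(html: str) -> str:
    """Return the fragment with style attributes and COinS spans removed."""
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

One caveat of this design: it removes every inline style, including ones that carry meaning inside an entry (small caps, for example, are typically rendered as an inline `font-variant` style), so a real page might want to keep styles on `span` elements and strip them only from the `div`s.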