Idea? Using CSL Citation styles to output biblio in LaTeX

I am new to Zotero. The program looks great, although it will need some development to be really useful to me. I am a LaTeX/BibTeX user, but also a keen programmer, so I could help develop some stuff.

Not quite sure if I am in the right discussion here. That might fit better in the plugin discussions.

Anyway, here is my idea: Currently, if you want to use Zotero with LaTeX, you need to export your bibliography to a BibTeX file, and then you need to use BibTeX and one of the BibTeX citation styles (the so-called .BST files). The sad thing about that approach is that you don't take advantage of all the nice CSL files developed for Zotero.

Could we instead do something like this: write a program, let's call it "bibzot", that would replace BibTeX and use the Zotero database and CSL files instead?

Bibzot would start exactly like BibTeX, i.e., read the .AUX file generated from the LaTeX document, and from there extract the name of the citation style (the name of a CSL style instead of a BST file) and the citations we want (all the \cite{..} labels). We will need to find a way to map the citation keys used in LaTeX to the corresponding Zotero entries somehow. From there, Bibzot would depart from BibTeX. Instead of reading a BibTeX file and running it through a BibTeX style file, it would get the bibliographic entries from the Zotero database and format them through one of the standard CSL files.
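To make that first step concrete, here is a minimal sketch of reading the .AUX file. It relies on the \citation{...} and \bibstyle{...} lines that LaTeX writes for BibTeX; the function name read_aux is made up for illustration, and treating the \bibstyle argument as a CSL style name is my assumption about how bibzot could work, not an existing convention.

```python
import re

# Commands that LaTeX writes to the .aux file for BibTeX to read.
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")
BIBSTYLE_RE = re.compile(r"\\bibstyle\{([^}]*)\}")


def read_aux(aux_text):
    """Return (style_name, citation keys in first-cited order, no duplicates).

    Hypothetical helper: here the \\bibstyle argument is assumed to name
    a CSL style rather than a .bst file.
    """
    style = None
    keys = []
    for line in aux_text.splitlines():
        line = line.strip()
        m = BIBSTYLE_RE.match(line)
        if m:
            style = m.group(1)
            continue
        m = CITATION_RE.match(line)
        if m:
            # One \citation line may carry several comma-separated keys.
            for key in m.group(1).split(","):
                key = key.strip()
                if key and key not in keys:
                    keys.append(key)
    return style, keys
```

Calling read_aux(open("paper.aux").read()) would then give the style name and the list of keys to look up, where "paper.aux" is just a placeholder file name.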

From there, how do we go about converting that output to LaTeX code? I don't quite know what CSL output actually looks like (I haven't looked at that yet), but if it is XML or HTML, then converting to LaTeX should be relatively straightforward. From there, we write that into the .BBL file that LaTeX wants. Et voilà!
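As a rough sketch of that conversion, assuming the formatted entries arrive as small HTML fragments with tags such as <i> and <b> (an assumption, since the actual output depends on the CSL processor used), something like the following could produce a .BBL file. The names html_to_latex and make_bbl are made up for illustration, and only a handful of tags and special characters are handled.

```python
import html
import re

# Assumed mapping from simple HTML tags to LaTeX macros; other tags are dropped.
TAG_TO_MACRO = {"i": r"\textit", "em": r"\emph", "b": r"\textbf"}
# A few characters that must be escaped in LaTeX text (not a complete list).
SPECIALS = re.compile(r"([&%$#_])")


def html_to_latex(fragment):
    """Convert a small HTML fragment into LaTeX text."""
    out = []
    for piece in re.split(r"(</?\w+[^>]*>)", fragment):
        tag = re.fullmatch(r"<(/?)(\w+)[^>]*>", piece)
        if tag is None:
            # Plain text: decode HTML entities, then escape LaTeX specials.
            out.append(SPECIALS.sub(r"\\\1", html.unescape(piece)))
        elif tag.group(2) in TAG_TO_MACRO:
            # Closing tag ends the macro argument; opening tag starts it.
            out.append("}" if tag.group(1) else TAG_TO_MACRO[tag.group(2)] + "{")
    return "".join(out)


def make_bbl(entries):
    """Build .bbl text from a list of (citation_key, html_fragment) pairs."""
    # The argument of thebibliography is the widest label; the entry count
    # is used here as a simple stand-in for numeric styles.
    lines = [r"\begin{thebibliography}{%d}" % len(entries), ""]
    for key, fragment in entries:
        lines.append(r"\bibitem{%s}" % key)
        lines.append(html_to_latex(fragment))
        lines.append("")
    lines.append(r"\end{thebibliography}")
    return "\n".join(lines) + "\n"
```

This only covers numbered bibliographies. Author-Year styles would also need labels in the optional argument of \bibitem, which this sketch does not attempt.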

What I am wondering is, first, is that a silly idea? Am I trying to make something simple in a very complicated way? Are other people interested in that? In my view, it would have some advantages:

- We take advantage of all the CSL files that have been developed
- We don't have to re-export the Zotero database to a BibTeX file each time we add something
- Different CSL files might use, say, full journal titles rather than abbreviated titles. We would not need to export the Zotero database in two different ways to get BibTeX doing that right (although we could get that running using macros for the journal names in the BibTeX export).
- BibTeX has essentially not evolved in the last 10 or 15 years. It will need some more powerful replacement one day (although it has proven remarkably robust in handling many complicated cases). Zotero with bibzot could be that one.

There are some issues obviously:

- How to map the LaTeX citation keys to the Zotero entries?
- Can we handle every case that way? I know that some Author-Year citation styles lead to quite cumbersome LaTeX code. Is that doable?
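On the first issue, one possible approach is to derive a key from each record and build a lookup table from it. The sketch below assumes records are plain dictionaries with "authors", "title" and "year" fields, and uses a made-up key scheme (first author's surname, first word of the title, year). I have not checked how Zotero actually generates keys on export, so both the record layout and the scheme are assumptions.

```python
import re


def make_key(record):
    """Derive a citation key such as 'knuth_the_1984' (hypothetical scheme)."""
    surname = record["authors"][0].split(",")[0]
    first_word = record["title"].split()[0]
    raw = "%s_%s_%s" % (surname, first_word, record["year"])
    # Keep only lowercase letters, digits and underscores.
    return re.sub(r"[^a-z0-9_]", "", raw.lower())


def index_records(records):
    """Map each derived key to its record, refusing ambiguous keys."""
    index = {}
    for record in records:
        key = make_key(record)
        if key in index:
            raise ValueError("duplicate citation key: %s" % key)
        index[key] = record
    return index


def resolve(cited_keys, index):
    """Split the cited keys into records found and keys with no match."""
    found = [index[k] for k in cited_keys if k in index]
    missing = [k for k in cited_keys if k not in index]
    return found, missing
```

Keys that cannot be resolved would presumably be reported as warnings, much as BibTeX does for entries missing from the database.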

As I say, I am not familiar at all with the Zotero code, and the CSL files syntax. Could the above proposal actually be implemented?

Thanks for any comments, thoughts, encouragements, or whatever help I may need to start implementing that idea.
  • There was a discussion similar to this on comp.text.tex a while back. pandoc is the closest to getting what you want right now, in that it can use CSL+MODS and can read/write TeX. But that still isn't very close: it makes only a single pass through your document (so it has no knowledge gained from running your document through LaTeX and creating an .aux file), and citeproc-hs still needs some work. So I definitely agree that code would need to be developed, and that it would be worthwhile to do what you want.
    (we will need to find a way to map the citation keys used in LaTeX to the corresponding Zotero entries somehow).
    You wouldn't necessarily need to start from the Zotero database; you could use one of the export formats that Zotero supports as your data source. Whether or not you do this, you could just use the same scheme that Zotero uses when it generates identifiers for your records when it exports to formats that require/allow those (such as BibTeX).
    From there, how do we go about converting that output to LaTeX coding?
    Different implementations of citeproc have different output formats. One could either modify a citeproc implementation to output what you want (e.g. a BBL file or LaTeX) or take a format that is already supported (e.g. HTML) & translate it to LaTeX. The former is "better," as most export formats are lossier & conversions aren't that great. However, the latter is "easier" & there are numerous tools/libraries that will do this conversion.
    if it is HTML, then converting to LaTeX should be relatively straightforward. From there, we write that into the .BBL file that LaTeX wants.
    Yes, but note that many BBL files have semantic markup that would be impossible to add from a less semantically rich HTML snippet.
    - BibTeX essentially didn't evolve in the last 10 or 15 years. It will need some more powerful replacement one day (although it has proven remarkably robust in handling many complicated cases). Zotero with bibzot could be that one.
    I agree, but would also be remiss in not mentioning things like BibLaTeX & many other would-be replacements that have at least been previously developed/used, some of which are actually fairly popular now.
    - Can we handle every case that way? I know that some Author-Year citation styles lead to quite cumbersome LaTeX code. Is that do-able?
    I see no reason why you can't make a replacement for the 'bibtex' program (e.g. take an .aux, a database of references, and rules to format those references & end up with fairly complicated .bbl). Because CSL cannot yet represent page-dependent styles, I think all current CSL formats would work under this model.

    There are certainly "messier" styles that are not yet capable of being written in CSL that would require you to either have clever LaTeX macros (as BibLaTeX uses) or would otherwise require multiple iterations of LaTeXing/citation writing. But there's no point in worrying about that now.
    As I say, I am not familiar at all with the Zotero code, and the CSL files syntax. Could the above proposal actually be implemented?
    Given enough monkeys at typewriters, why not? The Zotero & citeproc-js codebases are fairly nice. But, as above, you don't necessarily have to start with them to get what you want.

    If you do start with them, they are nicely modularized & you can more or less ignore large swaths of code (XML parsing, etc.) and focus mostly on the output classes.
  • Yeah, what noksagt said. I'd just urge reading through the comp.text.tex thread to understand more.

    http://groups.google.com/group/comp.text.tex/browse_thread/thread/62d240a7ee7b7cb8/fe0cf6b42b5299d9?#
  • This would be amazing. If it's feasible, it seems like a powerful and natural way to move forward from the complexity of BibTeX.
  • edited January 6, 2010
    This is probably something to approach with a good deal of caution. The direct source-text-to-typesetting design of TeX means that the system and any supporting utilities must function as a perfectly tuned monolith. On the positive side, the typesetter can extract extremely detailed and accurate contextual information from the initial LaTeX run over the document, much better than can be expected (at reasonable cost) from word processing software. But the downside is that automatically generated output (including references) has to be 100% accurate. There are no do-overs; any errors will end up in the pages of the target publication.

    Zotero/CSL output is getting more accurate all the time, but there are still plenty of items that (quite reasonably) are handled by touching up the document after Zotero has done its thing with the citations. To do that kind of touch-up work on references with traditional BibTeX processing, you would be digging into (if memory serves me correctly) the *.bbl file, which is not really meant to be read or edited by humans. Biblatex (which seems to represent the favored approach among the LaTeX maintainers) seems designed to write direct from reference database to *.dvi, so post-processing editing is completely out of the question.

    Because of this need for total accuracy, I think you are unlikely to see much enthusiasm for CSL integration among the LaTeX maintainers. Certainly it would not be an attractive or rewarding task to undertake without a good advance base of support in the LaTeX community.

    (EDIT: see my retraction of these reservations down thread)
  • Might be possible to have some way for biblatex to ingest CSL files though.
  • Zotero/CSL output is getting more accurate all the time, but there are still plenty of items that (quite reasonably) are handled by touching up the document after Zotero has done its thing with the citations.
    For the styles likely to be used by (La)TeXnicians, there should be relatively little touch up required & any manual touch up should certainly not be considered to be "reasonable."
    To do that kind of touch-up work on references with traditional BibTeX processing, you would be digging into (if memory serves me correctly) the *.bbl file, which is not really meant to be read or edited by humans.
    I don't know why you say this. .BBL files are quite readable. It is fairly routine to copy and paste them into a single TeX file, as many preprint archives and journals don't make BibTeX available. And multiple documents about BibTeX point out that you may edit the bbl file prior to running another round of 'latex'.
    Biblatex (which seems to represent the favored approach among the LaTeX maintainers)
    Who do you mean by "LaTeX maintainers?" It is the most popular potential bibtex successor (and better than the alternatives in many ways), but (IIRC): it is still beta, less popular than bibtex, and is not included in TeXLive or other popular distributions.
    seems designed to write direct from reference database to *.dvi, so post-processing editing is completely out of the question.
    ? BibLaTeX work-flows still rely on bibtex (although there are potential future replacements, such as biber). A .bbl file is still produced and used.
    Because of this need for total accuracy, I think you are unlikely to see much enthusiasm for CSL integration among the LaTeX maintainers.
    If accuracy is not a goal of CSL, I would think many people we're trying to reach wouldn't be enthusiastic.
  • Yeah, I don't really get Frank's point. Any editing of CSL output is usually a consequence of the style, or the data; not CSL per se.
  • I'll be quiet now. :)

    PS: I don't want to get into a flame war about this, it's not that big a deal. If someone (another monkey at the typewriter, to borrow noksagt's phrase) undertakes tighter integration between CSL and LaTeX, that would certainly be a great thing. My point is only that CSL is at its LaTeX 2.09 stage of development, if you know what I mean.
  • Well it's good to see that at least some people are interested. I will have a good look at the problem and try to get at least something running (after the other zillion things I have to do unfortunately ...)
  • Tacking onto this thread here to say that I've more or less completely changed my mind about the attractions of a "bibzot" utility. My dark and foreboding comment above was written in the midst of some particularly difficult work in the citeproc-js implementation, and the horizon has brightened up considerably since.

    To anyone interested in pursuing such a project, the citeproc-js test suite and sources are on BitBucket, the processor manual is available online, and specific questions can be directed to the integrators' discussion list.
  • Unfortunately, I lack the knowledge to implement such a thing, but such a bibzot utility would definitely be something quite interesting.