Porting Word citations to latex with \cite{}
I was trying to convert a Word document with citations created using the Zotero plug-in to a LaTeX format that I could just copy-paste across.
I opened an arbitrary .csl file just to see if I had a chance of doing it.
My question: is there a "name variable" in that style editor that would allow me to put the BBT "Citation key" field in the in-text citations (for reference, other examples are things like "author" and "title" and "editor" etc, there's a whole list here page 30: https://media.readthedocs.org/pdf/citation-style-language/1.0-20100321/citation-style-language.pdf)? I tried loads of things to see if I got lucky. I have a feeling this is impossible but thought I'd ask.
One possible solution I don't know how to implement is to batch convert the citation keys of BBT to an unused field that is common to most formats, e.g. short title or something like that - which I could then reference in the csl file. Any ideas?
Cheers.
- BBT citekeys are not mapped to CSL
- There is no way to access them via Zotero's server API that I know of (though this is the part that I may be wrong about)
- There is no reasonably accessible local API for Zotero
If there is a way to access them via the server API, it would be possible to copy them over to a different field using pyzotero
https://github.com/urschrei/pyzotero
https://retorque.re/zotero-better-bibtex/configuration/#citeprocnotecitekey
This is correct. BBT keys are not available via the API, nor can they be made available. BBT relies on a running Zotero to do its work. BBT does have some APIs, but I'm not sure how you'd want to use them.
edit: ugh it would be so nice to have markdown editing here.
Given that BBT has citekeys, is it not then trivial to create a CSL that wraps the references in '\cite{' and '}'? I notice the 'BibTeX generic citation style' *almost* does it, but just spits out the citekey instead of wrapping it.
Am I right in thinking that's all it would take, or am I missing something? Does one even need BBT for this, given that Zotero's built-in citekeys are sufficient for most purposes (the chances of a conflicting author/year are quite slim)?
You could indeed easily modify the existing style, yes. BBT actually offers an option, linked in the description of the citeproc functionality:
https://retorque.re/zotero-better-bibtex/installation/preferences/hidden-preferences/#citeprocnotecitekey
It's also not quite what you want, but likely easier to work with than the regular CSL one.
One further question -- if I receive a document from someone containing Zotero references to sources I don't have in my local library, is that a problem? (I'm trying to create a publishing workflow for a journal, and I won't obviously have all the sources that authors use).
In that case you'd have to just work with the CSL Bibtex style entirely, which is not without its flaws.
To get high-quality bibtex, you need the items in Zotero (from which you can then export them) but they'd have a different internal ID and wouldn't be linked to the citations in Word.
- extract data
- import into Zotero
- add citekeys
- replace existing citations with newly imported ones
- Apply bibtex citation style with BBT citekeys
It's the penultimate step that I think would be very tricky to do.
BTW wrt @laurence80386 when you say "putting together a workflow for a journal", do you mean you writing for a journal, or you working at a journal and managing the influx of documents? Because if the latter, at some volume "slim chances" become near-certainties.
Once the PDF is camera-ready the originals can be archived; the only place where a conflict is possible is within a single document, and if Smith has two publications in 2009 (i.e. two separate instances of \cite{smith_2009}), then that can be detected at proofing (I guess?).
My main concern (but there may be others) is if I don't have the references in my local library -- do they travel with the Word document, in at least sufficient detail to be able to change CSL for the purposes of the workflow I've just described?
Not sure that's correct. An article could cite smith_2009 twice, or cite two separate smiths that both published in 2009, or one smith that published two articles in 2009.
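To make the three cases above concrete, here is a minimal Python sketch of a proofing-time check. It assumes the document's sources are available as CSL-JSON-like dicts, one per distinct source, and that keys follow a naive author_year scheme; `author_year_key` and `find_key_conflicts` are hypothetical helpers, not part of Zotero or BBT.

```python
from collections import defaultdict


def author_year_key(item):
    """Build a naive author_year key from a CSL-JSON-like dict (hypothetical scheme)."""
    authors = item.get("author") or [{}]
    family = authors[0].get("family", "anon").lower().replace(" ", "")
    date_parts = item.get("issued", {}).get("date-parts") or [[""]]
    year = str(date_parts[0][0]) if date_parts[0] else ""
    return f"{family}_{year}"


def find_key_conflicts(items):
    """Return {key: [titles]} for every key shared by more than one source.

    `items` must list each distinct source once; citing the same source
    twice in the text is not a conflict and should not appear twice here.
    """
    by_key = defaultdict(list)
    for item in items:
        by_key[author_year_key(item)].append(item.get("title", "<untitled>"))
    return {key: titles for key, titles in by_key.items() if len(titles) > 1}


items = [
    {"author": [{"family": "Smith"}], "issued": {"date-parts": [[2009]]}, "title": "First paper"},
    {"author": [{"family": "Smith"}], "issued": {"date-parts": [[2009]]}, "title": "Second paper"},
    {"author": [{"family": "Jones"}], "issued": {"date-parts": [[2011]]}, "title": "Other"},
]
conflicts = find_key_conflicts(items)
```

The check cannot tell two different Smiths from one Smith with two 2009 papers, since only the family name goes into the key. Either way the key is ambiguous and a human has to resolve it.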
The references do travel with the Word document (which is why the ref extractor can extract them) and you could, while extracting, save the item ID or key (I think one of them is in there) in the extra field somehow, replace the cite in the doc with \cite{key or item ID}, and then stitch that together somehow (whether that goes through Zotero or not is a different matter). As an author, I'd be pretty nervous that either I or someone not-I would have to check that all cites remain in order, that the required sentence-case-to-title-case translation has been done properly (not a trivial matter to automate, trust me), some people post-edit the in-doc cites to get parencite-equivalent output, CSL and bib(la)tex don't have a perfect mapping between fields which means compromises or even data loss during conversion...
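As a rough illustration of the references travelling with the document, here is a Python sketch that pulls the embedded data out of one field code. It assumes the Word plug-in stores each citation as a field whose text looks like `ADDIN ZOTERO_ITEM CSL_CITATION {json}`, with a `citationItems` list carrying `id` and `itemData`. That layout is an assumption here, so check it against a real document. `parse_zotero_field` and `cited_items` are hypothetical names.

```python
import json

MARKER = "CSL_CITATION"  # assumed prefix that precedes the JSON payload


def parse_zotero_field(field_code):
    """Return the citation dict embedded in one field code, or None if absent."""
    start = field_code.find(MARKER)
    if start == -1:
        return None
    brace = field_code.find("{", start)
    if brace == -1:
        return None
    # raw_decode reads one JSON value starting at `brace` and ignores trailing text.
    citation, _end = json.JSONDecoder().raw_decode(field_code, brace)
    return citation


def cited_items(citation):
    """Yield (id, itemData) for each source referenced by one citation."""
    for entry in citation.get("citationItems", []):
        yield entry.get("id"), entry.get("itemData", {})


sample = (
    'ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"abc","citationItems":'
    '[{"id":12,"itemData":{"type":"book","title":"Two Concepts of Rules"}}]} '
)
parsed = parse_zotero_field(sample)
sources = list(cited_items(parsed))
```

Getting the field code text out of the .docx is deliberately left out. Word may split a field code across several runs, so that part needs more care than a one-line regex.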
Given what you describe, I'd start by simply testing out two different and easy-to-implement workflows:
1. Completely rely on CSL, including for the formatted bibtex. This will correctly match citekeys and bibtex, but the bibtex itself may have problems. Modifying the existing CSL style to include the cite command will indeed be trivial.
2. Rely on CSL (again, modified) for the cite command in text, then use the ref extractor to import into Zotero and export using Zotero's (or BBT's) bibtex, set to just include author_year as the citekey. This is more likely to mismatch citation and key, but will give you better/correct bibtex.
If either of them is good enough for you, you're set with minimal effort. If you need something that performs better, you can follow some of Emiliano's thoughts further -- I think with those, it may be possible to get to a 100% reliable solution (at least for all standard item types) but that'll require much more significant up-front scripting.
But maybe I am missing something...?
Unfortunately your first option (relying entirely on BibTeX) won't work because we're soliciting submissions from both computer science and the humanities, and the latter simply won't be willing to learn how it works.
As to your second suggestion: Can I clarify the process you're suggesting:
(i) use a BibTeX CSL to format the citations in the document once received from the author (they will use ACM while writing; I will convert to BibTeX for publishing)
(ii) extract the citations from the document with the ref extractor to get a BibTeX file
(iii) copy the body text into LaTeX and create the relevant separate BibTeX file
(iv) link the two and compile
I'm not sure what you meant about including author_year as citekey. Can this be chosen in Zotero's bibtex? I notice the default is author_titlekeyword_year, as per the examples I gave above -- I presume, per the above, that I can stick with the default?
Is there any need for step 2 (extracting the refs and importing them in Zotero)? If I insert a bibliography at the end of the authored document, this will already be in BibTeX, which I can then copy and paste into the bibtex file in LaTeX. Is that right? (This is why I was asking if the authored Word document includes all the metadata about the inserted references, without which this step will of course not work -- but it sounds from what you're saying that this will not be a problem).
Strange request, but in order that I can test this, would one of you mind uploading a Word document with Zotero references somewhere on the web? This will let me emulate receiving an independently-authored document with references that have no connection with my own personal library, to see if the above process is feasible.
Thank you for your help so far!!
WRT extracting bibtex using the ref extractor -- I have looked at what I think is the source of the ref extractor, and I *think* it uses the bibtex-csl style (https://github.com/citation-style-language/styles/blob/master/bibtex.csl) to convert the CSL to bibtex, but if that is correct, it turns an item with title into and that, aside from the problem that it doesn't title-case the title, simply isn't valid bibtex. And I don't see how CSL could do a better job than this, given what I know about converting text to valid latex and what little I know about what CSL stylers can do to the text.
For conversion to valid latex, you're looking at either passing it through Zotero, or the closest competition I know, astrocite (even though astrocite doesn't pass my full test suite, and does not title-case).
That's not my experience with BibTeX CSL -- I get things like this: which looks pretty comprehensive, no...?
What you have in your example, though, is not Bibtex produced by the CSL style. It's Bibtex exported by Zotero (which is generated by fairly elaborate javascript). It's absolutely crucial that you understand that distinction because it is at the heart of what makes this difficult.
One of the problems of using the CSL style is, in fact, that you can't use short titles for the citekey, since CSL can't modify individual elements and you'd end up with spaces in the citekey, so you're stuck with just author(s)_year and you'd have to adjust the export (by either using BBT or modifying the javascript) accordingly. This also, of course, makes citekey issues much more likely.
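For contrast, the kind of per-element cleanup that CSL can't do is easy in a script. Below is a minimal Python sketch that turns an author, a short title, and a year into a key without spaces. The scheme and the names `to_key_part` and `make_citekey` are made up for illustration; this is not how Zotero or BBT actually generate keys.

```python
import re
import unicodedata


def to_key_part(text):
    """Lowercase, strip accents, and drop everything except ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def make_citekey(family, short_title, year):
    """Join author, first title word, and year with underscores (hypothetical scheme)."""
    words = short_title.split()
    first_word = to_key_part(words[0]) if words else ""
    parts = [to_key_part(family), first_word, str(year)]
    # Skip empty parts so a missing title does not leave a double underscore.
    return "_".join(part for part in parts if part)


key = make_citekey("Müller", "Two Concepts of Rules", 2009)
fallback = make_citekey("Rawls", "", 1955)
```

Whatever scheme is used, it has to produce the same key on the in-text side and on the export side, otherwise the \cite commands and the .bib entries won't match up.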
But
{Two {Concepts} of {Rules}},
is still very likely wrong -- it's much more likely that it should be
{Two Concepts of Rules},
This may sound nit-picky (OK, it *is* nit-picky), and I don't mean to be belligerent, but if you're the accepting journal, and you're going to potentially change the submitters' bibliography... I don't know, that sounds tricky to me. I must admit that I have a pretty narrow focus in bibliography production that cannot easily be accused of being pragmatic.
I also just discovered that Citation.js has the (rather hidden) option of using the original item ID as BibTeX citekey (https://github.com/larsgw/citation.js/issues/181), which should be easy to enable in Reference Extractor.
I haven't played with citation.js so I don't have an opinion on the conversion it offers. I do know that conversion between CSL (which is what can be extracted from a word doc with zotero references) and bibtex is a lot more complex than most people appreciate. I for one did not know what I was getting into when I started.
Other things to look out for are cross-references (specifically, but not exclusively, xrefs that point to a page number).