Citations and bibliography: preserving field codes when copying text and pasting into Fidus Writer
Hi, the title says it (mostly) all: I was wondering whether there is a way to preserve "active" (not text-only) citations and references generated by Zotero in text copied from an ODT file, so that the code Zotero inserts into the text is preserved and recognized by other programs when pasted, in particular by Fidus Writer.
Thank you
So in our case, we receive an HTML page from LibreOffice or Word that we then work with. Word or LibreOffice have already converted their document structure into an HTML page when we start dealing with it. The problem is that in this HTML page, last time I checked, I could find no semantic information related to Zotero citations. I don't know these two programs well enough to be able to say whether the Zotero plugin could do something about this and add semantic information to the HTML output, or whether this would require changes in LibreOffice and Microsoft Word.
I was even thinking that one could possibly use a CSL citation style that adds keywords, etc., but that too would require a lot from the user.
Any other suggestions on how to do this?
@dstillman Can we move this to a new thread?
@guydog is a Fidus Writer user who would like to move his texts from other word processors to Fidus Writer and has therefore started to investigate how that would be possible without going the costly route of writing an ODT/DOCX converter. See also https://forum.fiduswriter.org/d/12-importing-text-keeping-zotero-references-in-fw/4
1. The RTF-scan style: https://www.zotero.org/styles?q=id:rtf-scan
2. ODF scan actually has an option to scan documents "to markers" which converts Zotero citations (regardless of whether they were originally created from ODF-scan markers) back into ODF-scan markers. That might work well and has the advantage that the ODF-scan syntax is very precise & comprehensive, but the disadvantage that it only works on ODT, not on DOCX (though it could be made to work on DOCX, we think, and we'd be delighted to take patches).
Indeed, as @johanneswilm noted before, I am "just" a user considering the possibility of using FW with Zotero.
@guydog You are very welcome to advance to become a FW developer if you desire to do that :). Thanks for initiating this conversation here anyway!
1. Guaranteed to recognize and correctly assign every citation with correct prefix/suffix/locator and
2. Actually re-link citations to the database. RTF Scan just converts them to correctly formatted citations in plain rich text.
(This comes at the cost of more unwieldy markers and of working with ODT only.)
I wrote a very simple JavaScript tool a while back to do the CSL JSON extraction from .docx files that might be of interest. See http://rintze.zelle.me/ref-extractor/ and https://github.com/rmzelle/ref-extractor/wiki.
In Oslo there was such a situation in the 1880s (Anders Høilund 2015).
I get this HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title></title>
<meta name="generator" content="LibreOffice 5.1.6.2 (Linux)"/>
<style type="text/css">
@page { margin: 0.79in }
p { margin-bottom: 0.1in; direction: ltr; line-height: 120%; text-align: left; orphans: 2; widows: 2 }
a:link { color: #0563c1 }
</style>
</head>
<body lang="en-US" link="#0563c1" dir="ltr">
<p style="margin-bottom: 0in; line-height: 100%"><a name="ZOTERO_BREF_zuL4Xx8Nxk5w"></a>
<font color="#1b1b1b"><font face="Times New Roman, serif"><span lang="en-US"><span style="background: #ffffff">In
Oslo there was such a situation in the 1880s</span></span></font></font><span style="font-variant: normal"><font color="#1b1b1b"><font face="Times New Roman, serif"><span lang="en-US"><span style="font-style: normal"><span style="text-decoration: none"><span style="font-weight: normal"><span style="background: #ffffff">
(Anders H</span></span></span></span></span></font></font></span><span style="font-variant: normal"><span style="font-style: normal"><span style="text-decoration: none"><span style="font-weight: normal">øilund
2015)</span></span></span></span>.</p>
</body>
</html>
So there is a little bit about Zotero in there, it's just enough to be helpful. When using ReferenceMarks, there is nothing.
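For reference, the only Zotero-specific trace in the HTML above is the `ZOTERO_BREF_...` bookmark anchor. Below is a minimal Python sketch of how a paste handler could detect such anchors. The class and function names are made up for illustration and are not part of Fidus Writer or Zotero; note that the anchor only yields a bookmark name, not any item data.

```python
from html.parser import HTMLParser


class ZoteroAnchorFinder(HTMLParser):
    """Collect the names of <a name="ZOTERO_BREF_..."> bookmark anchors."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            # Only the name prefix identifies the anchor as coming from Zotero.
            if key == "name" and value and value.startswith("ZOTERO_BREF_"):
                self.bookmarks.append(value)


def find_zotero_bookmarks(html_text):
    """Return all Zotero bookmark names found in a pasted HTML string."""
    finder = ZoteroAnchorFinder()
    finder.feed(html_text)
    return finder.bookmarks


sample = (
    '<p><a name="ZOTERO_BREF_zuL4Xx8Nxk5w"></a>'
    "In Oslo there was such a situation in the 1880s (Anders Høilund 2015).</p>"
)
print(find_zotero_bookmarks(sample))  # ['ZOTERO_BREF_zuL4Xx8Nxk5w']
```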
The ODF scan sounds like it has no fields at all. Right now it sounds like the simplest path forward would be to create a dedicated style that outputs something like the ODF-scan for the first three parts and then, instead of the reference in the fourth part, includes a JSON string which includes all the CSL fields for that reference. That would still not be super-simple for users, but at least advanced users could probably get it to work. And it should be simpler for us than trying to create and maintain DOCX and ODT import filters.
(also, the logic for extracting the CSL JSON from .docx files is extremely simple as long as you can unzip the file and use an XML parser; .odt which is also XML based is probably not much more complicated, although I haven't gotten that to work yet)
Not being able to escape means one cannot do it 100% reliably, but as long as one picks a sufficiently unusual separator, it seems like it should not be impossible.
1. Extract CSL JSON from DOCX
2. Convert DOCX to HTML the way you do now
3. Match CSL JSON entries back to citations
I haven't tried, but that seems better than hacking together pseudo-JSON in a CSL style, which I promise is going to be really, really frustrating. Lack of escaping was just one example. You also cannot, for example, do something as simple as
{firstName: Adam, lastName: Smith} with CSL, as it doesn't handle first and last names as separate variables. There are going to be more examples as you actually try to implement this.
1. Copy and paste (parts or the entire document). This is what we ask them to do now, and it works for everything from footnotes to formulas, etc. - except citations.
2. Upload an ODT/DOCX file. Providing this would be very costly for the above-mentioned reasons.
I cannot really see how a combination of the two would work. Even if we did that and obtained the CSL JSON through the upload and the contents through the paste, how would we be able to find out which citation is where in the text? The Alt+F9 thing only works in Word, not LibreOffice?
https://www.zotero.org/support/kb/word_field_codes
In terms of user experience, it would seem to me that an option to upload a docx or odt and have the program convert the whole thing — citations, text, and all — would be the best experience. Having to copy paste it at all feels like a workaround.
ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"PvkUfocC","properties":{"formattedCitation":"(Karcher and Steinberg 2013)","plainCitation":"(Karcher and Steinberg 2013)"},"citationItems":[{
followed by CSL JSON, so you could use the formatted citation to match.
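A minimal Python sketch of that matching idea, assuming the field codes have already been obtained as strings and the pasted content is available as plain text. The function names are hypothetical, and the sample field code uses an empty `citationItems` list as a placeholder. Citations are searched left to right so that repeated identical citations are assigned in document order.

```python
import json

MARKER = "ADDIN ZOTERO_ITEM CSL_CITATION"


def parse_field_code(code):
    """Decode the JSON payload that follows the Zotero field marker."""
    payload = code.strip()
    if not payload.startswith(MARKER):
        raise ValueError("not a Zotero citation field code")
    return json.loads(payload[len(MARKER):])


def locate_citations(plain_text, citations):
    """Return (start, end, citation) tuples; start/end are None if unmatched."""
    matches = []
    position = 0
    for citation in citations:
        formatted = citation["properties"]["formattedCitation"]
        start = plain_text.find(formatted, position)
        if start == -1:
            # The pasted text may differ from the stored formatted citation.
            matches.append((None, None, citation))
            continue
        end = start + len(formatted)
        matches.append((start, end, citation))
        position = end  # continue after this match to keep document order
    return matches


field_code = (
    'ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"PvkUfocC","properties":'
    '{"formattedCitation":"(Karcher and Steinberg 2013)",'
    '"plainCitation":"(Karcher and Steinberg 2013)"},"citationItems":[]}'
)
text = "As argued elsewhere (Karcher and Steinberg 2013), matching is feasible."
for start, end, citation in locate_citations(text, [parse_field_code(field_code)]):
    print(citation["citationID"], start, end)
```

This would likely break down where the formatted citation contains markup (for example italics) or where the user has edited the citation text by hand, so unmatched entries are reported rather than guessed.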
I see the issue with uploading -- unfortunately can't really help with that. Presumably you can't rely on an existing parser like Pandoc? All I can really help with is to tell you what can be done from the Zotero side of things.
Well, we need to provide a specific paste handler for those two programs anyway. So that's work that doesn't go away no matter what. Additionally offering an upload function would not give us anything extra -- except Zotero citations. The user would still need to go through the document and clean it up, because we cannot be 100% sure what the user meant with everything. Zotero citations are quite important for a lot of users, of course, so if we could find a sponsor for creating and maintaining such a filter (costs estimated at 30,000 Euros initially and then 24,000 per year after that), we would offer that. Unfortunately we are not in that position, and providing this would currently likely absorb all our development efforts if we were to do it ourselves.
Those fields sound interesting. Too bad it's only in Word and not LibreOffice.
Hmm... maybe we just need to conclude that it's not really viable at the moment. Thanks so much for all the input though everyone!