Citations and bibliography: preserving field codes when copying text and pasting in Fidus Writer

guidogorgoni · December 5, 2017

Hi, the title says it (mostly) all: I was wondering whether there is a method for preserving "active" (and not text-only) citations and references generated by Zotero in text copied from an ODT file, so that the code Zotero inserts into the text is preserved and recognized by other programs when pasted, in particular by Fidus Writer.
Thank you

bwiernik · December 5, 2017

The codes Zotero uses are specific to Word or LibreOffice. It’s not possible to preserve them in programs that don’t support Fields (Word) or ReferenceMarks (LibreOffice). If Fidus Writer can export to ODF, you can use the ODF Scan Plugin (http://zotero-odf-scan.github.io/zotero-odf-scan) to convert a Fidus Writer document into a live Zotero LibreOffice document for final formatting.

guidogorgoni · December 6, 2017

I was thinking something similar, thank you for your quick answer

johanneswilm · December 6, 2017

Thanks for that! The issue is that the only import we have is a paste import, because that is a lot easier to maintain. So we don't directly deal with DOCX/ODT files.

So in our case, we receive an HTML page from LibreOffice or Word that we then work with. Word or LibreOffice have already converted their document structure into an HTML page when we start dealing with it. The problem is that in this HTML page, last time I checked, I could find no semantic information related to Zotero citations. I don't know these two programs well enough to be able to say whether the Zotero plugin could do something about this and add semantic information to the HTML output, or whether this would require changes in LibreOffice and Microsoft Word.

I was even thinking that one could possibly create a CSL citation style that adds keywords, etc., but that would also require a lot from the user.

Any other suggestions on how to do this?

bwiernik · December 6, 2017

@johanneswilm It sounds like you are doing something completely different here. Let’s take a step back. What exactly are you trying to do? Are you a journal? What exactly are you looking for from CSL?

@dstillman Can we move this to a new thread?

Rintze · December 6, 2017

@bwiernik, Johannes is a dev of Fidus Writer (https://www.fiduswriter.org/who-we-are/). @guydog is presumably one as well.

adamsmith · December 6, 2017

Zotero uses Standard Word Fields / LibreOffice Reference Marks (i.e. the same thing used for other dynamic fields such as "today's date") for citation info -- whether and how these are copied to the clipboard (or included in HTML) is a function of Word/LibreOffice on which Zotero has no influence, so you'd have to look on that end of things for a solution.

johanneswilm · December 6, 2017

@adamsmith That is unfortunate, but also what I suspected. A work-around would then be, I guess, to write a citation style that leaves codes in the text which the Fidus Writer paste handler then can find and turn into a citation again. But that would require users to both install this special citation style and switch to it before moving the text. That's a bit much to ask, I fear.
@guydog is a Fidus Writer user who would like to move his texts from other word processors to Fidus Writer and therefore has started to investigate how that would be possible without going the costly route of writing an ODT/DOCX converter. See also https://forum.fiduswriter.org/d/12-importing-text-keeping-zotero-references-in-fw/4

adamsmith · December 6, 2017

Yes, I'd think the route of the dedicated citation style is the way to go. If someone is dedicated to moving their documents, that certainly seems better than nothing. Writing an ODT and a DOCX converter sounds painful.

johanneswilm · December 7, 2017

Ok, and there is not by chance already a style that outputs something similar to the codes you use in the zotero-odf-scan? I am asking because if that would be the case, we could likely cut down on time spent on maintaining the style by cooperating with whoever is doing something similar already.

adamsmith · December 7, 2017

Two ideas:
1. The RTF-scan style: https://www.zotero.org/styles?q=id:rtf-scan
2. ODF scan actually has an option to scan documents "to markers" which converts Zotero citations (regardless of whether they were originally created from ODF-scan markers) back into ODF-scan markers. That might work well and has the advantage that the ODF-scan syntax is very precise & comprehensive, but the disadvantage that it only works on ODT, not on DOCX (though it could be made to work on DOCX, we think, and we'd be delighted to take patches).
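
For anyone evaluating the second idea from the receiving side, here is a minimal sketch of how a paste handler might pick ODF-scan markers out of pasted plain text. It assumes the five-part, pipe-separated marker layout described in the ODF Scan documentation (prefix, cite, locator, suffix, item URI); treat that layout as an assumption to verify there. The function name `parse_markers` and the sample marker are made up for illustration.

```python
import re

# Assumed layout: { prefix | cite | locator | suffix | item URI }
MARKER_RE = re.compile(
    r"\{([^{}|]*)\|([^{}|]*)\|([^{}|]*)\|([^{}|]*)\|([^{}|]*)\}"
)


def parse_markers(text):
    """Return one dict per marker found in text, in document order."""
    markers = []
    for match in MARKER_RE.finditer(text):
        prefix, cite, locator, suffix, uri = (
            part.strip() for part in match.groups()
        )
        markers.append(
            {
                "prefix": prefix,
                "cite": cite,
                "locator": locator,
                "suffix": suffix,
                "uri": uri,
                "span": match.span(),  # where the marker sits in the text
            }
        )
    return markers


# Illustrative marker only; the item key is a placeholder, not real data.
sample = "As argued { See | Smith, (2012) | p. 45 | for an example | zu:2433:WQVBH98K }."
for marker in parse_markers(sample):
    print(marker["cite"], marker["uri"])
```

The marker only carries an item key, not the item's metadata, so the receiving program would still need some way to look the item up.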

bwiernik · December 7, 2017

I briefly looked at DOCX a while back, and it would be certainly possible to make it compatible with ODF Scan in the same way as ODF is. Haven’t had time to work on it myself.

guidogorgoni · December 7, 2017

it would be great if that solution could be implemented.
Indeed as @johanneswilm noted before, I am "just" a user considering the possibility to use FW with Zotero

johanneswilm · December 7, 2017

@adamsmith That sounds very interesting. Contributing to an existing solution rather than starting from scratch would be preferable. The rtf-scan style looks interesting, but it seems like it only gives three fields (title, year, author last name). Is this on purpose? Our citation system is made specifically to be able to cover over BibLaTeX and CSL, so preferably we would take all available fields.

@guydog You are very welcome to advance to become a FW developer if you desire to do that :). Thanks for initiating this conversation here anyway!

adamsmith · December 7, 2017

RTF Scan is designed to be used with this: https://www.zotero.org/support/rtf_scan, i.e. (basically) re-linked to the data through a scan, similar to ODF Scan -- the main difference between ODF scan and RTF scan is that ODF scan, by using ODT and a more complex syntax including Zotero unique item keys, is
1. Guaranteed to recognize and correctly assign every citation with correct prefix/suffix/locator and
2. Actually re-link citations to the database. RTF Scan just converts them to correctly formatted citations in plain rich text.

(This comes at the disadvantage of more unwieldy markers and ODT only)

Rintze · December 7, 2017

> Our citation system is made specifically to be able to cover over BibLaTeX and CSL, so preferably we would take all available fields.

Just so you're aware, Mendeley and Zotero both embed the full CSL JSON of cited items in the citation field codes. While CSL JSON is a bit lossy, that doesn't matter that much if you have a CSL back end yourself, and you could extract this citation metadata and reformat the field codes to something Fidus can handle.

I wrote a very simple JavaScript tool a while back to do the CSL JSON extraction from .docx files that might be of interest. See http://rintze.zelle.me/ref-extractor/ and https://github.com/rmzelle/ref-extractor/wiki.

johanneswilm · December 7, 2017

@Rintze: Right, but for that we would need to read the docx/odt files rather than just the paste that comes from them, right? We are only reading the paste data. I just tried to see what I get if using Bookmarks. Copying this text from LibreOffice:

In Oslo there was such a situation in the 1880s (Anders Høilund 2015).

I get this HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title></title>
<meta name="generator" content="LibreOffice 5.1.6.2 (Linux)"/>
<style type="text/css">
@page { margin: 0.79in }
p { margin-bottom: 0.1in; direction: ltr; line-height: 120%; text-align: left; orphans: 2; widows: 2 }
a:link { color: #0563c1 }
</style>
</head>
<body lang="en-US" link="#0563c1" dir="ltr">
<a name="ZOTERO_BREF_zuL4Xx8Nxk5w"></a>
In
Oslo there was such a situation in the 1880s
(Anders Høilund
2015).
</body>
</html>

So there is a little bit about Zotero in there, but it's just not enough to be helpful. When using ReferenceMarks, there is nothing.
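
To make concrete what a paste handler can recover from HTML like the above, here is a small sketch that lists the Zotero bookmark anchors in a pasted fragment. The class and function names are hypothetical. Note that the anchor only carries a bookmark name, not any citation metadata.

```python
from html.parser import HTMLParser


class ZoteroBookmarkFinder(HTMLParser):
    """Collect the names of <a name="ZOTERO_..."> anchors in pasted HTML."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        name = dict(attrs).get("name") or ""
        if name.startswith("ZOTERO_"):
            self.bookmarks.append(name)


def find_zotero_bookmarks(html):
    """Return the Zotero bookmark names in html, in document order."""
    finder = ZoteroBookmarkFinder()
    finder.feed(html)
    finder.close()
    return finder.bookmarks


# Shortened version of the paste shown above.
pasted = (
    '<body lang="en-US" link="#0563c1" dir="ltr">'
    '<a name="ZOTERO_BREF_zuL4Xx8Nxk5w"></a>'
    "In Oslo there was such a situation in the 1880s (Anders Høilund 2015)."
    "</body>"
)
print(find_zotero_bookmarks(pasted))
```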

The ODF scan sounds like it has no fields at all. Right now it sounds like the simplest path forward would be to create a dedicated style that outputs something like the ODF-scan for the first three parts and then instead of the reference in the fourth part, includes a JSON string which includes all the CSL fields for that reference. That would still not be super-simple for users, but at least advanced users could probably get it to work. And it should be simpler for us than trying to create and maintain DOCX and ODT import filters.

adamsmith · December 7, 2017

That's going to be rough to do with CSL -- it's a language designed specifically to write human-readable citations, not a general coding language, so I don't see you producing reliably valid JSON with it (e.g. you can't escape protected characters)

bwiernik · December 7, 2017

I wonder if it would be best to ask users to make the field codes in their documents visible, then copy and paste with the visible Zotero codes?

Rintze · December 7, 2017

Yes, CSL style output is a horrible metadata exchange format.

(also, the logic for extracting the CSL JSON from .docx files is extremely simple as long as you can unzip the file and use an XML parser; .odt which is also XML based is probably not much more complicated, although I haven't gotten that to work yet)

johanneswilm · December 7, 2017

The parsing of the CSL JSON in the DOCX/ODT file may not be difficult, but it will also mean we need to parse everything else in those, including things like formulas in Microsoft's own formula format, etc. As I mentioned elsewhere, we have had a student who spent half a year on creating a DOCX filter, and even that wasn't really usable. So that's why I don't see that as a viable solution unless one has the financing to put a developer on it for several months. And then after that for at least a day/week to maintain both filters. We don't have such resources, so that's not a viable solution for us.

Not being able to escape means one cannot do it 100% reliably, but as long as one picks a sufficiently strange separator, it seems like it should not be impossible.

johanneswilm · December 7, 2017

@bwiernik How do I make field codes visible?

adamsmith · December 7, 2017

But the JSON in DOCX contains the plain text citation in addition to the CSL JSON metadata, so you could do something like
1. Extract CSL JSON from DOCX
2. Convert DOCX to HTML the way you do now
3. Match CSL JSON entries back to citations

I haven't tried, but that seems better than to hack together pseudo-JSON in a CSL style (which I promise is going to be really, really frustrating. Lack of escaping was just one example. You can also, for example, not do something simple like
{firstName: Adam, lastName: Smith} with CSL as it doesn't handle first and last names as separate variables. There are going to be more examples as you actually try to implement this)

adamsmith · December 7, 2017

Field codes can be shown using alt+F9 in Word (alt+FN+F9 on Mac)

Rintze · December 7, 2017

> The parsing of the CSL JSON in the DOCX/ODT file may not be difficult, but it will also mean we need to parse everything else in those, including things like formulas in Microsoft's own formula format, etc.

(I initially used a customized version of the Mammoth .docx to HTML converter (https://github.com/mwilliamson/mammoth.js) to isolate the CSL JSON, but currently my ref-extractor just unzips the .docx, extracts all fields with the DOM parser, and identifies Zotero/Mendeley fields by checking for a certain field prefix, which is less than 20 lines total: https://github.com/rmzelle/ref-extractor/blob/fcf64fcdd61528ccb72c9a92fe5af4f730e0ac40/libraries/ref-extractor.js#L13)
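
The same unzip-and-scan idea can be sketched in Python with only the standard library. This is not the ref-extractor code, just an illustration under stated assumptions: the field instruction text sits in `w:instrText` elements between a `w:fldChar` "begin" and the following "separate" or "end", the Zotero prefix is the `ADDIN ZOTERO_ITEM` string quoted elsewhere in this thread, and nested fields are ignored. The function name is made up.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_field_codes(docx_file, prefix="ADDIN ZOTERO_ITEM"):
    """Return the instruction text of every field that starts with prefix.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    codes = []
    current = None  # pieces of the field being read, or None outside a field
    for element in root.iter():  # document order
        if element.tag == W + "fldChar":
            kind = element.get(W + "fldCharType")
            if kind == "begin":
                current = []
            elif kind in ("separate", "end") and current is not None:
                code = "".join(current).strip()
                if code.startswith(prefix):
                    codes.append(code)
                current = None
        elif element.tag == W + "instrText" and current is not None:
            # Word may split one field code across several runs.
            current.append(element.text or "")
    return codes
```

Each returned string can then be split into the prefix and the JSON payload. Nothing else in the document needs to be interpreted for this step.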

johanneswilm · December 7, 2017

@adamsmith: I can see possible ways that are explainable to the user on how to get their text from LibreOffice/Word to Fidus Writer:

1. Copy and paste (parts or the entire document). This is what we ask them to do now, and it works for everything from footnotes to formulas, etc. - except citations.

2. Upload an ODT/DOCX file. Providing this would be very costly for the above-mentioned reasons.

I cannot really see how a combination of the two would work. Even if we would do that, and we would obtain the CSL JSON through the upload and the contents through the paste, then how would we be able to find out which citation is where in the text? The alt+F9 thing only works in Word, not LibreOffice?

bwiernik · December 7, 2017

See here for how to show field codes in Word and LibreOffice.
https://www.zotero.org/support/kb/word_field_codes

In terms of user experience, it would seem to me that an option to upload a docx or odt and have the program convert the whole thing — citations, text, and all — would be the best experience. Having to copy paste it at all feels like a workaround.

adamsmith · December 7, 2017

> The alt+F9 thing only works in Word, not LibreOffice?

Correct, I'm not aware of a way to show Reference Marks in LibreOffice (they show on hover over, but that's no help).

> Even if we would do that, and we would obtain the CSL JSON through the upload and the contents through the paste, then how would we be able to find out which citation is where in the text?

The field codes start like this:

ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"PvkUfocC","properties":{"formattedCitation":"(Karcher and Steinberg 2013)","plainCitation":"(Karcher and Steinberg 2013)"},"citationItems":[{

followed by CSL JSON, so you could use the formatted citation to match.
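
A rough sketch of that matching step, assuming the field codes have already been obtained as strings and the pasted content is available as plain text. The function names are hypothetical, and the `citationItems` content in the sample is a made-up placeholder, since the real field code above is truncated. The sketch prefers `plainCitation` and falls back to `formattedCitation`, on the assumption that pasted text carries no markup.

```python
import json

FIELD_PREFIX = "ADDIN ZOTERO_ITEM CSL_CITATION "


def parse_field_code(code):
    """Return the JSON payload of a Zotero field code, or None if it is not one."""
    code = code.strip()
    if not code.startswith(FIELD_PREFIX):
        return None
    # raw_decode tolerates trailing text after the JSON object.
    payload, _ = json.JSONDecoder().raw_decode(code[len(FIELD_PREFIX):].lstrip())
    return payload


def locate_citations(field_codes, pasted_text):
    """Pair each citation with the offset of its text in pasted_text.

    Field codes are assumed to be in document order, so the search for each
    citation starts after the previous match.
    """
    matches = []
    search_from = 0
    for code in field_codes:
        payload = parse_field_code(code)
        if payload is None:
            continue
        props = payload.get("properties", {})
        text = props.get("plainCitation") or props.get("formattedCitation")
        if not text:
            continue
        offset = pasted_text.find(text, search_from)
        if offset == -1:
            continue  # citation text not found; leave it unmatched
        matches.append((offset, text, payload.get("citationItems", [])))
        search_from = offset + len(text)
    return matches


# Placeholder item data for illustration only.
sample_code = (
    'ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"PvkUfocC",'
    '"properties":{"formattedCitation":"(Karcher and Steinberg 2013)",'
    '"plainCitation":"(Karcher and Steinberg 2013)"},'
    '"citationItems":[{"id":1,"itemData":{"type":"article-journal","title":"Placeholder"}}]}'
)
pasted = "Citation practices vary (Karcher and Steinberg 2013)."
print(locate_citations([sample_code], pasted))
```

Matching on the citation text alone gets ambiguous if the user edited a citation by hand after Zotero generated it, so unmatched entries should be expected.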

I see the issue with uploading -- unfortunately can't really help with that. Presumably you can't rely on an existing parser like Pandoc? All I can really help with is to tell you what can be done from the Zotero side of things.

johanneswilm · December 7, 2017

I am playing around with the citation style editor (the "Download Style"-button seems not to work). There is no way to show what type is being referenced, is there?

johanneswilm · December 7, 2017

> In terms of user experience, it would seem to me that an option to upload a docx or odt and have the program convert the whole thing — citations, text, and all — would be the best experience. Having to copy paste it at all feels like a workaround.

Well, we need to provide a specific paste handler for those two programs anyway. So that's work that doesn't go away no matter what. Additionally offering an upload function would not give us anything extra -- except Zotero citations. The user would still need to enter the document and clean it up, because we cannot really be 100% sure what the user meant with everything. Zotero citations are quite important for a lot of users, of course, so if we could find a sponsor for creating and maintaining such a filter (costs estimated at 30,000 Euros initially and then 24,000 per year after that), we would offer that. Unfortunately we are not in that position, and providing this would currently likely absorb all our development efforts if we were to do it ourselves.

Those fields sound interesting. Too bad it's only in Word and not LibreOffice.

Hmm... maybe we just need to conclude that it's not really viable at the moment. Thanks so much for all the input though everyone!

bwiernik · December 7, 2017

@adamsmith Are you sure that’s correct? I thought Ctrl+F9 showed the field codes for LibreOffice?