RTF/ODF Scan for Zotero

adamsmith · May 6, 2013

Frank Bennett (who did most of the work) and I (who did most of the cheerleading) are excited to announce Zotero RTF/ODF Scan, a plugin that allows you to use Zotero (properly) with any word processor capable of saving/exporting ODF format, including Google Docs and Scrivener.
Go here for download and instructions
http://zotero-odf-scan.github.io/zotero-odf-scan/

See here for more extended instruction with images:
http://zoteromusings.wordpress.com/2013/05/06/announcing-rtfodf-scan-for-zotero/

Please post all problems and questions here

fbennett · May 6, 2013

Sebastian is too modest to say so, but the plugin would not have happened without his input on documentation and coding.

We're keen to see users get the most out of this new bridge between Zotero and the various writing platforms in use out there. If you run into snags, please do let us know with a post to this thread.

shuhuang · May 7, 2013

Hello!

I'm new to Scrivener and have been trying to figure out how to use it with Zotero. This is definitely an improvement over the old system. Thanks for this!

Is there any way of hiding the Zotero markers when working on text in Scrivener? I find the markers are a distraction, especially the ones with multiple references. They can be quite long.

Thanks!

adamsmith · May 7, 2013

no, unfortunately not - the reason this works reliably across different word processors is that it's just plain text.
It's possible to shorten the markers - e.g. you can manually remove the title of the item and it won't have any effect. We can also tell you how to customize the translator to only print author-date (currently you'd have to re-do this after each update of the plugin - if there's sufficient demand, we could consider letting users define that part via a preference).

kithairon · May 10, 2013

This is great – hats off to the volunteers and their offering of skills and time. A long harboured wish is fulfilled – and provided with a great documentation to boot! Gave the topic a new thread over at Scrivener and let the Nisus folks know. You guys forgot the donation button on your page! Thanks for this.

shuhuang · May 10, 2013

Hi adamsmith,

Would appreciate it if you could share with us how to have the translator print only author-date ... anything to get the markers shorter. They're quite a chunk, especially multiple references.

Thanks!

adamsmith · May 10, 2013

This post is no longer relevant. The marker no longer contains item titles by default.
Leaving this for historical reasons.
In your Zotero data folder:
http://www.zotero.org/support/zotero_data
there is a folder called "Translators"
Open the folder and find the file "Scannable Cites.js" (Windows may not show you the .js)
Open the file with any text editor - notepad, TextEdit, etc. (don't use a word processor like WordPad, Word, or Scrivener)

In line 45 you'll see
mem.set(item.title,",","(no title)");

just comment that line out by putting two forward slashes at the beginning:
//mem.set(item.title,",","(no title)");

save the file. That's it. The change is effective immediately, no need to restart.
Note that you'll have to do this every time the plugin is updated (we overwrite the translator when we update).
As I say above we might allow this to be specified via user preference - I'd imagine that might happen in the next feature release (i.e. anything that contains more than bug fixes).

Rintze · May 10, 2013

@adamsmith, is there a particular need to include the "zotero://select/items/" substring in the citation marker? It's not particularly useful, is it?

adamsmith · May 10, 2013

I can discuss that with Frank.
It doesn't do anything, of course, but I think he's attached to the fact that it is a valid URI for a Zotero item. Currently you can't take it out because of the way the scan works, but that would be trivial to change.

Rintze · May 10, 2013

It's only of limited use as a URI, though, since it doesn't contain a Zotero library ID. It only resolves for the person who has a copy of the library from which the URI was generated, right? For others, the only value is that it shows that the citation marker is Zotero-specific.

shuhuang · May 10, 2013

thanks adamsmith, appreciate it very much! If you could drop the file reference at the end of the marker that'd be fantastic too. Again, anything to get the markers leaner. :)

dstillman · May 10, 2013

Yeah, I'm pretty sure we'd remove the zotero://select/items/ links if we integrated this. If you want a real URI, Zotero will generate one for you for an item, the way it does for the word processor plugins—it'd be a proper zotero.org URI for people who've synced and a local one otherwise. But I think not being distracting is more important for this use case than globally identifying an item and it should just use the bare minimum of an identifier for the functionality to work.

adamsmith · May 10, 2013

yes, URI isn't the right word, it's not universal - but the scan requires everyone participating in writing the document to have a synced copy of the library, so this does seem useful (though I'm not sure if the group library ids (the part before the underscore) are identical across synced computers. They may not be.)

dstillman · May 10, 2013

They are.

adamsmith · May 10, 2013

Thanks for the input Dan - yes, I've discussed using the actual item.URI with Frank, too. I think as a minimum we could (should?) add removing zotero://select as a pref and set the leaner version as default. IIRC the original intention was to turn the whole thing into a live link, but that doesn't actually work very well.

dstillman · May 10, 2013

zotero:// links were never particularly meant to be reused anywhere, though perhaps that was bad planning. But among other things, the libraryID is just an internal identifier that we don't expose anywhere else. The server API, for example, just uses userIDs and groupIDs, which is why the real URIs have the /users/ and /groups/ parts. If this needs an identifier, I'd maybe do something like u:1234:ABCD2345 and g:4321:BCDE3456, which isn't exactly great but at least uses external ids and is less ugly, I would say, than the underscore (and certainly than the whole zotero://select/items/ part). For that matter, if you're content with "0_" as is (that is, using the implicit personal library), then you could just use the item key itself (e.g., "ABCD2345") for those and g:2345:ABCD2345 or similar for group items. (And at least in Zotero proper, we wouldn't pref this. Even if the links worked, I don't think there's enough value in being able to select items in Zotero to justify the links. Not distracting the user while they're writing seems much more important to me.)
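To make the suggested identifier shapes concrete, here is a minimal sketch of how they might be told apart. It is only an illustration: the function name `parseItemId`, the returned object shape, and the assumption that an item key is exactly eight uppercase letters or digits are not from this thread; only the `u:`/`g:` prefixes and the sample keys are.

```javascript
// Hypothetical sketch, not plugin code. Assumes an item key is eight
// uppercase letters or digits, as in the examples "ABCD2345" and "BCDE3456".
const ITEM_KEY = /^[A-Z0-9]{8}$/;

function parseItemId(id) {
  const parts = id.trim().split(":");

  // Bare key: the implicit personal library.
  if (parts.length === 1 && ITEM_KEY.test(parts[0])) {
    return { scope: "personal", libraryId: null, key: parts[0] };
  }

  // "u:<userID>:<key>" or "g:<groupID>:<key>", using external IDs.
  if (
    parts.length === 3 &&
    (parts[0] === "u" || parts[0] === "g") &&
    /^\d+$/.test(parts[1]) &&
    ITEM_KEY.test(parts[2])
  ) {
    return {
      scope: parts[0] === "u" ? "user" : "group",
      libraryId: Number(parts[1]),
      key: parts[2],
    };
  }

  return null; // not an identifier this sketch recognizes
}
```

Anything that does not match one of the three shapes comes back as `null`, so a caller could leave the marker untouched rather than guess.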

I'd also be interested in seeing if empty separators could be removed, and if it could generally do more free-form parsing. I'm less up on the requirements here, but I would think that the parsing could be smarter about figuring out what was intended.

(I'm not even sure the item identifiers should be there, though obviously there would be some significant downsides—prompting at parse time, like (I think?) RTF Scan does—to removing them. But at export time it could at least ensure that there was enough item data included to uniquely identify it given the current state of the database (if the export was moved out of the translator architecture).)

adamsmith · May 10, 2013

While I agree with making this leaner (I'm fine with reducing this to IDs as Dan suggests, let's see what Frank says), given the experiences with RTF scan, at least I strongly prioritize reliability over leanness - i.e. from my perspective something that makes the marker leaner but causes any problems in 1% of citations is definitely not worth it. I'm really attached to the fact that the scan just works, without any need for user interaction.

So from my perspective I definitely want a unique identifier in there (checking on export isn't enough - later additions can cause ambiguity). I'm at a minimum concerned about messing with the delimiters - the more complex the regex, the more likely it is to cause problems. The simplicity of the current parsing is, in my opinion, a core strength.

dstillman · May 10, 2013

from my perspective something that makes the marker leaner but causes any problems in 1% of citations is definitely not worth it

That's fair, but I don't think I agree. The way I see it, the options are to inconvenience some people occasionally/rarely when they're doing final processing of their documents, or to annoy/distract everyone all the time for the entirety of the time that they're writing.

dstillman · May 10, 2013

(To be clear, I'm talking generally about the leanness of the markers. I think separate arguments can be made about separators and identifiers.)

fbennett · May 10, 2013

I woke up this morning wondering if there would be reports of failed installs or errors in document conversion with the new plugin. Very glad to see there haven't been any. :-) It's also great to see this discussion.

We can certainly introduce a more concise identifier: I agree that the zotero://select url does add quite a bit of clutter. In some environments it can be useful, though, so support should continue and it should remain as an option. When using Zotero with an external note-taking tool (as opposed to a finished document), the user's priority might be to have quick access to the original Zotero item.^[1]

About item identifiers generally, as adamsmith says it is nice that conversion just works. It also allows us to cover legal resources, which don't lend themselves to freehand referencing. As one example, citing statutory law is tricky^[2] -- think of cites as pointers to pull requests and tagged versions in a git archive, which need to be referenced with precision. As a legacy from the print-only era, legal citation styles have rules of citation that capture the necessary detail, but the cite forms vary between jurisdictions. Then there are the quandaries associated with foreign language materials ...

I see ODF Scan as one more alternative. Users who prefer to enter cite markers freehand will use RTF Scan instead.

^[1] That was a requirement for the original converter and MLZ extension, written with Paul Troop, on which the plugin is based.

^[2] See this post by Thomas Bruce, director of the Cornell LII (scroll down to "Status, tracing, versioning and parallel activity").

fbennett · May 10, 2013

(As a side-note, it might be good to continue the technical side of this discussion over on zotero-dev.)

In the current design, dropping separators would introduce ambiguity: in a marker with three slots, the first two could be a prefix followed by a human-readable cite, a human-readable cite followed by a locator, or a human-readable cite followed by a suffix. I think we'll keep things as they are for the present.

With a more sophisticated parser, you could adapt Andrea Rossato's markup mechanism from pandoc + citeproc-hs, which is nicely designed, and used by at least one other project (Erik Hetzner's zot4rst). That syntax assumes a human-readable local identifier, though, which would require some sort of mapping table. It could get complicated.

A nice thing about the ODF Scan marker layout is that it's very simple to explain and to use. We'll see what users report back to this thread from the field, but so far it looks like the main issue is a desire for a more compact identifier.

Rintze · May 10, 2013

FWIW, I strongly side with Sebastian and Frank here. I would keep an identifier around (either internal or external).

aurimas · May 10, 2013

In the current design, dropping separators would introduce ambiguity: in a marker with three slots, the first two could be a prefix followed by a human-readable cite, a human-readable cite followed by a locator, or a human-readable cite followed by a suffix. I think we'll keep things as they are for the present.

I think you can mostly avoid the ambiguity.

with 1 section (this can be tricky and I think probably should be left unsupported), try to use content as item key. if it fails, use it as freeform cite. if that fails ask user to either pick item from lib or skip it.

for the remaining, check if last section is item key. if it is, don't count it towards number of sections.

with 1 (remaining) section, assume it's freeform cite.

with 2 - freeform plus prefix. if freeform cannot be matched, try freeform plus suffix

with 3 - assume prefix freeform suffix

I would also say locator can be in brackets at the end of freeform. this would avoid suffix/locator ambiguity both for the plugin and for the user
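The rules above can be sketched roughly as follows. This is an illustration of the proposal, not anything the plugin does. Several things are assumptions made for the example: the names `interpret` and `splitLocator`, the `resolves` callback (standing in for "can this freeform cite be matched to an item?"), square brackets for the locator, and an item key being eight uppercase letters or digits.

```javascript
// Hypothetical sketch of the section-counting heuristic described above.
const KEY = /^[A-Z0-9]{8}$/; // assumed shape of an item key

// Split an optional trailing "[locator]" off a freeform cite.
function splitLocator(text) {
  const m = text.match(/^(.*?)\s*\[([^\]]*)\]$/);
  return m ? { cite: m[1], locator: m[2] } : { cite: text, locator: "" };
}

// `resolves(cite)` is a caller-supplied stand-in for matching a freeform
// cite against the library.
function interpret(marker, resolves) {
  const sections = marker.split("|").map((s) => s.trim());

  // One section: try it as an item key, then as a freeform cite, else ask.
  if (sections.length === 1) {
    if (KEY.test(sections[0])) return { key: sections[0] };
    const only = splitLocator(sections[0]);
    return resolves(only.cite) ? only : { askUser: true };
  }

  // A trailing item key does not count towards the number of sections.
  let key = null;
  if (KEY.test(sections[sections.length - 1])) key = sections.pop();

  const build = (prefix, raw, suffix) => ({ key, prefix, suffix, ...splitLocator(raw) });

  if (sections.length === 1) return build("", sections[0], "");
  if (sections.length === 2) {
    // Prefer prefix + freeform; fall back to freeform + suffix.
    const asPrefix = build(sections[0], sections[1], "");
    return resolves(asPrefix.cite) ? asPrefix : build("", sections[0], sections[1]);
  }
  if (sections.length === 3) return build(sections[0], sections[1], sections[2]);
  return { askUser: true }; // more sections than the rules cover
}
```

Note that the two-section case is only as good as `resolves`: if both readings match something in the library, this sketch silently picks the prefix reading.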

dstillman · May 10, 2013

aurimas: Yes, thanks. That's in the direction I was thinking. And it'd be nice if this could be made to at least look human-friendly (e.g., with commas (or nothing) instead of pipes).

Frank: I'm not arguing for freehand referencing specifically, just to consider whether having fixed fields is really necessary. Doing away with those would certainly require a more sophisticated parser, but that's what I'm arguing for.

I see ODF Scan as one more alternative. Users who prefer to enter cite markers freehand will use RTF Scan instead.

Why, though? Shouldn't the goal be to integrate this into a single interface in Zotero proper that can take a document—in different possible formats, and perhaps even with different kinds of possible markers, both freehand and Zotero-generated—and output a document with formatted citations? If there's enough embedded info that no disambiguation is required, great. If there is, an interface is available.

If this is never merged into Zotero then this doesn't particularly matter to me, but I assumed this was meant to be a stopgap measure until it could be worked into the existing interface.

Simon, who wrote the RTF Scan code, also may have other thoughts on this.

aurimas · May 10, 2013

And it'd be nice if this could be made to at least look human-friendly (e.g., with commas (or nothing) instead of pipes).

I think this will be quite challenging unless the freeform cite rules are made to be quite strict. Or, if we make the item key mandatory (I was suggesting above that it may be optional, which would allow, perhaps, for a more natural freeform writing process), then I think you may be able to get rid of the pipes.

I'm also with Dan on this:

Why, though? Shouldn't the goal be to integrate this into a single interface in Zotero proper that can take a document—in different possible formats, and perhaps even with different kinds of possible markers, both freehand and Zotero-generated—and output a document with formatted citations?

Though I'm a bit skeptical on us being able to support very unstructured freeform citations.

fbennett · May 10, 2013

@aurimas: There may be some misunderstanding about the role of the "freeform" portion of the Scannable Cite format (the second field). Its only role is to describe the source during authoring. Apart from the optional leading "-" character (to suppress the author), the second field is not parsed, and its content is discarded on the first Zotero citation refresh.

As things stand, the plugin either produces a valid Zotero citation or leaves the glaring marker in the document. The risk of getting a well-formed citation with (harder to spot) unintended locators or affixes is pretty low. The appearance of the pipe markers in the draft may be less than tidy, but trust in the output reduces distraction of another kind.
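For readers who have not looked at a marker closely, a rough sketch of this fixed-field reading follows. It is not the plugin's code. The field order (prefix, cite, locator, suffix, identifier) is pieced together from this thread, while the surrounding curly braces, the function name `parseScannableCite`, and the returned shape are assumptions for illustration.

```javascript
// Hypothetical sketch of fixed-field parsing; not the plugin's actual code.
// Assumes a marker looks like "{ prefix | cite | locator | suffix | id }".
function parseScannableCite(marker) {
  const inner = marker.trim().replace(/^\{/, "").replace(/\}$/, "");
  const fields = inner.split("|").map((f) => f.trim());

  // Anything that is not exactly five fields is left alone, so a malformed
  // marker stays visible in the document instead of becoming a wrong cite.
  if (fields.length !== 5) return null;

  const [prefix, cite, locator, suffix, id] = fields;
  if (id === "") return null; // no identifier, nothing to look up

  return {
    prefix,
    // The cite text is only a label for the author while drafting; the one
    // thing read from it is a leading "-" meaning "suppress the author".
    suppressAuthor: cite.startsWith("-"),
    locator,
    suffix,
    id,
  };
}
```

Because the cite field is never interpreted, commas and semicolons inside the prefix or suffix are harmless; only a literal pipe would break the split.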

A lot of work has been done in legal circles lately on parsing human-readable citations out of free text. Citation marker schemes provide a little more structure to work with, but I guess some of the stories leave me feeling there are benefits to maintaining some distinction between the human-readable and the machine-readable.

Adaptive parsing to cover both RTF and ODF Scan markers (and maybe others) is certainly something to keep in view. The plugin code provides a nice starting point for exploring more flexible approaches. I'm sure there will be further experiments, but we're pretty happy with this for the present.

We're looking into reducing the bulk of the markers, and will post to the thread again when there's something to show.

adamsmith · May 11, 2013

Frank and I agree on this - I don't see us dropping either the ID or the pipes for something more "human readable" - this may be a disciplinary thing - both Frank and I read and write a lot of stuff where affixes matter and contain most "human readable" delimiters like commas and semicolons.
I find the thought of trying to parse what's a delimiter in a marker and what isn't really unattractive. And what Frank says about the problem of citations that don't come out right. In a 300 page document I need this to work 100% reliably. Not 98% reliably. We're going to keep it that way for our add-on.

I'm happy to chime in if/when you're looking at incorporating this into Zotero. My general position is that I think you underestimate the degree to which you will put off folks in the humanities - most notably in history! - if you make things fragile or unreliable for citations involving complex affixes.

I don't really see the downside of the pipes if we're going to keep the ID, since at that point we require a Zotero generated marker anyway and the pipes are printed by that.

DWL-SDCA · May 11, 2013

My shop does a lot of work involving CSV files. Sometimes we edit them directly. Before editing we always convert the commas to pipes to make the files more readable. We convert the pipes back before using the file. The extra effort is worth it.

I believe that the use of pipes helps to make the new plug-in's insertions _more_ human friendly. With pipes it becomes quite clear what is what. With commas or semicolons everything seems to run together.

I urge everyone to do a use-test. Write a paragraph using the plug-in as it currently stands. Copy that paragraph and edit it to replace the pipes with your choice of other delimiter. Compare the two. I find the pipe version more friendly. Another test would be to ask a naive person to look at the two paragraphs and ask which more clearly defines the parts of the inserted citation markers.

fbennett · May 11, 2013

Since both conversion modes run in the same wizard, it should be possible to adapt the ODF Scan code to call the RTF Scan resolution screen and return citation data, for markers that contain no pipes. If the two marker methods are treated separately, results should be predictable enough.

It's a tempting thought, but I'll step slowly with it: the RTF Scan syntax for suppress-author cites, in particular, may be tricky to handle against ODF markup. Would be handy if it could be made to work, though.

adamsmith · May 12, 2013

We've just updated to version 1.0.9 which defaults to author, (year) and a shortened citation marker. This looks much leaner.
The change is backwards compatible (i.e. Zotero continues to recognize the old markers) and there are hidden prefs to use the old behavior, documentation will go up tomorrow.

epederick · May 12, 2013

Hi all - just caught up with this and have given it a quick run ... wonderful!! Thanks to Frank, Adam, Dan et al. Re comments above - yes, it produces a chunk of 'stuff' when writing in Scrivener but what the heck if you know you are going to end up with reliable live citations in the compiled document? Love that I can now include prefix/suffix material as is possible in the regular plugins.
Now I - and no doubt many others - will be scrambling to re-enter citations in the Scannable Cite format. Time well spent, and a great relief because now we have confidence that this cross-platform combination (a) works and (b) is supported. My only request is that if this system is updated (i.e. made prettier) that the ODF scan will continue to be able to read citations entered in the current Scannable Cite format.
Back to work, smiling broadly ... cheers Evan