Zotero PDF reader / annotator

I'd like to share a vision. This is not really a feature request.

In my ideal world, the new stand-alone version of Zotero would include a PDF / Document reader that would allow you to open a document, read it, highlight it, and make annotations. It would be possible to tag your annotations and/or tag selected portions of text from within the document reader (ideally using a hierarchical tagging structure).

Finally, in the Zotero UI, it would be possible to do advanced searches of your library and of your tagged text and annotations, enabling you, for instance, to look for all instances of the term "shipwreck" in portions of text tagged with "whales:aggressors" and in all documents tagged with "seafaring narratives"... If your advanced search came up with 40 examples that met your criteria, there would be a preview pane where you could scroll through a list of previews of the selections of text in a list view. You could then click on that text and a new tab would open in which the selected document was open to the appropriate selection, allowing you to make further notes, or to link the text to some sort of external memo....
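
To make the kind of search I mean a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `Selection` structure, the `search` function, and the use of `:` as the separator between tag levels are assumptions on my part, not anything Zotero actually provides.

```python
from dataclasses import dataclass, field


@dataclass
class Selection:
    """A highlighted passage plus its tags (hypothetical structure)."""
    document: str
    page: int
    text: str
    tags: set = field(default_factory=set)


def tag_matches(tag, query):
    """True if `tag` equals `query` or sits below it in the hierarchy."""
    return tag == query or tag.startswith(query + ":")


def search(selections, doc_tags, term, selection_tag, document_tag):
    """Return selections that contain `term`, carry `selection_tag` (or a
    child of it), and belong to a document tagged with `document_tag`."""
    term = term.lower()
    hits = []
    for s in selections:
        if term not in s.text.lower():
            continue
        if not any(tag_matches(t, selection_tag) for t in s.tags):
            continue
        if not any(tag_matches(t, document_tag)
                   for t in doc_tags.get(s.document, ())):
            continue
        hits.append(s)
    return hits
```

Searching for "shipwreck" with selection tag "whales" and document tag "seafaring narratives" would then also pick up selections tagged "whales:aggressors", and the resulting list is what the preview pane would scroll through.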

The idea here is a mashup of Zotero with qualitative data analysis software (like QSR NVIVO or ATLAS.TI) and with a PDF annotation program like Qiqqa.

Uhm, I know that the zotero development team doesn't have this kind of thing on the calendar. So far, Qiqqa itself doesn't seem up to the task, GUI-wise and reference management wise. I'd really like some solution that would enable sharing of tags and databases back and forth with Zotero -- so far all solutions out there that claim to "work with your zotero database" are actually one-way imports from Zotero. But Zotero really rules the roost in terms of being cross-platform, being open source, being able to search for and suck in new references accurately, and being able to capably handle inserting references into documents in the word-processor in an intelligent, modifiable way. No other program can do that stuff as well.

Alright, I've said it. Every now and then I feel the need to write this up in the hope that it will influence more capable people out there to work on it...

Thanks for reading.
  • We have some plans for PDF annotation, but they will probably come relatively slowly. We essentially need a PDF browser plug-in with annotation support. We have some plans to add this to firefox-mac-pdf on OS X, but AFAIK there aren't any good open source PDF plug-ins for Windows or Linux, so we'd need to write these. Furthermore, the main open-source PDF library, Poppler, has limited support for PDF annotation. While getting this to work on OS X will require some minor tweaks to existing software, cross-platform support will be a much larger undertaking.
  • Thanks, that's interesting. Good to know.
  • Simon: keep an eye out for this one:
    http://www.libertexto.org/
    I've seen an early alpha and it looked promising
  • adamsmith, thanks. I hadn't seen that. It looks like there's some code for the PDF plug-in component already available.
  • That sounds very interesting! Especially considering that I am a mac user... :)
    Please also consider some way to add a summary of the annotations to the Zotero item as a note, if that is possible...
  • Having notes and annotations work like they do in Sente would be brilliant.

    E.g., every time I annotate a PDF or an HTML document, the annotated text or the note would become a searchable and viewable note in Zotero. That would really put Zotero ahead of all its competitors.
  • I've been experimenting some, and I think that we could manage to ship a version of gocr with Zotero, much like the pdfinfo libraries, to provide at least basic OCR to get a full-text index for images and PDFs. Obviously not for Zotero 2.1, but maybe in the next several iterations we could make something like this happen.
  • Hello. We have just launched Libertexto:
    http://www.libertexto.org
    As you can see, it is a preliminary version, but I think it is a good start. It is heavily inspired by Zotero and, of course, it would be a great idea to merge their functionalities.
    Suggestions are welcome.
  • @Ribanez: You should translate it into English.
  • I know. I hope it will be soon.
  • If you can provide strings, I'm sure you can recruit Spanish-to-English translators here on the Zotero forums, especially if it will help bring native PDF support to Zotero.
  • We can definitely crowd-source the translation (perhaps starting from google translate?). I can help.

    Here's a start:

    Libertexto

    Aplicación para mejorar la lectura digital
    Libertexto es una extensión del navegador Mozilla Firefox que permite realizar sobre los textos electrónicos acciones análogas a las que realizamos sobre el texto impreso, añadiendo una serie de funcionalidades que aprovechan algunas ventajas del formato digital.

    Libertexto: an app for improved digital reading
    Libertexto is a Mozilla Firefox extension that allows users to apply to electronic text the same actions they apply to printed text, while at the same time allowing them to take advantage of the digital format.

    Anyway, we should find the appropriate venue to translate it -- some sort of collaborative online document site.
  • Also, Ribanez, where would you like comments on Libertexto to be posted or sent? (Specifically, it needs a "save" button or context-menu entry. Right now it's useless for pages with embedded PDFs, such as Cambridge U Press, because there's no way to save the PDF on that page.)
  • You are very kind to offer to translate Libertexto. I have told the programmers about it and I hope we can give you an answer as soon as possible.

    @adamsmith: you can use the "contacto y sugerencias" form on libertexto.org to send comments, or use my email: ribanez@cordoba.uned.es.

    We have worked with the Evince team, and Libertexto has been crucial to the development of Evince's annotation features, which have been merged into the mainline viewer. Of course, we still have to improve the integration of Evince and Firefox (for instance, by adding a "save" button).

    Thanks for the comments and suggestions (and apologies for my English). We are really excited to see the reception of Libertexto.
  • It's worth noting that Okular on Linux supports PDF annotation. Annotations are stored outside of the file, in Okular's preferences directory. I don't know if this is problematic, but it does have the benefit that PDFs don't have to be modified, which I assume means that zotero.org could save space by hardlinking identical files...
  • But because the annotations are saved outside of the file, they're also not synced to the Zotero server. Also, they can't be written to the file at all, so they can't be read or changed in other viewers and, at least last time I used Okular, they got lost when you moved the file. Unfortunately we still don't have a viable FOSS solution for annotating PDFs :-(
  • Xournal does save PDF annotations to the file, but I haven't used it a whole lot.
  • edited February 14, 2012
    No one mentioned Windows yet, but I can see already it's gonna be tricky to do this in a cross-platform way.

    I use SumatraPDF, it's the most minimalistic viewer out there, and it's open source.

    Anyway, the problem is that annotations will be embedded in a PDF in all kinds of viewer-specific formats. How about keeping them on an external file, like [pdfname].ann, and using a format that any PDF reader can read/write and extend easily, like XML? It would then be a matter of having the readers store annotations in this format and Zotero could use them trivially.

    It would also prevent Zotero constantly re-uploading files of many megabytes just because you changed a tiny annotation.
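
A minimal sketch of the sidecar-file idea from the post above, using only Python's standard library. The element and attribute names (`annotations`, `annotation`, `page`, `kind`, `text`, `note`) and the helper functions are invented for illustration; no PDF reader is known to use this format.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def sidecar_path(pdf_path):
    """Map paper.pdf to paper.ann in the same directory."""
    return Path(pdf_path).with_suffix(".ann")


def save_annotations(pdf_path, annotations):
    """Write a list of dicts (page, kind, text, optional note) as XML."""
    root = ET.Element("annotations", {"pdf": Path(pdf_path).name})
    for a in annotations:
        el = ET.SubElement(
            root, "annotation", {"page": str(a["page"]), "kind": a["kind"]}
        )
        ET.SubElement(el, "text").text = a["text"]
        if a.get("note"):
            ET.SubElement(el, "note").text = a["note"]
    out = sidecar_path(pdf_path)
    ET.ElementTree(root).write(str(out), encoding="utf-8", xml_declaration=True)
    return out


def load_annotations(pdf_path):
    """Read the sidecar back into the same list-of-dicts shape."""
    root = ET.parse(str(sidecar_path(pdf_path))).getroot()
    return [
        {
            "page": int(el.get("page")),
            "kind": el.get("kind"),
            "text": el.findtext("text", default=""),
            "note": el.findtext("note"),  # None when no note was stored
        }
        for el in root.findall("annotation")
    ]
```

With something like this, changing an annotation only rewrites the small `.ann` file, so the PDF itself stays byte-identical and would not need to be uploaded again.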
  • There is a clear advantage to using the PDF standard for annotations: files with annotations can be opened and read by all applications that support the standard, including apps on different platforms and tablet apps. Skim, for example, uses its own annotation format, so its annotations cannot be read by other applications.
    I think the PDF standard supports saving annotations in external files, but as far as I know a lot of apps do not support this feature.

    By the way, my ZotFile plugin already supports extracting annotations from PDF files and saving them in Zotero notes. This way everyone can use their preferred PDF app for Windows, Mac, Linux, iOS, Android... I never understood why Zotero has to be able to create PDF annotations itself. Without having any knowledge of Zotero's plans, I assume that pdf.js might allow Zotero to do that someday, though.
  • edited February 15, 2012
    Good points Joscha.

    I've seen ZotFile, and the way it handles annotations seems like an excellent compromise! Annotations are extracted and added as a note. Zotero already supports everything but the extraction bit. This way annotations are searchable.

    I think this could be done in a process similar to how Zotero indexes texts: automatically when a file is added, and updated when it is modified. Zotero already detects when a file changes, after all, to re-upload it.

    Also, to make sure these notes don't pile up, if there is an existing note starting with "Annotations extracted at X:" it could be overwritten. This could work well and be much easier to implement than what we discussed earlier.
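
The overwrite rule from the last post could be sketched roughly like this. Notes are modelled as plain strings in a list, which is an assumption made for illustration and not how Zotero or ZotFile actually store them; `upsert_annotation_note` is a hypothetical helper.

```python
from datetime import datetime

# Marker that identifies a note produced by automatic extraction.
NOTE_PREFIX = "Annotations extracted at "


def build_note(annotations, when):
    """Build the note text: a timestamped header plus one line per annotation."""
    header = f"{NOTE_PREFIX}{when.isoformat(timespec='seconds')}:"
    return "\n".join([header, *annotations])


def upsert_annotation_note(notes, annotations, when=None):
    """Return a new list of notes in which the first existing extraction
    note is overwritten, or a new one is appended if none exists yet."""
    when = when or datetime.now()
    new_note = build_note(annotations, when)
    updated = list(notes)
    for i, note in enumerate(updated):
        if note.startswith(NOTE_PREFIX):
            updated[i] = new_note
            return updated
    updated.append(new_note)
    return updated
```

Running this each time the attachment changes leaves hand-written notes alone and keeps exactly one extraction note per item, so the notes don't pile up.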