Can I annotate my PDF snapshots?

morisimo · February 12, 2010

Mendeley supports highlighting and annotation notes in PDFs from within the Mendeley application.

I realise this is a complex issue, but as a feature, I would love Zotero to let me highlight and annotate PDFs such that they are separate to, but associated with the PDF and can be turned on and off as required when viewing PDFs opened within Zotero. I think of it as "non-destructive" editing of the PDFs that allows some sort of analysis to be done on the annotations and highlighted text themselves.

I realise this has been discussed elsewhere but as this thread appeared to have the most recent response, I decided to put my post here.

What is the latest on implementing such a feature, if plans exist?

Loving the work Zotero is doing!

adamsmith · February 12, 2010

There are no plans that I'm aware of, and since there is no current open source pdf annotation tool I don't think anything will be coming soon (and by "soon" I mean within the next year at least). Mendeley's pdf annotation is nice, but if I remember correctly it doesn't travel to other pdf software (e.g. acrobat reader), though I guess that seems to be pretty much what you want.
What's wrong with using a pdf viewer with annotation features? There is FoxIt or PdfXchange for free, and for a small price there is also pdfStudio, which works on linux, too - I'm sure there are more. The only real advance would be the ability to tie notes in Zotero to specific places in the pdf, and that's as of now a really far away feature.

morisimo · February 12, 2010

Adam,

Thanks for the uber swift response. Nothing wrong with using a separate PDF viewer, I just think it would be a cool feature to have built into Zotero.

I know this is a "really nice to have" feature and it is not a game changer for me, but thanks for letting me know!

ElizaJaneDarling · February 20, 2010

For Mac users - there's an open source PDF annotation software called Skim:

http://skim-app.sourceforge.net/

I'm only just trying it out now but on first glance it seems far better than the very limited capabilities of Preview.

kithairon · February 21, 2010

Again, Mac users only: Skim is excellent. I have set up FF to use Skim as the standard viewer, and I read & annotate pdfs with it, starting them from within Zotero. The actual annotations go automatically into the same directory as the pdf and stay connected to, but separate from, the pdf file (they receive the same name as the pdf and a .skim extension). If needed they can also be saved to the pdf itself, i.e. become legible for Acrobat Reader and other tools and travel with the pdf file to wherever wished. Recommendable tool.

dennisdd · April 18, 2010

I'd like to support JonEP's first post. It's pretty much going the way I see we need to go.
Highlighting parts of PDFs, annotating it with the subject they refer to and later searching and filtering those annotations to find the right source for citation is what I'm looking for.

Mendeley has a couple of shortcomings compared to zotero. But I think they are pointing the way with their pdf annotation features. That's exactly what I'm looking for; the only things missing are being able to search through the annotations and having this feature in zotero instead of mendeley ;-)

Thanks,
Dennis

byronqually · April 19, 2010

Hi Kithairon,

Thanks for the insight on using Skim as an 'in' Zotero alternative. Would you mind elaborating on how to set it up to open within FF, as even after tweaking the preference/applications/preview Document (application/pdf) to action Skim, I still cannot prevent my Mac from opening it in a new window.

Thanks
Byron

EWENSS · April 23, 2010

If annotation cannot be done with PDF snapshots, perhaps they can be done with DJVU snapshots.

In my view, PDF is a textbook example of an inferior, walled, and expensive format gaining prominence over a superior, open, and free format because of marketing targeted to often lazy adopters.

If Zotero featured DJVU annotation, perhaps more providers would be inclined to provide their documents in DJVU format.

ajlyon · April 24, 2010

I understand that the Zotero core developers are not about to work on their own reading and annotation plugins for PDF, DjVu or other formats. I imagine they would be interested in the development of flexible, cross-platform systems for embedding annotation into PDFs and DjVu files, and would be glad to distribute any replacements for pdfinfo and pdftotext that provide improved handling of annotations and file content. That said, development in this direction does not seem to be forthcoming from either the DjVu or the PDF communities.

EWENSS · April 24, 2010

The thing is that one of Zotero's main competitors, Mendeley, offers PDF annotation. To be unresponsive for the same function within Zotero is a recipe for user migration. Simple as that. If it's the proprietary format of PDF that is problematic, fine, PDFs can easily be converted to DJVU on the fly with something like http://any2djvu.djvuzone.org/

adamsmith · April 24, 2010

I don't agree about Mendeley - I think having a 2nd class pdf viewer (compared to stand alone alternatives) and annotations which non-Mendeley users can't see isn't all that great and is unlikely to be a reason for many people to switch. I think people are much better off working with pdf Xchange, Skim and the like.

I think the reason for not doing that in Zotero is also one of design philosophy - keep stuff that can easily be handled by outside applications to outside applications, rather than cramming everything into Zotero.

EWENSS · April 25, 2010

In Mendeley, you simply save the annotated PDF to disc and text annotations read as a popup note in Adobe Reader; highlights appear as such. Does this mean Mendeley is paying Adobe? I don't know, but I'd be willing to pay for a "Pro" version of Zotero that did the same thing.

Using the annotation functions of an external viewer means your annotations are not accessible to Zotero search, and I think that along with the desire to annotate anything that appears in the browser is the real issue.

Bionatsci · April 25, 2010

Using the annotation functions of an external viewer means your annotations are not accessible to Zotero search

I agree this is an important problem but IMO it would be better handled by investigating ways to index the annotations produced by external viewers. I say this because it would be quite impossible given Zotero's limited resources (2 full time developers as far as I am aware) to produce a PDF viewer with the features of PDF X-Change, bluebeam etc..

If we look at HTML annotation, which AFAIK is much easier to do, Zotero has one colour of highlight and sticky notes. Compare this to the free version of PDF X-change which has multicolour everything, underline, strikethrough, polygons, arrows, stamps etc. Bluebeam takes this a step further again with proofreading symbols, the ability to define custom symbols and a tablet pc mode with handwriting recognition. Even if the Zotero team spent the next year ignoring reference management and just producing a PDF viewer it wouldn't come anywhere close.

and I think that along with the desire to annotate anything that appears in the browser is the real issue.

I'm not sure about the situation in OSX, but in Windows and Linux (under Wine) the PDF X-change viewer browser plugin allows this for PDFs.

To sum up: I share your desire for a fully integrated "paperless office" type reference management system, and agree that for the reasons you have mentioned Zotero (and every other alternative I know of) falls short of this ATM. However I don't feel that the best way forward is to spend time developing something that already exists. Rather we should integrate with what's out there (i.e. index PDF and DJVU annotations), push for better open source PDF support in other projects and integrate with that as well when it comes along. With a project as small (relatively speaking) as Zotero I don't really see a viable alternative.

Greg · April 25, 2010

I agree with Bionatsci. It is actually relatively easy to extract the highlighted text from pdfs as long as the highlighted text is copied to the comment field of the highlight. Unfortunately, Adobe Acrobat is the only pdf reader I am aware of which provides this option. I wrote some code for the mac which adds the highlights as notes to the item. It is very rough and far from optimal but it works. I am happy to share it with anyone who is interested...

ajlyon · April 25, 2010

@Greg: Such code would be quite nice to see. With some work, perhaps it could be incorporated into Zotero proper.

Greg · April 25, 2010

Hi ajlyon, are you interested in using the code or also in working on it? I am just asking because I would appreciate pretty much any form of cooperation. My time is very limited and I haven't been working on it for a long time. But I bet someone else could bolster my motivation. Below is a description of the current script. I will post the code shortly.

There are two versions for the mac. One is working with any pdf reader which copies the highlighted text to the comment field of the highlight (as far as I know only Adobe Acrobat) and one is working with Skim.
The script basically calls an external application which extracts the highlighted text from a pdf and saves it to a text file, then opens this text file and creates a separate note for each highlight and note in the pdf. The script picks the pdf which is attached to the currently selected Zotero item and adds the notes to this item. The first version calls an application created with Automator on the mac and the second version calls Skim on the command line. It works but is clunky, mainly because the Automator app is not very fast.
Windows version: I haven't found a way to extract the highlighted text in windows yet (admittedly, I have not really looked into it). This code might help though:
http://forums.zotero.org/discussion/802/parsing-external-file-with-zotero-import-script/#Item_4
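
A minimal sketch of the "one note per highlight" step described above, in Python rather than the Chickenfoot script itself. The layout of the extracted text file is an assumption here (one record per highlight, records separated by blank lines), and `split_into_notes` is a hypothetical helper name, not something taken from the posted code.

```python
import re


def split_into_notes(text):
    """Split extracted text into one string per highlight.

    Assumption: records are separated by one or more blank lines.
    Line breaks inside a record are collapsed to single spaces.
    """
    normalized = text.replace("\r\n", "\n")
    blocks = re.split(r"\n\s*\n", normalized)
    return [" ".join(block.split()) for block in blocks if block.strip()]


sample = "first highlight\ncontinued on a second line\n\n\nsecond highlight\n"
for number, note in enumerate(split_into_notes(sample), start=1):
    print(number, note)
```

If the real extractor writes a different separator (for example a page header per record), only the `re.split` pattern would need to change.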

sean · April 25, 2010

It also may be relatively trivial to read PDF annotations and display them as (probably exclusively read-only) Zotero notes. They're normally stored in the clear in PDF files, no?

ajlyon · April 25, 2010

I'm interested mainly in working on the code-- I don't currently use a Mac for much of my work, but PDF annotation would be great to have, and I'd be quite interested in working on a cross-platform solution that would work under Linux. So count me in for at least trying my hand.

Bionatsci · April 26, 2010

One is working with any pdf reader which copies the highlighted text to the comment field of the highlight (as far as I know only Adobe Acrobat)

PDF X-Change viewer can also do this if you enable it in the preferences. PDF X-Change isn't available for OSX, but it's a windows and possibly a linux solution if you are willing to run it under wine.

It also may be relatively trivial to read PDF annotations and display them as (probably exclusively read-only) Zotero notes. They're normally stored in the clear in PDF files, no?

I don't know how they are stored within PDFs but if it could be made to work that would be perfect for me. One thing which would have to be considered is how to handle "replies", basically a "comment on a comment" made by the same or a different user. A typical use case for this would be setting the viewer to copy the highlighted text into the comment field as Greg suggests, and then making your own observations in a reply.

If replies could be detected by Zotero the smoothest thing would probably be to make them separate elements of a numbered list in the note.
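
A rough Python sketch of that layout, assuming the annotations have already been read out of the PDF into simple records. PDF reply annotations point at their parent through an /IRT ("in reply to") entry; here that link is represented by a hypothetical `in_reply_to` field, and `build_note` is an illustrative name. The sketch also assumes the note body is HTML and only handles direct replies, not replies to replies.

```python
from html import escape


def build_note(annotations):
    """Render each top-level comment, followed by its replies as a numbered list.

    annotations: list of dicts with 'id', 'text' and an optional 'in_reply_to'.
    """
    replies = {}
    for annotation in annotations:
        parent = annotation.get("in_reply_to")
        if parent is not None:
            replies.setdefault(parent, []).append(annotation)

    parts = []
    for annotation in annotations:
        if annotation.get("in_reply_to") is not None:
            continue  # replies are rendered under their parent
        parts.append(f"<p>{escape(annotation['text'])}</p>")
        thread = replies.get(annotation["id"], [])
        if thread:
            items = "".join(f"<li>{escape(r['text'])}</li>" for r in thread)
            parts.append(f"<ol>{items}</ol>")
    return "".join(parts)


print(build_note([
    {"id": "a1", "text": "Highlighted passage"},
    {"id": "r1", "text": "My own observation", "in_reply_to": "a1"},
]))
```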

Greg · April 26, 2010

okay, here is the code. I know, it's pretty ugly but hey I am usually just doing statistical programming...

It's not straight forward to make this work. So here is a description:

1) I have been using chickenfoot to write this so far. The advantage is that I don't have to install a plugin every time I change something. Maybe there is a different way but it has worked for me. Of course, this has to be changed to a zotero plugin at some point. Anyway, download the chickenfoot plugin, open the sidebar and copy the code into a newly created script.

2) You need to create an Automator application which extracts the highlights from a pdf and saves them in a text file. You just need two actions for this:
a) 'Extract PDF Annotations' select 'text' and 'highlight' as the Annotation Kinds, select all the 'Fields to Extract' and 'Pretend output with file path'.
b) 'New Text File' give the file a name and a location where to save it. I use 'pdf-extract.txt' and '~/Documents/temp'. You have to change the two globals 'textfile_path' and 'textfile_name' at the beginning of the script accordingly.
Save the application somewhere and change the global 'app_launch' at the beginning of the script. In my case it's:
"/Applications/Apps/extract-annotations.app/Contents/MacOS/Application\ Stub"
The part after .app should not be changed.
I am happy to send the application I have created to anyone.

3) Adobe Acrobat adds an id to the highlights you create in a pdf. Change the global 'id' to this phrase.

Now you are pretty much ready to go. Just annotate a pdf which is attached (or linked) to one of your Zotero items, save the changes, select the zotero item, and press run in the chickenfoot sidebar.

I hope all this doesn't scare anybody away. Let me know if you have questions about making this work or the code.

Other Platforms: The only thing which is missing for running this on other platforms is the small application which puts the highlighted text in a text file. Once we have such an application, the function 'processTextfile' needs to be modified in order to recognize the separate notes in the text file.

Greg · April 26, 2010

here is the code:
http://gist.github.com/379397

ajlyon · April 26, 2010

Thanks for posting this. I'll play with my setup and see if I can get something working using this under Linux.

kieren · April 26, 2010

Greg,

I know, it's pretty ugly but hey I am usually just doing statistical programming...

That code is fine - well commented and clearly written. I didn't know about chickenfoot - I will look at it now. For an alternative plugin-less interaction with Zotero look at my web browser / server side javascript approach (unfortunately not working in Firefox 3.6 at the moment, 3.5 only): http://github.com/singingfish/zotero-browser

kieren · April 27, 2010

Greg (again). After a bit of a hack with your code, and trawling the forums to find out exactly where ZoteroPane came from, the code snippet has made it, in modified form, into the user docs page that I've been curating: http://www.zotero.org/support/dev/api_user_docs#get_the_zotero_pane_to_interact_with_the_zotero_gui

Feel free to add to/edit the docs if you feel moved to do so.

balingup · July 30, 2010

how about libertexto? from http://blogs.igalia.com/eocanha/?p=182

it seems to be fairly developed, and uses evince, the Gnome document viewer.
the next step would be annotations; if those were possible, you could launch pdfs in the browser from zotero, annotate & quote etc.
I have installed it and pdfs launch in firefox...

ohthere · August 22, 2010

Greg,

Your program seems great: importing PDF highlights into Zotero notes is just what I need to do. It also has the advantage that Chickenfoot can be installed on FF3.6 while POW cannot.

However, it seems that it doesn't work for Windows. (Am I right?) What I did on my Windows FF3.6 was just to install Chickenfoot, restart Firefox, copy & paste your code, and press the 'Run' button in Chickenfoot. I then got the following error message:

SyntaxError: return not in function in eval() line 0
fileName chrome://chickenfoot/content/chickenscratch.js
lineNumber 11
message return not in function
name SyntaxError
stack eval("\n// some global variables... (truncated)

So, could you tell me a bit more specifically what part of the code might be incompatible with Windows? I'm not very familiar with Java language, but if it is only saving some data into a text file, I think it would be rather easy.. I also wonder why the same code doesn't work for Windows, because as far as I know, Java is a cross-platform language.. (is it about the folders, such as '~/Documents' vs 'C:\Users\' etc in Windows?)

Greg · August 23, 2010

Hey! yes, the code does not work for Windows. The reason simply is that I rely on a small application which does the actual extraction of the text from the highlights and this application only works on Mac (it's an Automator application). So if you know an application for Windows which can replace the Mac version, it should be relatively easy to change the code so that it works with the windows version. This application has to take an argument (the location of the file), extract the text from the highlights, and save the text in a txt file under a predefined location.
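
For illustration, a cross-platform stand-in with that contract could look roughly like the Python sketch below. It is not the Automator application: `extract_comments` and `run` are hypothetical names, and the approach is a naive scan of the raw file for `/Contents (...)` literal strings. It assumes the comments are stored uncompressed, ignores nested unescaped parentheses, octal escapes and UTF-16 strings, and picks up every annotation comment, not only highlights.

```python
import re
from pathlib import Path

# Matches "/Contents (some text)"; escaped characters such as \) may appear inside.
# A page's "/Contents 5 0 R" reference does not match because no "(" follows.
CONTENTS_RE = re.compile(rb"/Contents\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def extract_comments(pdf_bytes):
    """Return the text of every /Contents literal string found in the raw bytes."""
    comments = []
    for match in CONTENTS_RE.finditer(pdf_bytes):
        raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))  # undo \( \) \\
        comments.append(raw.decode("latin-1").strip())  # assumption: Latin-1-like text
    return [comment for comment in comments if comment]


def run(pdf_path, txt_path):
    """Read the PDF at pdf_path and write one comment per blank-line-separated record."""
    comments = extract_comments(Path(pdf_path).read_bytes())
    Path(txt_path).write_text("\n\n".join(comments) + "\n", encoding="utf-8")
    return len(comments)
```

A small wrapper that passes `sys.argv[1]` and `sys.argv[2]` to `run` would give the "takes the file location, writes a txt file" behaviour; the output location could equally be hard-coded to match the script's globals.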

greg

mheim · September 6, 2010

Hi everybody,

Glad that this discussion is still going on. As in 2007, I still think that it would be great if Zotero could index the contents and highlights of PDFs.

Couple of thoughts:
-there exist a number of programs to annotate PDFs (using the 'real' pdf annotation supported by the format).
-at least Acrobat Pro, but also the free PDF-XChange Viewer (Win-Only, but runs in WINE), support copying of highlighted text into highlight comments.

-it is very easy to extract pdf-comments (and highlights) directly from the PDF. I wrote a small import function for Zotero myself (http://forums.zotero.org/discussion/802/parsing-external-file-with-zotero-import-script/) but because it was so ugly, it has never been continued.

@Greg
I really appreciate work on any solution, BUT
-if you create an external file, then it should be a fdf or xfdf file (both are standardized ways to store comments and the like of a pdf outside the file).
-directly parsing the PDF is really easy for anyone who has even a tiny programming experience. The comments and the page-tree are always stored uncompressed in a pdf: they're the only two things you need to parse.
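
As a sketch of the external-file route: XFDF is plain XML, so reading comments back out needs nothing beyond a standard XML parser. The Python below assumes the usual XFDF layout (an `annots` element whose children are named after the annotation type and carry a `page` attribute and a `contents` child); `read_xfdf` is a hypothetical helper name, and rich-text contents are ignored.

```python
import xml.etree.ElementTree as ET

XFDF_NS = "{http://ns.adobe.com/xfdf/}"


def read_xfdf(xml_text):
    """Return one dict per annotation: its kind, page index and plain-text comment."""
    root = ET.fromstring(xml_text)
    annots = root.find(f"{XFDF_NS}annots")
    if annots is None:
        return []
    results = []
    for element in annots:
        contents = element.find(f"{XFDF_NS}contents")
        text = (contents.text or "").strip() if contents is not None else ""
        results.append({
            "kind": element.tag.replace(XFDF_NS, ""),
            "page": int(element.get("page", "0")),  # assumed to be a zero-based index
            "text": text,
        })
    return results


sample = """<xfdf xmlns="http://ns.adobe.com/xfdf/">
  <annots>
    <highlight page="0"><contents>Key claim of the paper</contents></highlight>
    <text page="2"><contents>Check this figure</contents></text>
  </annots>
</xfdf>"""

for annotation in read_xfdf(sample):
    print(annotation["page"], annotation["kind"], annotation["text"])
```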

Happily encouraging solutions,

Matthias

JonEP · September 6, 2010

I think this has been mentioned elsewhere in the forum, but qiqqa (http://www.qiqqa.com/) has sort of started on a PDF analysis and markup system, with moves towards integrating with Mendeley. Both are Windows-only. Qiqqa makes a copy of your pdf library to its own folder, which seems problematic to me. But, anyway, there are a few efforts going on out there. I still would love to see all of that functionality in one place, with a shared, hierarchical tag structure and a nice graphic interface for searching through and making connections among multiple documents.

ohthere · October 6, 2010

@mheim
Could you tell me how I can try running your code? Yours seems to parse PDF & add notes to Zotero, if I understood correctly. If it is so, I would love to try it!