Can I annotate my PDF snapshots?
I realise this is a complex issue, but as a feature, I would love Zotero to let me highlight and annotate PDFs such that the annotations are separate from, but associated with, the PDF and can be turned on and off as required when viewing PDFs opened within Zotero. I think of it as "non-destructive" editing of the PDFs that allows some sort of analysis to be done on the annotations and highlighted text themselves.
I realise this has been discussed elsewhere but as this thread appeared to have the most recent response, I decided to put my post here.
What is the latest on implementing such a feature, if plans exist?
Loving the work Zotero is doing!
What's wrong with using a PDF viewer with annotation features? There is Foxit or PDF-XChange for free, for a small price there is also PDF Studio, which works on Linux, too - I'm sure there are more. The only real advance would be the ability to tie notes in Zotero to specific places in the PDF, and that's as of now a really far away feature.
Thanks for the uber swift response. Nothing wrong with using a separate PDF viewer, I just think it would be a cool feature to have built into Zotero.
I know this is a "really nice to have" feature and it is not a game changer for me, but thanks for letting me know!
http://skim-app.sourceforge.net/
I'm only just trying it out now but on first glance it seems far better than the very limited capabilities of Preview.
Highlighting parts of PDFs, annotating it with the subject they refer to and later searching and filtering those annotations to find the right source for citation is what I'm looking for.
Mendeley has a couple of shortcomings compared to Zotero. But I think they are pointing the way with their PDF annotation features. That's exactly what I'm looking for; the only things missing are being able to search through the annotations and having this feature in Zotero instead of Mendeley ;-)
Thanks,
Dennis
Thanks for the insight on using Skim as an 'in' Zotero alternative. Would you mind elaborating on how to set it up to open within FF? Even after tweaking the preference/applications/preview Document (application/pdf) to action Skim, I still cannot prevent my Mac from opening it in a new window.
Thanks
Byron
In my view, PDF is a textbook example of an inferior, walled, and expensive format gaining prominence over a superior, open, and free format because of marketing targeted to often lazy adopters.
If Zotero featured DJVU annotation, perhaps more providers would be inclined to provide their documents in DJVU format.
I think the reason for not doing that in Zotero is also one of design philosophy - keep stuff that can easily be handled by outside applications to outside applications, rather than cramming everything into Zotero.
Using the annotation functions of an external viewer means your annotations are not accessible to Zotero search, and I think that along with the desire to annotate anything that appears in the browser is the real issue.
If we look at HTML annotation, which AFAIK is much easier to do, Zotero has one colour of highlight and sticky notes. Compare this to the free version of PDF X-change which has multicolour everything, underline, strikethrough, polygons, arrows, stamps etc. Bluebeam takes this a step further again with proofreading symbols, the ability to define custom symbols and a tablet pc mode with handwriting recognition. Even if the Zotero team spent the next year ignoring reference management and just producing a PDF viewer it wouldn't come anywhere close.
I'm not sure about the situation in OSX, but in Windows and Linux (under Wine) the PDF X-change viewer browser plugin allows this for PDFs.
To sum up: I share your desire for a fully integrated "paperless office" type reference management system, and agree that for the reasons you have mentioned Zotero (and every other alternative I know of) falls short of this ATM. However I don't feel that the best way forward is to spend time developing something that already exists. Rather we should integrate with what's out there (i.e. index PDF and DJVU annotations), push for better open source PDF support in other projects and integrate with that as well when it comes along. With a project as small (relatively speaking) as Zotero I don't really see a viable alternative.
There are two versions for the Mac. One works with any PDF reader that copies the highlighted text to the comment field of the highlight (as far as I know, only Adobe Acrobat does this) and the other works with Skim.
The script basically calls an external application which extracts the highlighted text from a PDF and saves it to a text file, then opens this text file and creates a separate note for each highlight and note in the PDF. The script picks the PDF which is attached to the currently selected Zotero item and adds the notes to this item. The first version calls an application created with Automator on the Mac and the second version calls Skim on the command line. It works but is clunky, mainly because the Automator app is not very fast.
Windows version: I haven't found a way to extract the highlighted text in windows yet (admittedly, I have not really looked into it). This code might help though:
http://forums.zotero.org/discussion/802/parsing-external-file-with-zotero-import-script/#Item_4
If replies could be detected by Zotero the smoothest thing would probably be to make them separate elements of a numbered list in the note.
It's not straightforward to make this work. So here is a description:
1) I have been using Chickenfoot to write this so far. The advantage is that I don't have to install a plugin every time I change something. Maybe there is a different way, but it has worked for me. Of course, this has to be changed to a Zotero plugin at some point. Anyway, download the Chickenfoot plugin, open the sidebar and copy the code into a newly created script.
2) You need to create an Automator application which extracts the highlights from a PDF and saves them in a text file. You just need two actions for this:
a) 'Extract PDF Annotations': select 'text' and 'highlight' as the Annotation Kinds, select all the 'Fields to Extract' and 'Prepend output with file path'.
b) 'New Text File': give the file a name and a location where to save it. I use 'pdf-extract.txt' and '~/Documents/temp'. You have to change the two globals 'textfile_path' and 'textfile_name' at the beginning of the script accordingly.
Save the application somewhere and change the global 'app_launch' at the beginning of the script. In my case it's:
"/Applications/Apps/extract-annotations.app/Contents/MacOS/Application\ Stub"
The part after .app should not be changed.
I am happy to send the application I have created to anyone.
3) Adobe Acrobat adds an id to the highlights you create in a PDF. Change the global 'id' to this phrase.
Now you are pretty much ready to go. Just annotate a pdf which is attached (or linked) to one of your Zotero items, save the changes, select the zotero item, and press run in the chickenfoot sidebar.
I hope all this doesn't scare anybody away. Let me know if you have questions about making this work or the code.
Other Platforms: The only thing which is missing for running this on other platforms is the small application which puts the highlighted text in a text file. Once we have such an application, the function 'processTextfile' needs to be modified in order to recognize the separate notes in the text file.
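To make the last point a bit more concrete, here is a rough sketch in Python of what the note-splitting step could look like. It is not the actual 'processTextfile' code from the script. It assumes, purely for illustration, that the extraction tool writes one highlight per block with blank lines between blocks; the real output of any given tool may look different, and the function name `split_notes` is made up.

```python
from typing import List


def split_notes(extracted_text: str) -> List[str]:
    """Split extracted text into one string per note.

    Assumption (not from the original script): notes are separated by one
    or more blank lines, and a note may wrap over several lines.
    """
    notes: List[str] = []
    current: List[str] = []
    for line in extracted_text.splitlines():
        if line.strip():
            # Non-empty line: still inside the current note.
            current.append(line.strip())
        elif current:
            # Blank line closes the note collected so far.
            notes.append(" ".join(current))
            current = []
    if current:
        # The file may end without a trailing blank line.
        notes.append(" ".join(current))
    return notes


sample = "First highlight,\ncontinued.\n\n\nSecond highlight.\n"
notes = split_notes(sample)
```

Each entry of `notes` would then become one child note of the selected Zotero item. If the tool on your platform uses another separator, only the blank-line test needs to change.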
http://gist.github.com/379397
Feel free to add to/edit the docs if you feel moved to do so.
It seems to be fairly developed, and uses Evince, the GNOME document viewer.
The next step would be annotations: if those were possible, you could launch PDFs in the browser from Zotero, annotate & quote etc.
I have installed it and pdfs launch in firefox...
Your program seems great: importing PDF highlights into Zotero notes is just what I need to do. It also has the advantage that Chickenfoot can be installed on FF3.6 while POW cannot.
However, it seems that it doesn't work on Windows. (Am I right?) What I did on my Windows FF3.6 was just to install Chickenfoot, restart Firefox, copy & paste your code, and press the 'Run' button in Chickenfoot. I then got the following error message:
SyntaxError: return not in function in eval() line 0
fileName chrome://chickenfoot/content/chickenscratch.js
lineNumber 11
message return not in function
name SyntaxError
stack eval("\n// some global variables... (truncated)
So, could you tell me a bit more specifically what part of the code might be incompatible with Windows? I'm not very familiar with Java language, but if it is only saving some data into a text file, I think it would be rather easy.. I also wonder why the same code doesn't work for Windows, because as far as I know, Java is a cross-platform language.. (is it about the folders, such as '~/Documents' vs 'C:\Users\' etc in Windows?)
greg
Glad that this discussion is still going on. As in 2007, I still think that it would be great if Zotero could index the contents and highlights of PDFs.
Couple of thoughts:
-there exist a number of programs to annotate PDFs (using the 'real' pdf annotation supported by the format).
-at least Acrobat Pro, but also the free PDF-XChange Viewer (Win-only, but runs in WINE), support copying of highlighted text into highlight comments.
-it is very easy to extract pdf-comments (and highlights) directly from the PDF. I wrote a small import function for Zotero myself (http://forums.zotero.org/discussion/802/parsing-external-file-with-zotero-import-script/) but because it was so ugly, it has never been continued.
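As a rough illustration of the direct approach (this is not my old import function, just a minimal Python sketch): it assumes the annotation objects are stored uncompressed and that the comment text is a plain literal string in `/Contents`. Files that pack objects into compressed object streams, or that store comments as hex or UTF-16 strings, would need a real PDF parser. The names `read_literal_string` and `extract_annotations` are made up for this example.

```python
import re

# Each indirect object looks like "5 0 obj ... endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
SUBTYPE_RE = re.compile(rb"/Subtype\s*/(Highlight|Text)\b")
CONTENTS_RE = re.compile(rb"/Contents\s*\(")


def read_literal_string(data: bytes, start: int) -> bytes:
    """Read a PDF literal string whose opening '(' is at data[start].

    Simplification: a backslash escape keeps the following byte as-is,
    so sequences such as \\n or octal codes are not decoded.
    """
    depth = 0
    out = bytearray()
    i = start
    while i < len(data):
        ch = data[i:i + 1]
        if ch == b"\\":
            out += data[i + 1:i + 2]
            i += 2
            continue
        if ch == b"(":
            depth += 1
            if depth > 1:
                out += ch  # nested parenthesis is part of the text
        elif ch == b")":
            depth -= 1
            if depth == 0:
                return bytes(out)
            out += ch
        else:
            out += ch
        i += 1
    raise ValueError("unterminated literal string")


def extract_annotations(pdf_bytes: bytes):
    """Return (subtype, comment) pairs for highlight and text annotations."""
    results = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(1)
        subtype = SUBTYPE_RE.search(body)
        contents = CONTENTS_RE.search(body)
        if subtype and contents:
            raw = read_literal_string(body, contents.end() - 1)
            results.append((subtype.group(1).decode("ascii"),
                            raw.decode("latin-1")))
    return results


# Tiny hand-written fragment standing in for a real file.
SAMPLE = (
    b"%PDF-1.4\n"
    b"5 0 obj\n<< /Type /Annot /Subtype /Highlight /Rect [0 0 1 1]\n"
    b"/Contents (non-destructive \\(editing\\)) >>\nendobj\n"
    b"6 0 obj\n<< /Type /Annot /Subtype /Text /Contents (a sticky note) >>\n"
    b"endobj\n"
)
annotations = extract_annotations(SAMPLE)
```

For a real file you would pass the result of `open(path, "rb").read()`. Mapping each annotation back to its page number needs the page tree as well, which this sketch leaves out.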
@Greg
I really appreciate work on any solution, BUT
-if you create an external file, then it should be an FDF or XFDF file (both are standardized ways to store comments and the like of a PDF outside the file).
-directly parsing the PDF is really easy for anyone who has even a tiny programming experience. The comments and the page-tree are always stored uncompressed in a pdf: they're the only two things you need to parse.
Happily encouraging solutions,
Matthias
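For anyone who wants to try the XFDF route, reading such a file could look roughly like the Python sketch below. The sample document is hand-written for illustration and real exports carry many more attributes; the function name `read_xfdf` is made up, and the sketch assumes the `page` attribute is a zero-based index.

```python
import xml.etree.ElementTree as ET

# XFDF elements live in this XML namespace.
NS = {"x": "http://ns.adobe.com/xfdf/"}


def read_xfdf(xml_text: str):
    """Return one dict per annotation found under <annots>."""
    root = ET.fromstring(xml_text)
    notes = []
    for annot in root.findall("x:annots/*", NS):
        # Tag looks like "{namespace}highlight"; keep only the local name.
        kind = annot.tag.split("}", 1)[-1]
        # Assumption: 'page' is zero-based, so add 1 for a human page number.
        page = int(annot.get("page", "0")) + 1
        contents = annot.find("x:contents", NS)
        text = ""
        if contents is not None and contents.text:
            text = contents.text.strip()
        notes.append({"kind": kind, "page": page, "text": text})
    return notes


SAMPLE_XFDF = """
<xfdf xmlns="http://ns.adobe.com/xfdf/">
  <annots>
    <highlight page="0" color="#FFFF00">
      <contents>non-destructive editing</contents>
    </highlight>
    <text page="2">
      <contents>check this claim</contents>
    </text>
  </annots>
  <f href="example.pdf"/>
</xfdf>
"""
xfdf_notes = read_xfdf(SAMPLE_XFDF)
```

The advantage of going through XFDF is that the PDF itself stays untouched and the parsing is plain XML, so each entry in `xfdf_notes` could be turned into a Zotero note without any PDF-specific code.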
Could you tell me how I can try running your code? Yours seems to parse PDF & add notes to Zotero, if I understood correctly. If it is so, I would love to try it!