ZotFile - Advanced PDF management for Zotero

Joscha · February 6, 2014

Philipp Rommel, I will make a separate post about this in a second.

krcrouse and Olib170, you are right about that and it's a bug, but I think it is unrelated to Olib170's request, and the proposed behavior doesn't really make sense. What if I have two attachments for the same item and want to rename both of them? That is exactly why zotfile adds a 2 to one of the attachments. That should not happen if the user renames the same attachment in Zotero, though. In that case it is a bug, but otherwise adding numbers if the file already exists is the desired behavior. I am not sure why the bug is so dramatic, considering that there is no reason to constantly re-rename your files. I have created a ticket for it though. Olib170, if you still want to implement your scenario, which I think is different, open an issue on github. I am happy to point at the relevant parts of the code but I don't really see why it should be part of Zotero.
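
To illustrate the "add a number if the file already exists" behavior described above, here is a minimal sketch in Python. It is not zotfile's actual code (zotfile is written in JavaScript), and the function name `unique_target` is hypothetical; it only shows the general idea of picking the first unused numbered name.

```python
from pathlib import Path


def unique_target(folder: Path, filename: str) -> Path:
    """Return a path inside `folder` that does not collide with an existing file.

    Hypothetical helper: if `filename` is taken, try name2, name3, ...
    """
    candidate = folder / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 2
    while candidate.exists():
        # Append the counter to the stem, keeping the extension intact.
        candidate = folder / f"{stem}{counter}{suffix}"
        counter += 1
    return candidate
```

With this logic, renaming a second attachment to a name that is already taken yields "name2.pdf". Re-renaming the same attachment would need an extra check that the existing file is the attachment itself, which is the bug discussed here.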

Joscha · February 6, 2014

Go to annotation in pdf

With two recent changes in Zotero (see this and this pull request: https://github.com/zotero/zotero/pull/450), the zotfile feature "Go to annotation in pdf" is going to be much more useful!
Currently, the changes are part of the Zotero beta channel and will hopefully make it into the next version.

So far, it never worked on Windows, but I just pushed some changes to zotfile which hopefully make the links work on Windows as well. Can someone try and report back? Just install the current zotfile version from github (and the Zotero beta). If you run into an access denied error, look at this page for some help.

There is also an addition for Mac. You can now use Skim instead of Preview to open pdfs and go to a specific page, which works much better. Just set the hidden preference 'zotfile.pdfExtraction.openPdfMac_skim' to true.

adamsmith, it's working fine on Linux, right? Any possible improvements? I am updating the feature now that it is going to work better with Zotero.

adamsmith · February 6, 2014

Joscha - looking good for me so far in ubuntu with evince, thanks for making that happen.

bwiernik · February 6, 2014

Joscha,

I can confirm that the links do work on Windows in the Zotero beta. However, initially, I received an error that the path to Acrobat was not specified. I found the preference in about:config and set it to the Acrobat.exe path. Now, when the link opens, it opens Acrobat. Acrobat returns an error, "There was an error opening this document. The filename, directory name, or volume label syntax is incorrect." Is there a different way I should have set up the preference to get the link to resolve?

Joscha · February 7, 2014

That's too bad. I was hoping that it would work now, but my Windows access is very limited. You can set the setting to something like 'C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe', but the idea is that zotfile finds the path itself.
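
For anyone experimenting with this, here is a rough sketch of how a viewer could be launched at a given page. It is not zotfile's implementation; the function name `viewer_command` is made up for illustration. It assumes Adobe's "/A" command-line switch for PDF open parameters (such as page=N) and Evince's --page-index option, so check both against your viewer's documentation before relying on them.

```python
def viewer_command(viewer: str, pdf_path: str, page: int, exe_path: str = "") -> list[str]:
    """Build (but do not run) an argument list that opens `pdf_path` at `page`.

    Hypothetical helper; the viewer switches below are assumptions to verify.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    if viewer == "acrobat":
        # Assumed: Acrobat/Reader accept open parameters after /A.
        return [exe_path, "/A", f"page={page}", pdf_path]
    if viewer == "evince":
        # Assumed: Evince accepts --page-index=NUMBER.
        return ["evince", f"--page-index={page}", pdf_path]
    raise ValueError(f"unknown viewer: {viewer}")


reader = r"C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe"
command = viewer_command("acrobat", r"C:\papers\smith.pdf", 5, exe_path=reader)
```

The resulting list can be handed to subprocess.run. Passing the arguments as a list rather than one string avoids quoting problems with paths that contain spaces, such as "Program Files (x86)".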

If someone has coding experience and is able to help out, take a look at this github issue. I posted the relevant code there and you can play around with it in Firefox's JavaScript Scratchpad. Otherwise this has to wait until I have the opportunity to work on a Windows computer.

Olib170 · February 7, 2014

Dear Joscha,
thanks for your reply, you wrote :
-------------------
Olib170, if you still want to implement your scenario, which I think is different, open an issue on github. I am happy to point at the relevant parts of the code but I don't really see why it should be part of Zotero.
------------------

Zotero has the inconvenience that it stores attached files in its own directory structure, even if one is already in place. It seems to me that this is part of the reason why Zotfile was written.
However, at the moment Zotfile only helps to externalise the attachments directory structure; it does not help to keep one that is already in place.
I have, for example, some 4000 pdfs in an existing directory structure /%y/firstauthor.year.pdf (and if there are several in the same year I add author.year.b.pdf, author.year.c.pdf etc).
For several reasons (and I can think of a whole list that other people might have with existing pdf collections), I want to keep my existing pdf collection as it is. The bibtex export of my existing database contains all the links to the pdfs, but when importing, Zotero copies all of the pdfs. And when I point zotfile to my existing pdf directory with /%y, then I get copies of all my pdfs as author2.pdf .....
However, if Zotfile could check whether the file already exists (or even look in the URL field in Zotero to figure out the existing file name), one could use Zotfile in one go to change all the links in Zotero to the correct files.
This would take away one of the major shortcomings of Zotero, which has been discussed several times over the years.
I would really like to change to Zotero, as I see many advantages, but this pdf collection story stops me from doing so.
Cheers
Oliver

Olib170 · February 7, 2014

I'm using a naming convention that does not seem to be (completely) possible in Zotfile and I wondered if this could perhaps be introduced?

I generally use simply "author.year.pdf", so "%a.%y.pdf", and for the cases where I have several publications by the same author in the same year I use the usual a, b, c convention used in many publications, excluding the "a", so the second publication by this author in this year would be "author.year.b.pdf" and the third "author.year.c.pdf" etc. Would it be possible to define this in Zotfile?
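
The convention described above can be sketched as follows. This is only an illustration in Python, not an existing Zotfile option; the function name `letter_suffixed_name` and the idea of passing in the set of names already present in the year folder are assumptions made for the example.

```python
import string


def letter_suffixed_name(author: str, year: int, existing: set[str]) -> str:
    """Return "author.year.pdf", or "author.year.b.pdf", ".c.pdf", ... if taken.

    `existing` holds the file names already present in the target folder.
    """
    base = f"{author}.{year}"
    first = f"{base}.pdf"
    if first not in existing:
        return first
    # The first file carries no letter, so suffixes start at "b".
    for letter in string.ascii_lowercase[1:]:
        candidate = f"{base}.{letter}.pdf"
        if candidate not in existing:
            return candidate
    raise ValueError(f"no free suffix left for {base}")
```

For example, with "smith.2014.pdf" already present, the next file by the same author and year would be named "smith.2014.b.pdf".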

Cheers

Oliver

bwiernik · February 7, 2014

Joscha,

In that case, it really doesn't work on Windows at all, still. Even when the path is specified, Acrobat doesn't know how to handle the location tags and returns an error rather than opening the file.

Joscha · February 8, 2014

Now it should finally work! Thanks to aurimasv for helping out! bwiernik, set the 'openPdfWin' setting back to "" and try out the current development version on github. Please report back with your windows version and the pdf viewer.

bwiernik · February 9, 2014

Works perfectly with Windows 8.1 and Acrobat Pro X! Great work.

sudavolstviem · February 9, 2014

Hi, is there any place that explains how to set up the basic workflow with zotfile + zotero standalone? I feel quite stupid, but I don't understand what I'm supposed to do.

I've installed zotero standalone and the zotfile plugin in both zotero and firefox. (Do I need the latter? It doesn't seem to do anything.) I've set zotfile to watch my "Documents" folder. Now say I download a journal article named 123456.pdf into the Documents folder. I'd like to (1) add the article to zotero, (2) retrieve the metadata, and (3) rename the file accordingly. To achieve this, at the moment I (1) drag the file into zotero, (2) right-click and select "Retrieve Metadata...", (3) right-click again -> "Manage Attachments" -> "Rename". This doesn't actually rename the file but creates another copy with the correct name, but OK. -- Anyway, I'm sure this rather tedious process isn't how it's meant to be. Isn't zotfile somehow supposed to notice automatically when a new article appears in the folder? Can't it automatically look up the metadata and rename the file?

adamsmith · February 9, 2014

ZotFile doesn't perform functionality that you'd typically associate with a "watched folder". It only attaches files from the source folder to items that already exist.

I can't speak for Joscha, but in terms of Zotero, adding PDFs and using retrieve metadata is in most cases not a recommended workflow. The data you get - most typically from google scholar - is significantly worse than data from databases or journal publishers. If you really do want that workflow, dragging the PDF to Zotero (or using "store copy of file") and manually running "retrieve metadata" is the only option. ZotFile doesn't really play into this, except that it can move the file to a desired location and has more fine-grained renaming options than Zotero proper.

sudavolstviem · February 9, 2014

@adamsmith: Thanks. I'd be happy to use metadata from journal publishers instead, but how would I do that? What is the intended workflow?

adamsmith · February 9, 2014

Use the icon in (or in Safari next to) the URL bar:
http://www.zotero.org/support/getting_stuff_into_your_library#web_translators

sudavolstviem · February 9, 2014

That's not available with zotero standalone, is it? (The add-on slowed down my browser, but if there's no other way I'll go back to it.)

adamsmith · February 9, 2014

With Standalone you need the extension for the browser you're using - ideally Chrome, Safari, or Firefox:
http://www.zotero.org/download/

those will not slow down the browser noticeably.
(edit: and for anything beyond this pls. start a new thread so we're not derailing this one).

Joscha · February 12, 2014

Beta test: The next zotfile version includes some improvements for the extraction of annotations. I updated the extraction code to the most recent pdf.js version and made some performance improvements. It should now work with more pdfs and run about 40-60% faster. Some testing for these changes would be great. The beta is available here. If you install this version, please play around with the extraction, re-extract some annotations and compare them with the result from the last version. Thanks!

Larisa · February 12, 2014

Hi, thanks so much for this. I'm attempting to start using zotfile and zotpad. I've been using Zotero for a while. I just tried extracting highlighted text for the first time, except that when I clicked on "manage attachments" I didn't see an option for "extract highlighted text", only for "extract annotations", and when I selected that, nothing seemed to happen.

I am storing files in Zotero (not dropbox). I viewed the file in Zotpad then opened it in Goodnotes, highlighted text in the PDF document using GoodNotes, then exported it back to Zotpad. Then I went to my laptop (macbook pro) and synced Zotero. Then I opened the file, and saw the highlight was there. Then closed the file, and selected it in Zotero and looked for the "extract highlighted text" option. There was none, as above.

What am I missing, please?

adamsmith · February 12, 2014

"extract annotations" does both notes and highlighted text, so looks like you're doing everything right. Did you try this with the beta of ZotFile (that Joscha links to above) or with the regular version (from mozilla add-ons)?

Larisa · February 12, 2014

It says "last updated January 31" although I initially downloaded it at least several months ago.

ETA: and to be clear - when I click "extract" nothing happened - no note was created or anything (I'm not just confused by the lack of reference to highlighting in the menu :) )

adamsmith · February 12, 2014

you could try installing the beta from the link above then (in Joscha's post just above yours).

Larisa · February 12, 2014

OK thanks, do I need to delete anything first or can I just install over what I have now?

adamsmith · February 12, 2014

just install it over the old one

Larisa · February 13, 2014

OK, I installed the beta version. Still nothing happened. (I did check and the doc does have text you can select & copy). Is a note supposed to appear in the "notes" section?

kithairon · February 13, 2014

Thanks for the update: Re-extracted a few of my old extractions from last year: the latest beta seems to handle m-dashes and apostrophes better; underscored passages in pdfs don't appear as underscores in the extracted notes anymore and now look like ordinary highlighted text. Not sure if the latter is intended or desirable. Speed was never an issue in my setup, but it seems more nippy now. That the note's links to the pdf now don't depend on Zotero's report feature anymore is an absolute treat.

adamsmith · February 13, 2014

@Larisa: could you post a sample annotated PDF somewhere we can access it?

Joscha · February 13, 2014

kithairon, thanks for the report. That was exactly what I was looking for. The underline issue is fixed now as well as another small problem. The link above refers to the updated beta now.

Joscha · February 16, 2014

Beta test: Here is an updated version of the beta with some additional fixes and two additional changes. First, the extraction now runs in the background so that you can continue to use Zotero while zotfile is extracting annotations. Second, this version updates the info windows. These changes might introduce some bugs, so it would be great if some of you could try out the beta.

kithairon · February 17, 2014

Tried the latest beta. Interlinear space handling seems better, i.e. fewer double line breaks. The treatment of the apostrophes has gone back to the previous behaviour (no more 6s and 9s, just straight ones). Some superscripted footnote anchors that were picked up previously (~ 6 months ago) don't seem to get picked up in this version. Whenever I open a PDF via the link in the item's note, this produces an empty tab in FF.
Appreciate the new background mode and the discreetly animated info window. Let me know if you want something more specifically looked at.

florisvdh · February 17, 2014

Since today I have had serious speed issues with the 'rename attachment' functionality of ZotFile. The problem remains even after rebooting the whole system (Win XP). I.e., when I run 'rename' for just one item, it takes about 1 minute to finish (Zotero hangs in the meantime). ZotFile is at version 3.1; I use Zotero StandAlone 4.0.17. And my CPU is not occupied by anything else.

Last week I added several hundred pdf attachments, but I doubt that is the reason, as I encounter the same speed issue in another Zotero profile that holds far fewer pdf files.

Any clues to solve this? I would like to demonstrate Zotero/ZotFile to my colleagues during one of the coming weeks, so this is a bit of a problem for me at the moment.