ZotFile - Advanced PDF management for Zotero

bwiernik · February 17, 2014

In the beta, dragging a file from Windows Explorer to an item renames it per the renaming rules, even if the item has an extension not on the specified list (e.g. .xpi, .zip, .jar).

Joscha · February 18, 2014

Thanks, kithairon! Let me know if you have a pdf that you think is worse compared to before and I can check it out.

MILKLIMuser, it's hard to say anything without more information. Do you "Attach stored copy of file(s)" or use a "custom location" (zotfile preferences)? The other thing that might matter is whether you change a file from a stored copy to a custom location, from a custom location to a custom location, etc. Did you change any settings before zotfile started to be slow? In my experience, linked attachments are slower. Is it equally slow when you add an attachment through Zotero without using zotfile?

bwiernik, fixed (but I didn't upload a new beta).

Joscha · February 18, 2014

MILKLIMuser, I assume your problem is related to the fact that Zotero reindexes the pdf whenever you rename an attachment with zotfile. Maybe you turned on indexing recently and that explains the change in performance. The reindexing only happens when you rename linked attachments (also when you change from imported to linked or linked to imported). I fixed that in the new version of zotfile. The link to the beta is below. Please try this version and let me know whether it solves the problem. I rewrote the whole renaming function, which was long overdue, so I hope I didn't introduce any bugs.

Beta test: Here is the current zotfile beta with improved extraction of pdf annotations and improved renaming of files. If you test this version, please let me know how the extraction compares to the previous version and whether the renaming works smoothly.

florisvdh · February 18, 2014

Hi Joscha,
Thanks for the suggestions. First I must clarify quite some things indeed:

- in the end, I want to have everything as linked files on a network drive. I have 165 such items now in 'good shape', renamed etc. This number will become 'thousands' eventually, as I collect papers from a whole team.

- for the previous reason, I do not use Zotero to sync files, I only sync items.

- I had already installed the PDF indexing plugins (pdfinfo/pdftotext) to 'retrieve metadata from pdf' earlier on (so that may entail interference with indexing setting, I don't know). At least that is true for my Windows machine at work - not for the Linux machine at home. This may hold a clue to explain 'A' below, after reading your last post.

- I still have 'old' pdf collections of > 1000 papers, simply in folders on my hard drive:
A. I have recently been dragging these into Zotero (now just as attachment; eventually to get them moved and linked). That went very slow on my (more recent) Windows XP quad core Pentium Core i5 machine (at work) - about 2.5 pdfs per minute - but fast on a single core Pentium IV Linux (:-D at home) - about 25 pdfs per minute I estimate.
A smaller part was done on the Win machine and the rest on the Linux machine, so for now I cannot open all of them from either machine until after I have moved and linked them. (as I sync only the items through zotero, not files)

B. on the Windows machine (at work) I have retrieved metadata for the pdfs that I dragged in there. In itself that went quite fast.
[I first have to do extra work (especially tagging and adding notes) on all these items before I will convert them into linked, renamed files. So I keep these locally for now.]

C. it was only after steps A and B that I used Zotfile again (at work) for some (new) items and noticed the slow behaviour of renaming and moving. I have not changed settings either in Zotero or ZotFile during the whole process. So before, linking and renaming went fast!

- meanwhile, I have noticed that zotero.sqlite of this Windows installation is 10 times larger than on my synced (!) Linux version (i.e. 4 vs. 40 MB). Also a freshly installed sync on another Windows machine is only 4 MB.

- in those synced Zotero installations where no metadata retrieval has (yet) taken place for single pdf files (if that's the cause), ZotFile renaming still goes very fast. I might test whether retrieving pdf metadata has the same effect there on speed and on size of zotero.sqlite. Or it might be the actual installation of the PDF indexing tools that is the problem.

- For your reference, I had a look into the Windows version where I did the metadata retrieval and I see 'Indexed: 421, Partial: 6, Unindexed: 620, Words: 120677'. In the synced installation on a second Windows machine where the PDF indexing plugins are not installed, I read zeroes except for the Unindexed (1047).
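
One way to see where the extra size in zotero.sqlite comes from is to count rows per table on a copy of the file. The following is only a rough sketch, not part of Zotero or ZotFile: it uses Python's standard sqlite3 module, assumes nothing about Zotero's table names (it reads them from the file itself), and the function name table_row_counts is made up for illustration. Run it on a copy, with Zotero closed.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every table stored in the database file.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for name in names:
            # Quote the identifier, since the names come from the file itself.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()

# Example call (the path is hypothetical; point it at a COPY of the database):
# for name, n in sorted(table_row_counts("zotero-copy.sqlite").items(),
#                       key=lambda kv: -kv[1]):
#     print(n, name)
```

Comparing the output for the 40 MB and the 4 MB installation should show which tables grew, for instance whether the growth sits in the full-text index.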

After reading your response, some thinking and surfing the web, I see myself faced with four options which may help and/or solve problems:

1. use Zotfile 3.2 as you suggest and see whether that solves problems. Btw, can I upgrade an already installed ZotFile 3.1 to the 3.2 beta (using the xpi file) without losing ZotFile settings?

2. try to get a leaner zotero.sqlite file again, by doing a restore from the server, as I suspect it of being unnecessarily bloated due to the history it underwent. However I don't know (yet) what that will do with the pdf files that are still locally stored.

3. Install the zotero-auto-index plugin (https://github.com/friflaj/zotero-auto-index); perhaps I should do that anyway.

4. Clear the Index in Zotero preferences?

As you may now see what will be the best to do, I'll wait for your ideas before doing 'trial and error'. Thanks again! I see my report is quite long, but also rather complete, I hope.

adamsmith · February 18, 2014

Giving this a quick read it sounds like what's hanging is indexing, not ZotFile. Instead of the indexing add-on, I'd recommend you install the Zotero beta version, which contains a re-write of the indexing code that speeds up indexing by several orders of magnitude.
http://www.zotero.org/support/dev_builds#zotero_40_beta

Joscha · February 18, 2014

Yes, the Zotero beta will make a large difference. Zotfile 3.1 still causes Zotero to reindex for certain operations (e.g. renaming linked attachments). The Zotfile 3.2 beta should solve that problem as well. So I suggest that you first try out the Zotero beta and then the Zotfile beta if you still feel like it.

florisvdh · February 18, 2014

OK, can you help me with something I couldn't find out: how can I use an xpi file to upgrade Zotero Standalone to the beta version? If not possible, I guess for now I could also remove the pdftotext and pdfinfo files and rebuild or clear the index, in order to speed up ZotFile.

adamsmith · February 18, 2014

there's no beta for Standalone.

bwiernik · February 18, 2014

In the latest beta, the Watch folder and Attach New File functions don't work. Attach new file always returns the error "Selected item is an attachment, a note, or a collection".

florisvdh · February 19, 2014

Joscha & adamsmith: I have used my proposed workaround:
- clear the index. This reduced the zotero.sqlite file back to 4 MB
- I have removed the pdf indexing tools. Importing pdfs and ZotFile-renaming is fast again! I will use the pdf indexing tools only temporarily now (to retrieve metadata).

Thank you for the quick responses and I look forward to the next releases.

Joscha · February 19, 2014

bwiernik, I can't reproduce this. It's working fine for me. Can you be more specific?

(zotfile incorrectly showed the "Selected item is an attachment, a note, or a collection" error message when it did not find any files in the source folder. That is fixed now but attaching new files and watch folder stuff works for me)

bwiernik · February 19, 2014

Joscha, after a few restarts of Firefox, the issue went away. Not sure what was happening.

Peterfoster · February 19, 2014

I've got my library set up with Zotfile linking the default Zotero storage path to a Dropbox folder, and I also have a webdav backup for syncing purposes. I may now need to migrate off of Dropbox... What would be a safe way to migrate my linked PDF library from one cloud storage service to another? (sorry if this question has already been raised, this is a long thread and I couldn't find an answer already posted)

adamsmith · February 19, 2014

I'm a little confused how you have both Dropbox and WebDAV, but be that as it may:
You can point ZotFile to the new location and then batch-rename your attachments, which will also move them to the new location.

florisvdh · February 20, 2014

I am testing Zotfile 3.2b4 now as I find it such a great tool! Does the changelog "Fix bug with unnecessary suffix after multiple renames of same file" mean that duplicate files with suffixes (2, 3,...) are avoided when linking and renaming files, i.e. when the file was already there? I observe the following with the beta version (as was the case in 3.1):
1. In the case of a shared network drive where colleagues put their papers as well, unnecessary duplicates now arise with suffixes 2, 3 etc. Ideally it would be checked whether the file already exists and that one is linked instead.
2. When issuing 'rename' on an item that was already 'renamed' with a suffix 2, suffix 3 is used instead, and reverting to suffix 2 the next time, etc. This is not the case for files that received no suffix the first time.

drrwgrant · February 20, 2014

Joscha, congratulations and thanks for building such a fantastic tool! If there were any justice you'd be the one on the end of a few Facebook $ for this.
I should have registered long ago just to say this but of course I have a question for you or others...
The coloured tab functionality with _tablet and _tablet_modified is fantastic. Now I can see at a glance what is on the tablet and whether I've annotated it yet.
However, what is still taking me a lot of time is searching through Zotero records to distinguish which ones haven't yet gone to the tablet from those which have returned.
Is there any way to mark the ones that have been returned, for example by tagging everything with extracted annotations? Could a future release have a _tablet_been_there tag? Or is there a nice workflow fix for this that others use to keep track of things?

adamsmith · February 20, 2014

Independent of Zotfile you can set up a saved search with something like:
Note --> does not contain --> Extracted annotations
check the include parent items checkbox.

Joscha · February 20, 2014

MILKLIMuser, zotfile only avoids the suffix if you rename the same file and it already has the correct name. I am still using suffixes when there are multiple copies of the same file (e.g. because of multiple attachments for the same item). That is intended behavior.

drrwgrant, I wish... ;) There is a hidden option, which should do exactly what you are looking for. Just go to 'about:config' and set 'zotfile.tablet.tagParentPull' to true and define the tag you want to use in 'tagParentPull_tag'. That should do the trick. By the way, you should not use the keyboard shortcuts to add and remove the tablet tags (it's fine for the tagParentPull tag though).

drrwgrant · February 20, 2014

adamsmith - thanks! That so nearly works. Your suggestion gives me all the Zotero items with extracted notes in a saved search 'notes_done'.
But when I create another search to get what I want by doing Saved Search --> is not --> notes_done, instead of getting the complement of the first set of results I get everything. I clearly need to learn how searches work, including the 'parent and child' options.

Joscha - well I did give you a tiny contribution but I don't quite have Zuckerberg's pockets, sorry.
Thanks for the tip on hidden options, I'll give it a go. Understood about not adding the _tablet(_modified) tags manually, so far the automatic tagging has worked smoothly for me in any case.

adamsmith · February 20, 2014

I think
note --> does not contain --> Extracted annotations
with "include parents" and "show only top level items" should do what you want.

drrwgrant · February 20, 2014

adamsmith - yes! That is just right, thanks. Adding another condition for attachment type pdf and the list is exactly what I needed. I'll try Joscha's tag instructions too but this is much appreciated.

bwiernik · February 21, 2014

Joscha, I'd like to add functionality to have separate suffixes in the filenames for multiple attachments to an item. I think an effective way would be to pull tags from the attachment items. I'm not sure where the best place in the renaming process to put this is, though. Any thoughts?

Joscha · February 22, 2014

Currently, zotfile checks whether a file with the same filename exists and adds a suffix in the moveFile function. So the filename is first generated based on metadata and the suffix is only added when the file is moved. You might just want to focus on the getFilename function. I am not 100% sure what your goal is but you could add something to the end of the 'if (!this.prefs.getBoolPref("useZoteroToRename"))' part, which checks whether the zotero item has other attachments and then do your changes.
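
To make the two-step logic concrete (name from metadata first, suffix only at move time), here is a minimal sketch in Python. It is not zotfile's code: the function names unique_target and _same, the current_path parameter, and the exact suffix format ("name 2.pdf") are assumptions for illustration. It treats the file's own current path as "not a collision", which is one way to keep a repeated rename from flipping between suffixes.

```python
import os


def unique_target(folder, filename, current_path=None):
    """Pick a non-colliding path for `filename` inside `folder`.

    `current_path` is the file being renamed. Finding that same file at a
    candidate path does not count as a collision, so renaming twice in a
    row settles on the same name.
    """
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(folder, filename)
    n = 2
    # Try "name.ext", then "name 2.ext", "name 3.ext", ... until one is free
    # or turns out to be the very file we are renaming.
    while os.path.exists(candidate) and not _same(candidate, current_path):
        candidate = os.path.join(folder, f"{stem} {n}{ext}")
        n += 1
    return candidate


def _same(a, b):
    """True if both arguments name the same absolute path."""
    return b is not None and os.path.abspath(a) == os.path.abspath(b)
```

With "a.pdf" and "a 2.pdf" both present, renaming "a 2.pdf" to the generated name "a.pdf" returns "a 2.pdf" again, while a new third file would get "a 3.pdf".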

bwiernik · February 23, 2014

The goal is to be able to add special labels to files, like "Appendix" or "Online Supplement". In general, I'd rather have these materials attached to the same Zotero item than to make a second item and relate it. I figured the only real way to create these by-attachment labels is to use tags. I'll check it out there.

As an aside, when files aren't moved (only renamed), the duplicate suffix isn't appended, leaving multiple attachments with the same name.

Joscha · February 23, 2014

For all users of the beta: Please uninstall the current version and go back to the main release of zotfile. A recent change I made is problematic for some users because it changes the linkMode of attachments. Here are more details:

https://forums.zotero.org/discussion/35046/

You should only be affected if you change linked to imported attachments or the other way around. Even if you don't, I recommend that you change back to the 3.1 until the problem is addressed. Sorry for the trouble!

Joscha · February 24, 2014

bwiernik, there is a ticket on github for this and I agree that it would be nice. Let us move this discussion to github.

ZotFile beta: Here is a new version that fixes the problem and reverts some of the changes. Changing from linked to imported attachments or vice versa recreates the attachment and indexes it again, which might cause delays (significantly reduced in a future Zotero version though). The Zotero API doesn't offer a way around it. The slow-down does not occur when you rename etc without changing the attachment type.

Sorry that the last beta caused problems. Here is what you can do if you can't sync and get the message "Error processing uploaded data": Create a saved search that includes all attachments modified in the last 7 days, reattach these attachments, and empty the trash. You can easily reattach attachments using zotfile (version 3.1 or the beta in this post) by changing the attachment type (temporarily change to linked attachment if you are using imported or the other way around).

paulma · February 25, 2014

hi Joscha,
I am trying to find out if there is functionality to automatically create subfolders by the collection\subcollection's name to help me mirror the filing directory of Zotero on Windows Explorer.

I recall searching for a comment a few years ago saying this was not possible. Is there any workaround?

My primary objective is to be able to have a clean file structure so that I can browse the pdfs directory outside of Zotero (when i'm on the go through Dropbox on my phone etc)

dstillman · February 25, 2014

paulma: Collections aren't folders, though. Items can exist in multiple collections, so either the files would need to be duplicated on disk or they'd have to use symlinks/hard links/aliases, which vary by platform (and I don't think Mozilla-based code can even create currently).
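
For illustration only, the symlink variant could be done by a script running outside Zotero. The sketch below is Python and makes several assumptions: the two input dictionaries (item to file path, item to collection names) are hypothetical and would have to be produced some other way, the function name mirror_collections is made up, and creating symlinks on Windows may require extra privileges. Whether a sync service follows such links is not addressed here.

```python
import os


def mirror_collections(item_files, item_collections, mirror_root):
    """Create one folder per collection, filled with symlinks to the files.

    item_files:       {item_id: path to the real PDF}           (hypothetical)
    item_collections: {item_id: [collection names it is in]}    (hypothetical)
    Returns the list of links that were newly created.
    """
    created = []
    for item_id, src in item_files.items():
        for coll in item_collections.get(item_id, []):
            folder = os.path.join(mirror_root, coll)
            os.makedirs(folder, exist_ok=True)
            link = os.path.join(folder, os.path.basename(src))
            # An item in several collections gets one link per collection;
            # the PDF itself exists only once on disk.
            if not os.path.lexists(link):
                os.symlink(os.path.abspath(src), link)
                created.append(link)
    return created
```

Running it a second time creates nothing new, so it can be repeated after collections change (it does not remove stale links).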

paulma · February 25, 2014

ok that's too bad. I've set up my collections on Zotero to have mutually exclusive files.

Philipp Rommel · March 18, 2014

Hi,

Is it possible to implement highlight/note extraction based on color? I use different colors to distinguish e.g. important background information (green) from the most important points (red), etc. So it would be really cool to generate separate notes for each of these categories, meaning to have all green highlights/notes separated from e.g. red highlights/notes.

Is this technically possible? THX.

Best,

Philipp