ZotFile - Advanced PDF management for Zotero
MILKLIMuser, it's hard to say anything without more information. Do you "Attach stored copy of file(s)" or use a "custom location" (zotfile preferences)? The other thing that might matter is whether you change a file from a stored copy to a custom location, from a custom location to a custom location, etc. Did you change any settings before zotfile started to be slow? In my experience, linked attachments are slower. Is it equally slow when you add an attachment through Zotero without using zotfile?
bwiernik, fixed (but I didn't upload a new beta).
Beta test: Here is the current zotfile beta with improved extraction of pdf annotations and improved renaming of files. If you test this version, please let me know how the extraction compares to the previous version and whether the renaming works smoothly.
Thanks for the suggestions. First I must clarify quite some things indeed:
- in the end, I want to have everything as linked files on a network drive. I have 165 such items now in 'good shape', renamed etc. This number will become 'thousands' eventually, as I collect papers from a whole team.
- for the previous reason, I do not use Zotero to sync files, I only sync items.
- I had already installed the PDF indexing plugins (pdfinfo/pdftotext) to 'retrieve metadata from pdf' earlier on (so that may interfere with the indexing settings, I don't know). At least that is true for my Windows machine at work - not for the Linux machine at home. This may hold a clue to explain 'A' below, after reading your last post.
- I still have 'old' pdf collections of > 1000 papers, simply in folders on my hard drive:
A. I have recently been dragging these into Zotero (for now just as attachments; eventually to get them moved and linked). That went very slowly on my (more recent) Windows XP quad core Pentium Core i5 machine (at work) - about 2.5 pdfs per minute - but fast on a single core Pentium IV Linux machine (:-D at home) - about 25 pdfs per minute, I estimate.
A smaller part was done on the Win machine and the rest on the Linux machine, so for now I cannot open all of them from either machine until after I have moved and linked them. (as I sync only the items through zotero, not files)
B. on the Windows machine (at work) I have retrieved metadata for the pdfs that I dragged in there. In itself that went quite fast.
[I first have to do extra work (especially tagging and adding notes) on all these items before I will convert them into linked, renamed files. So I keep these locally for now.]
C. it was only after steps A and B that I used Zotfile again (at work) for some (new) items and noticed the slow behaviour of renaming and moving. I have not changed settings either in Zotero or ZotFile during the whole process. So before, linking and renaming went fast!
- meanwhile, I have noticed that zotero.sqlite of this Windows installation is 10 times larger than on my synced (!) Linux version (i.e. 4 vs. 40 MB). Also a freshly installed sync on another Windows machine is only 4 MB.
- in those synced Zotero installations where no metadata retrieval has (yet) taken place for single pdf files (if that's the cause), ZotFile renaming still goes very fast. I might test whether retrieving pdf metadata has the same effect there on speed and on size of zotero.sqlite. Or it might be the actual installation of the PDF indexing tools that is the problem.
- For your reference, I had a look into the Windows version where I did the metadata retrieval and I see 'Indexed: 421, Partial: 6, Unindexed: 620, Words: 120677'. In the synced installation on a second Windows machine where the PDF indexing plugins are not installed, I read zeroes except for the Unindexed (1047).
After reading your response, some thinking and surfing the web, I see myself faced with four options which may help and/or solve problems:
1. use Zotfile 3.2 as you suggest and see whether that solves problems. Btw, can I upgrade an already installed ZotFile 3.1 to the 3.2 beta (using the xpi file) without losing ZotFile settings?
2. try to get a leaner zotero.sqlite file again, by doing a restore from the server, as I suspect it of being unnecessarily bloated due to the history it underwent. However, I don't know (yet) what that will do with the pdf files that are still locally stored.
3. Install the zotero-auto-index plugin (https://github.com/friflaj/zotero-auto-index); perhaps I should do that anyway.
4. Clear the Index in Zotero preferences?
As you may now see what will be the best to do, I'll wait for your ideas before doing 'trial and error'. Thanks again! I see my report is quite long, but also rather complete, I hope.
http://www.zotero.org/support/dev_builds#zotero_40_beta
- clear the index. This reduced the zotero.sqlite file back to 4 MB
- I have removed the pdf indexing tools. Importing pdfs and ZotFile-renaming is fast again! I will use the pdf indexing tools only temporarily now (to retrieve metadata).
Thank you for the quick responses and I look forward to the next releases.
(zotfile incorrectly showed the "Selected item is an attachment, a note, or a collection" error message when it did not find any files in the source folder. That is fixed now but attaching new files and watch folder stuff works for me)
You can point ZotFile to the new location and then batch-rename your attachments, which will also move them to the new location.
1. In the case of a shared network drive where colleagues put their papers as well, unnecessary duplicates now arise with suffixes 2, 3 etc. Ideally it would be checked whether the file already exists and that one is linked instead.
2. When issuing 'rename' on an item that was already 'renamed' with a suffix 2, suffix 3 is used instead, and reverting to suffix 2 the next time, etc. This is not the case for files that received no suffix the first time.
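The behaviour asked for in these two points can be sketched roughly as follows. This is only an illustration in Python, not how ZotFile works internally: the helper names `file_digest` and `resolve_target` are made up, and the suffix format (a bare number appended to the file name) is an assumption based on the "suffixes 2, 3 etc." described above. The idea is to reuse an existing file when it is the same file or a byte-identical copy, and to hand out a new suffix only when the name is taken by a different file.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def resolve_target(source: Path, dest_dir: Path, filename: str) -> Path:
    """Pick a destination for `source` without creating needless duplicates.

    Hypothetical helper (not part of ZotFile). Candidates are tried in the
    order name.pdf, name2.pdf, name3.pdf, ... and the first one that is
    either free, the source itself, or an identical copy is returned.
    """
    stem, ext = Path(filename).stem, Path(filename).suffix
    wanted = file_digest(source)
    n = 1
    while True:
        name = f"{stem}{ext}" if n == 1 else f"{stem}{n}{ext}"
        candidate = dest_dir / name
        if not candidate.exists():
            return candidate  # free name: move or rename to this path
        if candidate.resolve() == source.resolve():
            return candidate  # already in place: renaming again changes nothing
        if file_digest(candidate) == wanted:
            return candidate  # identical copy exists: link to it instead
        n += 1
```

With a check like this, a colleague's identical copy on the shared drive would be linked rather than duplicated, and renaming a file that already carries suffix 2 would keep suffix 2 instead of alternating between 2 and 3.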
I should have registered long ago just to say this but of course I have a question for you or others...
The coloured tab functionality with _tablet and _tablet_modified is fantastic. Now I can see at a glance what is on the tablet and whether I've annotated it yet.
However, what is still taking me a lot of time is searching through Zotero records to distinguish which ones haven't yet gone to the tablet from those which have returned.
Is there any way to mark the ones that have been returned, for example by tagging everything with extracted annotations? Could a future release have a _tablet_been_there tag? Or is there a nice workflow fix for this that others use to keep track of things?
Note --> does not contain --> Extracted annotations
check the include parent items checkbox.
drrwgrant, I wish... ;) There is a hidden option, which should do exactly what you are looking for. Just go to 'about:config' and set 'zotfile.tablet.tagParentPull' to true and define the tag you want to use in 'tagParentPull_tag'. That should do the trick. By the way, you should not use the keyboard shortcuts to add and remove the tablet tags (it's fine for the tagParentPull tag though).
But when I create another search to get what I want by doing Saved Search --> is not --> notes_done, instead of getting the complement of the first set of results I get everything. I clearly need to learn how searches work, including the 'parent and child' options.
Joscha - well I did give you a tiny contribution but I don't quite have Zuckerberg's pockets, sorry.
Thanks for the tip on hidden options, I'll give it a go. Understood about not adding the _tablet(_modified) tags manually, so far the automatic tagging has worked smoothly for me in any case.
note --> does not contain --> Extracted annotations
with "include parents" and "show only top level items" should do what you want.
As an aside, when files aren't moved (only renamed), the duplicate suffix isn't appended, leaving multiple attachments with the same name.
https://forums.zotero.org/discussion/35046/
You should only be affected if you change linked to imported attachments or the other way around. Even if you don't, I recommend that you change back to 3.1 until the problem is addressed. Sorry for the trouble!
ZotFile beta: Here is a new version that fixes the problem and reverts some of the changes. Changing from linked to imported attachments or vice versa recreates the attachment and indexes it again, which might cause delays (significantly reduced in a future Zotero version though). The Zotero API doesn't offer a way around it. The slow-down does not occur when you rename etc without changing the attachment type.
Sorry that the last beta caused problems. Here is what you can do if you can't sync and get the message "Error processing uploaded data": Create a saved search that includes all attachments modified in the last 7 days, reattach these attachments, and empty the trash. You can easily reattach attachments using zotfile (version 3.1 or the beta in this post) by changing the attachment type (temporarily change to linked attachments if you are using imported ones, or the other way around).
I am trying to find out if there is functionality to automatically create subfolders by the collection\subcollection's name to help me mirror the filing directory of Zotero on Windows Explorer.
I recall searching for a comment a few years ago saying this was not possible. Is there any workaround?
My primary objective is to have a clean file structure so that I can browse the pdfs directory outside of Zotero (when I'm on the go, through Dropbox on my phone etc.).
Would it be possible to implement highlight/note extraction based on color? I use different colors to distinguish e.g. important background information (green) from the most important points (red), etc. So it would be really cool to generate separate notes for each of these categories, meaning to have all green highlights/notes separated from e.g. red highlights/notes.
Is this technically possible? THX.
Best,
Philipp
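To make the request concrete, here is a rough sketch in Python of what color-based separation could look like. It is purely illustrative and says nothing about whether ZotFile's extractor exposes highlight colors: the input format (a list of dicts with "color" and "text" keys) and the function names `group_by_color` and `build_notes` are assumptions made up for this example.

```python
from collections import defaultdict


def group_by_color(annotations):
    """Group annotation texts by highlight color, keeping document order."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann["color"]].append(ann["text"])
    return dict(groups)


def build_notes(annotations, labels=None):
    """Build one note body per color.

    `labels` optionally maps a color to a category name, e.g. green to
    "background". Returns a dict of note title to note text.
    """
    labels = labels or {}
    notes = {}
    for color, texts in group_by_color(annotations).items():
        title = f"Extracted annotations ({labels.get(color, color)})"
        notes[title] = "\n".join(f"- {t}" for t in texts)
    return notes


# Hypothetical example data, standing in for whatever the extractor yields.
highlights = [
    {"color": "green", "text": "background on the method"},
    {"color": "red", "text": "the main finding"},
    {"color": "green", "text": "more background"},
]
for title, body in build_notes(highlights, {"green": "background", "red": "most important"}).items():
    print(title)
    print(body)
```

The grouping itself is simple once each annotation carries its color; the open question is whether the color can be read reliably from the PDF annotations in the first place.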