Way(s) to clean up Zotero data folder?

My Zotero library is starting to get near the limits of my WebDAV server. One thing I've noticed is that sometimes there are "extra" PDFs in a Zotero folder that aren't in the Zotero database. For example, in the Zotero interface, Paper A has one PDF attachment, but if you choose "Show file", there will be 2 or 3 in the folder, mostly repeats with different filenames, but of the same file. I have a feeling that there are quite a few of these "dead" files scattered around my Zotero data directory and that they are eating space.

Another thing is that I have 746 records in Zotero, but my zotero data folder has 1486 folders. Are these records I have deleted in the zotero interface (perhaps in previous, pre-trashcan versions)? If so, are these deleted files still there, eating space?

Is there a way to poke around for any files in the Zotero directory that are not "known" by Zotero and remove them?

Thanks :)
CB
  • Another thing is that I have 746 records in Zotero, but my zotero data folder has 1486 folders.
    This is probably due to most items having web snapshots & many also having a PDF.
  • Thanks, but I'm not sure that is the case here. I'm careful to keep Zotero set to not take snapshots, and I delete them on the rare occasions they do appear. All my articles have exactly 1 PDF in Zotero and no other files associated, except for a very few that have a note. If each record AND each PDF had its own folder, I should have 1492 folders (2 × 746), but I have fewer than that. Hmmmm....
  • if you choose "Show file", there will be 2 or 3 in the folder, mostly repeats with different filenames, but of the same file
    I can't really think of anything in Zotero that would cause this. Have you ever synced your data directory using other tools? Dropbox, etc.?

    Only one file is linked to Zotero for each attachment, so there's no harm in removing non-primary files. (This also means Zotero won't detect any deletions, so you'd have to clear the files from the WebDAV server and Reset File Sync History in the Sync->Reset pane of the Zotero prefs.)
    If each record AND each PDF had its own folder
    Only attachment items have 'storage' folders. You can use an Advanced Search for [Item Type] [is] [Attachment] and Select All to get a count (though that would include linked files and web links as well).
    Is there a way to poke around for any files in the Zotero directory that are not "known" by Zotero and remove them?
    Not really, at the moment. We can probably write a little utility function for this at some point, or somebody could write a plugin to do it.

    Obviously, if you can reproduce the orphaning of a directory in the 'storage' folder, that's a bug. We're not aware of any current bugs of this sort, but it's possible they've existed.
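
For anyone who wants to look for orphaned folders by hand in the meantime, here is a minimal Python sketch, not an official Zotero utility. It rests on assumptions not confirmed in this thread: that each folder under 'storage' is named after its attachment item's key, and that the database has an `items` table with a `key` column and an `itemAttachments` table with an `itemID` column. The function names are made up for illustration. It only reports folders and deletes nothing, and it should be pointed at a backup copy of the database, never the live file.

```python
import os
import sqlite3


def attachment_keys_from_db(db_copy_path):
    """Read attachment item keys from a COPY of the Zotero database.

    Assumption: the table and column names below (items.key,
    itemAttachments.itemID) match your Zotero version's schema.
    """
    con = sqlite3.connect(db_copy_path)
    try:
        rows = con.execute(
            "SELECT items.key FROM itemAttachments "
            "JOIN items ON items.itemID = itemAttachments.itemID"
        ).fetchall()
    finally:
        con.close()
    return {row[0] for row in rows}


def find_orphan_dirs(storage_dir, known_keys):
    """Return sorted names of subfolders of storage_dir not in known_keys.

    Assumption: each attachment's folder is named exactly after its key.
    Plain files directly inside storage_dir are ignored.
    """
    known = set(known_keys)
    orphans = []
    for name in sorted(os.listdir(storage_dir)):
        if os.path.isdir(os.path.join(storage_dir, name)) and name not in known:
            orphans.append(name)
    return orphans


# Example use (hypothetical paths; adjust to your own backup copy):
# keys = attachment_keys_from_db("/path/to/backup/zotero.sqlite")
# for folder in find_orphan_dirs("/path/to/backup/storage", keys):
#     print(folder)
```

If the reported list is roughly the gap between the folder count and the attachment count from the Advanced Search, that would suggest the extra folders really are orphans.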
  • Thanks, Dan, as always. The search you suggested produced 720 records, which seems about right for my 746 records (missing a few PDFs). However, I have >1400 folders!

    As for the multiple PDF issue, I've had Zotero since an early version, across multiple computers, multiple platforms, multiple WebDAV servers, several newbie screwups, and several system failures, so I'm quite certain it was some wacky thing I did when I was still figuring out how to get it all running smoothly (which it seems to be now). I blame at least some of the duplicate PDF issue on me trying to manually fix duplicate records and update filenames in the days before the "rename file from metadata" option was there, but I really doubt I did that for 700 files.

    Without a way to identify the orphaned files, it's hard to know if it is a Zotero bug or just a Dumb Newbie User Error.

    +1 vote for a "clean out storage folder" utility
    +1 vote for someone to write a plug-in!
  • Would one way to fix this be to export my library (including files), create a new Firefox profile, and import my library?

    What data would I lose by doing this (folders? tags? notes?)?

    Thanks,
    CB
  • Would one way to fix this be to export my library (including files), create a new Firefox profile, and import my library?
    You don't want to do that. You shouldn't lose data, in theory, but it could potentially cause all sorts of problems (with syncing, word processor documents, etc.).

    For the dupes, one thing you could do is create a smart folder via your OS for PDFs within your 'storage' directory and just go down the list, deleting the obvious dupes. Then do what I said above to trigger a re-upload.
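
As an alternative to eyeballing a smart folder, a small Python sketch can flag PDFs in the same 'storage' subfolder that have byte-identical contents. This is only an illustration: the function names are hypothetical, it treats "same SHA-256 hash" as "same file", and it cannot tell which copy is the one Zotero actually links to, so check that (for example with "Show file") before deleting anything. It only reports and deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_pdfs(storage_dir):
    """Map each subfolder name to groups of PDF filenames with identical contents.

    Only folders that contain at least one group of two or more identical
    PDFs appear in the result.
    """
    result = {}
    for folder in sorted(os.listdir(storage_dir)):
        folder_path = os.path.join(storage_dir, folder)
        if not os.path.isdir(folder_path):
            continue
        by_hash = defaultdict(list)
        for name in sorted(os.listdir(folder_path)):
            path = os.path.join(folder_path, name)
            if os.path.isfile(path) and name.lower().endswith(".pdf"):
                by_hash[sha256_of(path)].append(name)
        groups = [names for names in by_hash.values() if len(names) > 1]
        if groups:
            result[folder] = groups
    return result


# Example use (hypothetical path):
# for folder, groups in find_duplicate_pdfs("/path/to/zotero/storage").items():
#     print(folder, groups)
```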
  • Thanks again Dan.

    What about if I reset the WebDAV sync (i.e., overwrote the WebDAV copy from my local Zotero), then did the new Firefox profile thing, and downloaded everything from the WebDAV server? (Making backup copies just in case, of course.)

    Would that keep the references in my Word docs happy? The neatnik in me would really like to clean up everything, everywhere.
  • No need for a new profile.

    First, make a backup copy of your Zotero data directory.

    Then do the smart folder thing above (or find and remove the dupes manually), since that's the only way to take care of unwanted files in valid directories.

    Next, delete all files off your WebDAV server, Reset File Sync History, and do a sync. Zotero will sync only the folders it knows about.

    Then delete the 'storage' folder within the data directory. Leave the database and all other files intact. Sync, and Zotero should pull all the files back down.

    (Alternatively, to avoid the extra download, you could just compare the files on the WebDAV server with the local folders and delete the extra local ones.)
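
The comparison in that last alternative could be sketched in Python roughly as follows. This is a hedged illustration, not a documented procedure: it assumes the WebDAV server holds one `<key>.zip` file per synced attachment and that local 'storage' folders are named with the same keys, and the function name is made up. You would supply the list of server filenames yourself (for example from a directory listing saved from your WebDAV client). It only reports local folders and deletes nothing.

```python
import os


def local_dirs_missing_on_server(storage_dir, server_filenames):
    """Return sorted local subfolder names that have no '<name>.zip' on the server.

    Assumption: the server stores each attachment as '<key>.zip' and the
    local folder for that attachment is named '<key>'.
    """
    server_keys = {
        os.path.splitext(name)[0]
        for name in server_filenames
        if name.lower().endswith(".zip")
    }
    return [
        name
        for name in sorted(os.listdir(storage_dir))
        if os.path.isdir(os.path.join(storage_dir, name))
        and name not in server_keys
    ]


# Example use (hypothetical path and listing):
# listing = ["ABCD1234.zip", "ABCD1234.prop"]
# print(local_dirs_missing_on_server("/path/to/zotero/storage", listing))
```

This only makes sense after the server has been wiped and re-synced as described above, so that the server contains just the folders Zotero knows about.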
  • Thanks Dan :) I'll wait until my current project is finished this weekend, and then do that.