Script to delete non-indexed content from storage directory

mynameisbond2 · April 25, 2021

This Python script was just published on [GitHub](https://gist.github.com/tigerjack/7fc913434ac7beab6c591c2c1ae92f8d) following a discussion on [reddit](https://www.reddit.com/r/zotero/comments/mwq0q2/delete_nonindexed_content_from_storage_directory/).

It checks the Zotero data folder (i.e. all the contents of the storage folder) against the attachments indexed in the zotero.sqlite database. Folders containing unindexed PDF files are moved to a second folder for inspection and then manual deletion.
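As a rough illustration of that check (not the code from the gist), here is a minimal sketch. It assumes that each subfolder of storage is named after an attachment's item key, and that those keys can be read by joining the `items` and `itemAttachments` tables in zotero.sqlite. The function name `find_unindexed_folders` is made up for this example. It only reads and moves nothing, and it is best tried on a copy of the data folder.

```python
import sqlite3
from pathlib import Path


def find_unindexed_folders(data_dir):
    """Return storage subfolders that hold PDFs but match no attachment key."""
    data_dir = Path(data_dir)
    # Open the database read-only so this sketch cannot modify it.
    db_uri = (data_dir / "zotero.sqlite").resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(db_uri, uri=True)
    try:
        # Assumption: attachment keys live in items.key, joined via itemID.
        rows = conn.execute(
            "SELECT items.key FROM items "
            "JOIN itemAttachments ON itemAttachments.itemID = items.itemID"
        ).fetchall()
    finally:
        conn.close()
    indexed_keys = {row[0] for row in rows}

    storage = data_dir / "storage"
    return sorted(
        folder
        for folder in storage.iterdir()
        if folder.is_dir()
        and folder.name not in indexed_keys
        and any(folder.glob("*.pdf"))  # only flag folders that contain a PDF
    )
```

Calling `find_unindexed_folders("/home/myname/Zotero_copy")` would return a list of candidate folders to inspect by hand.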

In effect, it clears lost, unindexed files out of the storage folder. You can run it on a copy of your Zotero data folder (renamed Zotero_copy below) to check it before running it on the main Zotero data folder:

myname@homepc:~/bin$ python3 clean_zotero_storage.py -zd /home/myname/Zotero_copy -od /home/myname/duplicates

-zd is the location of the data folder you want to scan; -od is the location of the folder that receives the moved duplicate entries.
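For anyone curious how two options like these might be wired up, here is a hedged sketch using argparse and shutil. It is not taken from the gist: the long option names `--zotero-dir` and `--output-dir` and the helper `move_folders` are hypothetical, and only the short flags -zd and -od come from the command above.

```python
import argparse
import shutil
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(
        description="Move unindexed storage folders aside for inspection."
    )
    # -zd and -od mirror the command above; the long names are hypothetical.
    parser.add_argument("-zd", "--zotero-dir", type=Path, required=True,
                        help="Zotero data folder to scan")
    parser.add_argument("-od", "--output-dir", type=Path, required=True,
                        help="folder that receives the moved entries")
    return parser


def move_folders(folders, output_dir):
    """Move each folder into output_dir and return the new locations."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for folder in folders:
        target = output_dir / Path(folder).name
        if target.exists():
            # Skip rather than overwrite something moved on an earlier run.
            continue
        shutil.move(str(folder), str(target))
        moved.append(target)
    return moved
```

Moving rather than deleting keeps the operation reversible: a folder that was flagged by mistake can simply be dragged back into storage.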

I found 23 duplicated unindexed PDFs, probably from 10 years of moving installations and reinstalling. For each one, I used the right-click show function in Zotero to check that the indexed version was a different file from the one identified and moved to the duplicates folder.

This could be a very useful tool for some people.

Dellu · August 24, 2021

"Orphan" is the term used in the Bookends system for these kinds of files.

This is a very useful script.
Thank you.

I have a question: can this be used with ZotFile, where the files are linked to a Dropbox folder?