Deleting duplicate files in Zotero storage folder

edited September 26, 2018
I have a lot of attachments, each occupying its own folder with a cryptic name (e.g., C6BZIRYB). They belong to a group that is stored in Zotero storage online and used by a few people. The problem is that many files are duplicates (or triplicates, etc.) that are not linked to any reference in the client interface.

Why do they occur? (Can deleting a group online leave those files behind? They often have different names.)

How can one identify or remove them in bulk (happy to use the API if needed)?

Is it possible to know whether a cryptic folder name, or the PDF file within it, links to an existing reference, and to which one?

I am using Zotero 5.0.55.

Thank you.
  • Yes, they're likely from a deleted group, due to a (since fixed) Zotero bug. Zotero has said that they'll include clean-up code in a future version (no specific ETA), but you should also be able to use this (third-party) add-on for the purpose: https://github.com/retorquere/zotero-storage-scanner
  • edited September 27, 2018
    Thank you for this advice. I have run the plugin and the results are rather off-putting. In the database of some 3000 entries, there are some 400 problems.

    [Screenshots: results sample, broken attachments, duplicate attachments]

    And I am not sure the plugin is identifying the orphans, that is, the files in storage that are not associated with a reference and do not appear in the client. This was the main problem. Or am I missing something?

    This does not seem like it can be the result of group deletions alone: groups were deleted perhaps twice, and those groups held relatively few references.

    Any further advice? If this is a common effect, a large group with a reasonable number of members and references will get clogged very quickly. This result is only after a month or two of use.
  • Yeah, I'm not sure that's what that plugin does (though I could be wrong).

    If you're comfortable running Perl or Python scripts, there are some third-party scripts that can delete orphaned folders in this thread:

    https://forums.zotero.org/discussion/9091/orphaned-attachment-files

    As adamsmith says, we're planning to add functionality to clean up orphaned files automatically in a future version.
  • The plugin doesn't identify orphans; it just goes through the attachments known to Zotero (which excludes the orphans), labels them as broken if the file they point to does not exist, and as duplicate if the item they are attached to has more than one attachment of the same mimetype. It's pretty limited in what it does.
  • (sorry about that, I misremembered)
  • No worries, I had to look it up myself. The plugin was initially a one-off: I needed to clean up my library, and I haven't used it since (this was in the 4.0 days). I ported it because someone asked and it was easy to do. I couldn't imagine anyone but me ever using it, and me only that once, but here we are.
  • Has this code ever been added? If so, where is it?
  • Has this code ever been added?
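
For anyone who wants to check for orphans by hand in the meantime, here is a minimal Python sketch of the idea discussed above. It is not one of the scripts from the linked thread and not Zotero's own clean-up code. It assumes that each folder name under the storage directory is the key of an attachment item, and that the local database (zotero.sqlite) has an `items` table with a `key` column; please verify both against your own installation. The function names are made up for illustration. It only lists candidate folders and deletes nothing, and it is safest to run it against a copy of the database made while Zotero is closed.

```python
import sqlite3
from pathlib import Path


def known_item_keys(db_path):
    """Return the set of item keys found in a copy of the Zotero database.

    Assumption: the database has an `items` table with a `key` column.
    """
    con = sqlite3.connect(str(db_path))
    try:
        rows = con.execute("SELECT key FROM items").fetchall()
    finally:
        con.close()
    return {row[0] for row in rows}


def find_orphan_folders(storage_dir, db_path):
    """List storage subfolders whose name matches no item key.

    Assumption: each subfolder of the storage directory is named after
    the key of the attachment item it belongs to. Nothing is deleted;
    the result is only a sorted list of candidate folder names.
    """
    keys = known_item_keys(db_path)
    return sorted(
        entry.name
        for entry in Path(storage_dir).iterdir()
        if entry.is_dir() and entry.name not in keys
    )


# Example call (both paths are placeholders, adjust to your setup):
# for name in find_orphan_folders("Zotero/storage", "zotero-copy.sqlite"):
#     print(name)
```

A folder that appears in the output has no matching item in that database, which answers the "does this folder link to an existing reference" question in the negative. Because a synced group may not be fully downloaded on a given machine, it seems wise to inspect the listed folders before removing anything.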