Deleting duplicate files in Zotero storage folder

Hi,
I recently upgraded my storage capacity so that I could move from linked files to stored files (in order to annotate PDFs on iOS). In the process, when I ran "convert linked files to stored files," I mistakenly left the "delete the original file" box unchecked because I wanted to keep a backup in case something went wrong (I should have backed up my linked file folder instead, that was stupid). Now I have two copies of every stored file for every item, so instead of taking 6 GB, my stored file library takes up 12 GB of storage space. Since I have over 5000 items, I can't possibly delete all the redundant files manually.

How can I delete the duplicate files when the two are identical? (In some cases, I deliberately attached more than one file to an item, for example when I keep multiple versions of an article, or an appendix file separately from the main article.)

From what I see, each of the two (or more) copies of a PDF is stored in a separate folder in the Zotero file architecture. I tried the Attachment Scanner plugin: I first scanned the whole library, then turned on the "delete attachment linked to the same file" function and ran it on my whole database. The process completed, but it didn't delete the duplicate entries.
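
For anyone who wants to measure the duplication first, here is a minimal, read-only sketch in Python. It is not a Zotero feature or part of the Attachment Scanner plugin. It assumes the layout described above (one subfolder per stored attachment) and treats two PDFs as identical only if their SHA-256 digests match. The function names and the `*.pdf` pattern are my own choices for illustration, and the script deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(storage_dir: Path, pattern: str = "*.pdf") -> list[list[Path]]:
    """Group files under storage_dir that have byte-identical content.

    Files are first bucketed by size so that only plausible duplicates
    are hashed. Nothing is modified or deleted.
    """
    by_size = defaultdict(list)
    for p in storage_dir.rglob(pattern):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        for p in same_size:
            by_hash[file_digest(p)].append(p)

    # Keep only digests shared by two or more files.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicate_groups(Path("/path/to/Zotero/storage"))` (the path is a placeholder) returns one list of paths per set of identical PDFs. Attachments that were deliberately kept as different versions have different content and so should not be grouped together.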

Please help!
Thanks in advance!

Maxime

  • The first time I scanned your post I thought you had ended up with one stored copy and one linked copy of each PDF attachment. But reading it again, that doesn't seem to be what you are saying. So to confirm exactly what you have now ...

    "Now I have double copies of all stored files for every item" indicates that under each item there are two "stored" copies (*both* under Zotero\storage) of each attachment PDF? And no linked PDF?

    "Each of two (or more) copies of the PDFs is stored in a separate folder in the Zotero file architecture" likewise indicates that for each PDF you now have two "stored" attachments ... of the same file?

    If so, I am not sure how "Convert linked files to stored files" can end up that way (regardless of how "delete the original file after storing" is set). But if that is indeed what you now have, it obviously needs fixing.

    The usual way to solve these "double copy" situations (which more commonly arise when an import operation is mistakenly run twice) is to sort by date and delete the more recent copies. The twist here is that the duplicates are not whole items but just one (later) copy of each stored PDF. And given some uncertainty as to how this happened, there is also the question of preserving annotations that may "belong" to only one (earlier?) copy of each PDF.

    I will defer the exact best method for achieving the deletion of just one stored PDF in each pair - if that is indeed the real problem - to the devs ... @dstillman.
  • Hi, sorry for the confusion. Yes, in the end, almost all the files in what used to be my linked folder on OneDrive were deleted (there are a few dozen files left, but they might be remnants from items deleted over the years). So my understanding is that there are two stored copies. When I look at a random item and ask Zotero to show me each stored file in Finder, one is in a folder named NNL32NVP, and an identical PDF is in a folder named 9D5XXTLX.

    The Attachment Scanner plugin has correctly added a "#duplicate" tag to the item, but its function to delete the duplicate file doesn't seem to have worked for me.

    You are correct that annotations can be an issue. In some cases, the two copies have the same annotations. But in many cases, it looks like one of the two copies (the second one that appears under the item in the Zotero window) doesn't have the annotations.
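
The "sort by date and keep the earlier copy" idea suggested above can be sketched in Python for review purposes. This is only an illustration under stated assumptions. It takes groups of identical files (however they were found) and uses each file's modification time as a stand-in for the date the attachment was added, which may not match Zotero's own Date Added. It also assumes the storage subfolder name (such as NNL32NVP or 9D5XXTLX) is the key by which the attachment can be located in Zotero. `later_copies` is a hypothetical helper name. It only reports candidates and never deletes anything. Given the annotation differences described above, each candidate should be checked in Zotero before removal.

```python
from pathlib import Path


def later_copies(groups: list[list[Path]]) -> list[tuple[str, Path]]:
    """For each group of identical files, list every copy except the oldest.

    Returns (folder_name, path) pairs, where folder_name is the name of
    the storage subfolder holding the file. "Oldest" is judged by file
    modification time, which is an assumption, not Zotero's Date Added.
    """
    candidates = []
    for group in groups:
        # Sort by mtime; ties are broken by path so the result is stable.
        ordered = sorted(group, key=lambda p: (p.stat().st_mtime, str(p)))
        for p in ordered[1:]:  # ordered[0] is the copy to keep
            candidates.append((p.parent.name, p))
    return candidates
```

The output is meant as a checklist. Removing the attachment from within Zotero, rather than deleting the file on disk, avoids leaving an attachment entry that points at a missing file.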