Delete duplicate PDF items (without citations or merging)

Hello,
I have been looking for this for a while and have been unable to find it. I want to delete duplicate .pdf files in my Zotero library. These files _do not_ have parent citations or anything of the sort; they are just PDF files that are waiting for me to enter their parent item manually, or for which metadata retrieval failed. I know there are duplicates, but they do not normally show up and are not easy to find (e.g. by having the same name).
To add some detail: I ran a duplicate finder on a copy of my library, and it shows a lot of duplicate files that are taking up a lot of space (and they are true duplicates, not, e.g., different versions of the same lecture). I only need one copy.
Can this be done? How?
Thanks a lot.
  • No, there's no easy way to do that, sorry.
    Theoretically, I guess Zotero could run a search and compare file checksums or the like, but it would be a lot of effort for something relatively rare. I don't think many people have that many orphaned PDFs.
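
    As a rough illustration of the checksum comparison mentioned above (a sketch only, not something Zotero does itself): the Python below walks a folder, groups PDFs by size, and then hashes only the size-matched files with SHA-256. The function names `file_digest` and `find_duplicate_pdfs` are made up for this example. It only reports duplicates; it deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_pdfs(root):
    """Map digest -> sorted list of paths, for digests seen more than once."""
    # Cheap first pass: files of different sizes cannot be identical.
    # Note: this pattern may miss files whose extension is written ".PDF".
    by_size = defaultdict(list)
    for p in Path(root).rglob("*.pdf"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[file_digest(p)].append(p)

    return {d: sorted(ps) for d, ps in by_hash.items() if len(ps) > 1}
```

    Running `find_duplicate_pdfs` on a copy of the library folder, rather than the live one, is the safer way to try this.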
  • Thanks for the responses,
    Looking for them with duplicate-removal software wasn't such a good idea: I have shared libraries, and because of that I ended up deleting a few of the files in the Zotero storage drive. Now I am having trouble syncing with Papership :S
    Anyhow, I devised a less-than-perfect method: I made a custom search that shows all child PDF items on all entries, then scrolled down and deleted the duplicates...
    I have lots of orphaned PDFs, which is sub-optimal, but it is the only way I've found to keep my library in one place, in sync with my tablet. I'll have to add them manually, I guess...
  • edited October 19, 2017
    mnlngl,

    I just joined Zotero and am still learning the ropes, but I found a very easy way to find and delete any duplicates (based on MD5 hashes). I did it on a Mac, so I'm not sure how to do it on a PC.

    1. Install fdupes on Mac OSX:
    http://macappstore.org/fdupes/

    2. Run it in the terminal with different options for different functions (make sure to put your Zotero folder after the options)

    fdupes -r .
    fdupes -r . finds duplicate files recursively under the current directory. Add -d to delete the duplicates; you'll be prompted for which files to keep. If you add -dN instead, fdupes will always keep the first file in each set and delete the others without prompting.

    Enjoy!
    bs



  • Note, though, that this will only delete the files from your computer; it will not remove the duplicate attachment items from the Zotero interface.

    This procedure will also delete the files for PDFs of the same item that are stored in different group libraries, which is probably not what you want.
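
    Given those caveats, a report-only pass may be safer than deleting files on disk. The sketch below assumes that each attachment sits in its own subfolder of the Zotero storage folder, named after an item key (for example `storage/ABCD2345/paper.pdf`); that layout is an assumption here, and `duplicate_report` is a hypothetical helper name. It lists groups of byte-identical PDFs together with their folder names and deletes nothing, so the attachment items can then be reviewed and removed from within Zotero.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def duplicate_report(storage_dir):
    """Return groups of identical PDFs as lists of (folder_name, file_name).

    Assumes a layout of <storage_dir>/<item key>/<file>.pdf (an assumption
    about the storage folder, not something this script verifies).
    Nothing is deleted or modified.
    """
    groups = defaultdict(list)
    for pdf in sorted(Path(storage_dir).glob("*/*.pdf")):
        if not pdf.is_file():
            continue
        # Reads each file fully into memory; fine for typical PDFs.
        digest = hashlib.sha256(pdf.read_bytes()).hexdigest()
        groups[digest].append((pdf.parent.name, pdf.name))
    # Keep only content that appears in more than one folder or file.
    return [entries for entries in groups.values() if len(entries) > 1]
```

    Each returned group can simply be printed; because files shared across group libraries will also show up as duplicates, every group still needs a manual check before anything is removed.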
  • edited October 19, 2017
    bwiernik,

    Thanks for your comment; I did not realize this.
    So the only way is to remove the PDFs one by one from each item?

  • Unfortunately, at this time, yes.
  • Is this because people are less likely to pay for storage?
  • Not sure what you're getting at, but this has nothing to do with online file storage.
  • Hi Adam,

    After trawling through the many cryptic replies on this subject, I now realise that paid storage only applies to attachments and not to reference text.

    Correct?
  • It would save Zotero a lot of time and effort if you made this more explicit.

    Today I received an email stating: "If your storage usage currently exceeds the free limit of 300 MB, your account is now locked. A locked account will not allow any syncing of new files. If you fail to renew within the next thirty days, stored files in excess of the free limit will be deleted, in accordance with our terms of service".

    To someone who hasn't trawled your forums this would mean their account is going to be locked and they won't be able to access anything in it.

    Is a file an attachment or is it a reference? I now know it is an attachment, but I haven't been involved in developing this application so it is not as clear to me and others as it is to you.

    Not knowing this has meant I have paid $20 a year for storage for the past 5 years.
  • I thought that message said that "metadata will continue to sync" -- that's definitely what the in-app message says; the e-mail should definitely be changed to say that too, if that's not the case.

    This is also explained in what I'd have thought were pretty clear terms in the documentation: https://www.zotero.org/support/sync

    The forums are definitely not the best place to learn about this -- they mainly provide answers to very specific questions that people ask. If the question asked isn't exactly what you're looking for (as would seem to be the case here), the answers won't necessarily apply.

    Also, this just seems to be a hard concept to convey. E.g. I'd have thought that "File" is a perfectly clear description, whereas to me "Reference" has no clear meaning and "attachment" is confusing because notes are attached but are not files (i.e. sync for free).
  • Yes, this is the in-app message when you reach your quota, which is the first thing you'd see about this before you purchased a subscription:
    "You have reached your Zotero File Storage quota. Some files were not uploaded. Other Zotero data will continue to sync to the server.

    See your zotero.org account settings for additional storage options."
    And I think the linked sync documentation explains the difference between data syncing and file syncing pretty clearly. This email is only something you'd get after at least a year of having a storage subscription. But we'll try to clarify these expiration emails.