delete duplicate pdf items (without citations nor merge)
Hello,
I have been looking for this for a while and have been unable to find it. I want to delete duplicate .pdf files in my Zotero library. These files _do not_ have parent citations or anything of the sort; they are just PDF files that are waiting for me to manually enter their parent item or that have failed metadata retrieval. I know there are duplicates, but they do not normally show up and are not easy to find (e.g. by having the same name).
I will add my thoughts: I ran a duplicate finder on a copy of my library, and it shows a lot of duplicate files that are taking up a lot of space (and they are true duplicates, not e.g. different versions of the same lecture). I only need one copy.
Can this be done? How?
thanks a lot.
Theoretically, I guess Zotero could run a search and compare file checksums or the like, but it would be a lot of effort for something relatively rare. I don't think many people have that many orphaned PDFs.
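For anyone curious what such a checksum comparison looks like outside of Zotero, here is a minimal sketch in Python. It is not something Zotero provides: the function names `file_digest` and `find_duplicate_pdfs` are made up for illustration, SHA-256 is just one reasonable choice of hash, and the `*.pdf` pattern may miss files with an uppercase extension on case-sensitive file systems. It only lists candidate duplicates and deletes nothing, since removing files from the storage folder behind Zotero's back is risky.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_pdfs(root):
    """Group the PDFs under `root` by content; return groups of 2+ identical files.

    Report only: nothing is deleted or modified.
    """
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*.pdf"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[file_digest(p)].append(p)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicate_pdfs()` on a copy of the Zotero data folder returns a list of groups, each group being the paths of files with identical content. The resulting list is best used as a guide for deleting the corresponding items from within Zotero itself.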
Looking for them with duplicate-removal software turned out not to be such a good idea: I have shared libraries, and a few of their files in the Zotero storage folder got deleted as a result. Now I am having trouble syncing with Papership :S
Anyhow, I devised a less-than-perfect method: I made a custom search showing all child PDF items for all entries, then scrolled through the results and deleted the duplicates by hand...
I have lots of orphaned PDFs, which is suboptimal, but it is the only way I've found to keep my library in one place and in sync with my tablet. I will have to add the parent items manually, I guess...
I just joined Zotero and am still learning the ropes, but I found a very easy way to find and delete duplicates (based on MD5 hashes). I did it on a Mac, so I am not sure how to do it on a PC.
1. Install fdupes on Mac OSX:
http://macappstore.org/fdupes/
2. Run it in the terminal, with different options for different functions (make sure to put the path to your Zotero folder after the options; the `.` below means the current directory)
fdupes -r .
fdupes -r . finds duplicate files recursively under the current directory. Add -d to delete the duplicates; you'll be prompted which files to keep. If you add -dN instead, fdupes will always keep the first file in each set and delete the others without prompting.
Enjoy!
bs
This procedure will also delete the files of copies of the same PDF that belong to different group libraries, which is probably not what you want.
Thanks for your comment, I did not realize this.
So the only way is to remove the PDFs one by one from each item?
I now realise after trawling through the many cryptic replies on this subject, that paid storage only applies to attachments and not reference text.
Correct?
Today I received an email stating: "If your storage usage currently exceeds the free limit of 300 MB, your account is now locked. A locked account will not allow any syncing of new files. If you fail to renew within the next thirty days, stored files in excess of the free limit will be deleted, in accordance with our terms of service".
To someone who hasn't trawled your forums this would mean their account is going to be locked and they won't be able to access anything in it.
Is a file an attachment or is it a reference? I now know it is an attachment, but I haven't been involved in developing this application so it is not as clear to me and others as it is to you.
Not knowing this has meant I have paid $20 a year for storage for the past 5 years.
This is also explained in what I'd have thought were pretty clear terms in the documentation: https://www.zotero.org/support/sync
The forums are definitely not the best place to learn about this -- they mainly provide answers to very specific questions that people ask. If the question asked isn't exactly what you're looking for (as would seem to be the case here)
Also, this just seems to be a hard concept to convey. E.g. I'd have thought that "File" is a perfectly clear description, whereas to me "Reference" has no clear meaning and "attachment" is confusing because notes are attached but are not files (i.e. they sync for free).