attached pdf stored several times in different folders
I don't know how I managed to produce this, but I found out that many of my attached PDF files are stored several times in the storage folder (blowing up its size). I'm not talking about duplicated database entries: the attached PDF for a single database entry is stored up to four times in the storage folder (each copy in a different subfolder). When I use "Show File" from within Zotero, I'm directed to one of the four files. I don't know if the other three are still linked to Zotero in any way.
Does this sound familiar to anyone? Is there a possibility to get rid of the extra files?
Martin
Sorry, I posted it first in General, but I think the 1.5 Beta is the right forum
Let's say I have this one paper added to Zotero:
Riedl, J. et al., 2008. Lifeact: a versatile marker to visualize F-actin. Nat Meth, 5(7), 605-607.
As an attachment I have saved the PDF file. If I click on "Show File" for the PDF, an Explorer window opens and shows me the PDF stored in this place:
\zotero\storage\RFE3HA5A\Riedl et al.pdf
However, if I search in \zotero\storage\ for Riedl et al.pdf I get the following result:
\zotero\storage\M7F745MP\Riedl et al.pdf
\zotero\storage\RFE3HA5A\Riedl et al.pdf
\zotero\storage\RGTBXTE6\Riedl et al.pdf
\zotero\storage\US3T6TRK\Riedl et al.pdf
That means the pdf is stored 4 times (This is true for many, but not all of my attachments). In Zotero, however, this paper is only entered once.
I hope the problem is clear now. Is there a possibility to delete all the redundant data (automatically)?
Thanks a lot. Martin
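The manual search described above can be sketched as a small script. This is only a rough illustration, assuming the layout shown in the paths above (one subfolder per attachment directly under the storage folder); the function name find_repeated_attachments is made up for this example. It only reports file names that occur in more than one subfolder and deletes nothing.

```python
import os
from collections import defaultdict


def find_repeated_attachments(storage_dir):
    """Return {file name: [subfolders]} for names found in more than one subfolder.

    Assumes attachments sit one level deep, e.g. storage/RFE3HA5A/Riedl et al.pdf.
    Files are matched by name only; contents are not compared.
    """
    folders_by_name = defaultdict(list)
    for entry in sorted(os.listdir(storage_dir)):
        subfolder = os.path.join(storage_dir, entry)
        if not os.path.isdir(subfolder):
            continue  # skip stray files directly inside the storage folder
        for name in os.listdir(subfolder):
            if os.path.isfile(os.path.join(subfolder, name)):
                folders_by_name[name].append(entry)
    # Keep only the names that show up in two or more subfolders.
    return {name: dirs for name, dirs in folders_by_name.items() if len(dirs) > 1}
```

Calling find_repeated_attachments with the path of the storage folder gives a dictionary that can be printed or sorted. A name match alone does not show which copy Zotero actually links to, so the result is a list of candidates to inspect, not a list of files to delete.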
I noticed only five duplicated pdfs. I haven't been following this closely, but most seem to be through my fault, rather than Zotero defects. The repeated PDFs are:
from a reference retrieved via UnAPI: three files with the same creation date and name. I see that two are currently attached to the same Zotero item & have access dates that are 30 seconds apart (from September of last year). The third shows up as being unattached to any item.
two references downloaded via sciencedirect where I do have duplicate items (these were both most likely my fault & I will be deleting one of each of them). One of these references is the only example I have where the same file has a different name.
one reference from Wiley InterScience, added just this month, that has the same filename & date. I only see one item with one attachment. This was likely a dup that was my fault & I deleted the other one. EDIT: yes, it is still in the trash, with the second attachment.
A 'test.pdf' from Oct of last year that I used for testing. This was an item that I manually added twice & did not delete.
In summary: I don't personally have that many dups (1%) & most of my problems seem to exist between the keyboard and chair. This certainly represents an area where the upcoming duplicate detection could help. An explicit filename search could also be useful (as has been called for before); searching by title didn't always work, as the filename can differ from the Zotero title of the item.
You might check your trash, to see if the items are there. And also try to search for titles with the name of your filename, to see if you can see multiple items.
I just wanted to know if there is a way to get rid of the duplicates without losing the real data, and without manually checking for every duplicate which of the files is the one used by Zotero.
I did check the trash. Empty.
And items are not added to zotero under different names or something like that
A technical solution would be to open the Zotero SQLite database in an SQLite client, generate a list of the keys in the 'items' table, generate a list of the directories in the storage folder, compare the two (e.g., do a diff), and then remove any in the latter that didn't appear in the former.
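The comparison described above could look roughly like the following. It is a sketch, not a tested cleanup tool: it assumes the 'items' table has a column named key whose values match the storage subfolder names (the column name is an assumption, so check it in your own database first), and the function name orphaned_storage_folders is made up for this example. It only lists candidate folders and removes nothing. Running it against a copy of the database with Firefox closed would be the cautious approach.

```python
import os
import sqlite3


def orphaned_storage_folders(db_path, storage_dir):
    """List storage subfolders whose names do not appear as keys in 'items'.

    ASSUMPTION: 'items' has a text column called 'key' that holds the same
    8-character names used for the storage subfolders.
    """
    conn = sqlite3.connect(db_path)
    try:
        known_keys = {row[0] for row in conn.execute("SELECT key FROM items")}
    finally:
        conn.close()

    on_disk = {
        entry
        for entry in os.listdir(storage_dir)
        if os.path.isdir(os.path.join(storage_dir, entry))
    }
    # The "diff": folders present on disk but unknown to the database.
    return sorted(on_disk - known_keys)
```

Reviewing the returned list by hand (and keeping a backup of the storage folder) before deleting anything seems wise, since a wrong assumption about the schema would make every folder look orphaned.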
Did you perform a title search for some of those filenames? For me, many of the duplicated files were hiding a bit...
@asplundj: I still don't know how I managed to produce the duplicates, but I got rid of them by exporting the library and reimporting it into a fresh Firefox profile. The size of the storage folder shrank to approx. 1/3 (I can't say exactly, because I used the move to clean some entries I did not need out of the library).