Library: scan for missing attachments?
Due to an (external) sync operation gone awry, I'm missing a few file attachments. I have a backup, but since the files were renamed in Zotero I cannot simply restore that backup. In order to fix this, it would help if Zotero could tell me which attachments it is missing. Is there a way to find out?
Also, the error message when double-clicking a missing attachment could be more helpful. Right now it says 'The attached file could not be found' and offers me to 'locate' the file. But it doesn't even tell me where it was looking (e.g. 2IAEJ6D7) — information that is crucial to find the file in a backup.
What I suggest is cleanup options like the following:
* A way to display orphaned /storage/ subfolders
* A way to display entries with missing attachments (including the expected location of these attachments)
* A way to display unlinked files in the /storage/ folders of (non-HTML) attachments (this would make visible accidentally duplicated PDFs)
I have problems with missing attachments (failed sync, currently not reproducable) as well and I found that in all these cases the entire storage directory for that attachment is missing.
based on this observation, for those more technically minded, some hints toward a workaround: from the zotero.sqlite database, the SQL query "SELECT items.key FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" returns the names of the sub directories of the storage directory.
with a scripting language of your choice (on one computer I used php, on another a strange Excel/batch file combination) check for the existence of all these directories to produce a list of missing attachments. Given that there is a chance (depending on how your attachments went missing) that the directory is present but the attachment missing, the list might not be complete. But certainly a start.
P.S. This query "SELECT REPLACE(itemAttachments.path, "storage:", items.key || '/') FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" gets you combined path and filename, but (at least for me) struggles to properly return non ASCII characters in the filename
However, I think it would not be inappropriate to expect to receive some sort of error message/warning if an attachment sync fails for this sort of reason. I think this should be an inclusion on the sync error handling and receive a much higher priority than the tool discussed here.
Still. It would be good to have a tool that would check for missing attachments. I, for example, change computers quite often and thus really rely on properly functioning syncronisation or else would have to make sure that I manually copy over zotero profiles rather than use the sync feature. Having a tool that can check for the success of attachment sync would thus be a great feature. (an error message still preferable, especially since it could provide more detailed information, e.g. hint towards corrupt zip file)
I might create such a tool, but that would probably end up being a windows .exe since I haven't done any firefox extension programming yet... But maybe that's a good reason to start, as I have a bunch of other easy to implement and (I think) useful ideas for zotero extensions...
Following up on this thread and offering a relatively straightforward work around, below is a relatively easy way to find the storage directories that should be there but (ie there are records for it in the zotero database), for some reason, are not (in my case that was mainly due to data corruption of the zip files on the WebDAV server, so zotero was really not able to download the files; although an error message would have been appreciated).
Alright, the first thing you need is a command line sqlite client, which you can get form the download page of the sqlite project (scroll down to "Precompiled Binaries For Windows" and select the zip file with "A command-line program...."). Unzip that file to your zotero storage folder.
The second step is to copy the zotero.sqlite database file to that same storage folder as well.
Third step is to create a batch file in that same folder and copy the batch script at the bottom of this comment (or, if you can't be bothered, you can download it from my site and unzip it to your storage folder).
Fourth and final step is to run that batch file (ideally from the command prompt), which will query (read only) your zotero database and list all storage folders that don't exist. It is then up to you to investigate and fix what happened.
(I put a package containing the command line client from step 1 and the batch file from step 3 in a single zip archive which you can download from my site)
[NB: the sqlite command line client that I refer to and the batch code below is obviously only for windows but can be adopted very easy for scripting languages for MacOS and Linux as well; just adjust the text bits in the SQL query to produce to appropriate syntax for your OS]
Hope this might be useful for somebody,
krueschan
--------
@echo off
echo @echo off > test2.bat
sqlite3.exe zotero.sqlite "SELECT ('@IF NOT EXIST ' || items.key || '\NUL ECHO ' || items.key || ' does not exist!') FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" >> test2.bat
test2.bat
del test2.bat
In order to use that, please follow steps 1 and 2 from above instructions, and for step 3 use the code from the bottom of this post here (or download the batch file).
This should list all the missing attachment files that are supposed to be there according to zotero DB but are missing on your disk.
There is one (and a half) problem though, but I give up at this point as this only affects my very very marginally: I cannot figure out how to pass non ASCII characters from the zotero DB to the batch file, with the result that all files with unicode file names are reported missing and thus require manual follow up.
The other half problem is that filenames with special Windows characters (especially the % sign) will cause the program to terminate with an error message. If that happens, search for % in the test3.bat file that is created in your zotero storage folder, remove the % and run test3.bat again (ideally from the command line).
Any feedback is welcome
krueschan
----------
@echo off
echo @echo off > test3.bat
sqlite3.exe zotero.sqlite "SELECT ('@IF NOT EXIST "^""' || REPLACE(itemAttachments.path, 'storage:', items.key || '/') || '"^"" ECHO ' || REPLACE(itemAttachments.path, 'storage:', items.key || '-') || ' does not exist!') FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" >> test3.bat
test3.bat
del test3.bat
I find it easier to have text record of all the missing files so I adjusted your batch code as follows:
@echo off
echo @echo off > missingfiles.bat
sqlite3.exe zotero.sqlite "SELECT ('@IF NOT EXIST "^""' || REPLACE(itemAttachments.path, 'storage:', items.key || '/') || '"^"" ECHO ' || REPLACE(itemAttachments.path, 'storage:', items.key || '-') || ' does not exist!') FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" >> missingfiles.bat
missingfiles.bat > missingfiles.txt
del missingfiles.bat
Unfortunately, the Unicode character issue is a problem for me — I got loads of German and French sources, and anything outside the ASCII range seems to go awry.
Oh well, it's a great start!
Thanks in advance.
After that, I'm at a loss. I'm assuming you have to have the zotero firefox extension installed. But after installing the zotero firefox extension and pointing it to my zotero library, I don't see any "#broken" tags.
Emilia, can you help?
Do you have any indication that you have broken attachment links?
Sorry, Emiliano!! Yes, I know I have some missing links. I started using PaperShip on my iPad, but some of the syncing gets hung up on files-not-found. So I inspected my library and found hundreds of missing files. Not sure what happened; guessing I somehow screwed up a Mac->PC migration and hadn't been syncing regularly. Something. I don't want to hunt them down manually.
Thanks!
To just identify missing attachments, it's probably easiest to follow fbennett's advice from October 3 (i.e. sort by attachments and see the empty/hollow dot
https://gist.github.com/pkienzle/ea9c15362abe3f63773a
It handles Unicode in filenames.
Only tested on Mac. In principle it should work on any unix or windows box with python installed, but I didn't bother with the logic of searching the default locations for the zotero database on these systems.