Library: scan for missing attachments?

Due to an (external) sync operation gone awry, I'm missing a few file attachments. I have a backup, but since the files were renamed in Zotero I cannot simply restore that backup. In order to fix this, it would help if Zotero could tell me which attachments it is missing. Is there a way to find out?
  • edited October 28, 2009
    I know a new version of Zotero often produces a folder 'orphaned files'. This would be the other way around: missing files.

    Also, the error message when double-clicking a missing attachment could be more helpful. Right now it says 'The attached file could not be found' and offers me to 'locate' the file. But it doesn't even tell me where it was looking (e.g. 2IAEJ6D7) — information that is crucial to find the file in a backup.
  • Can I bump this? As libraries grow older and larger they accumulate problems with attachments, and cleanup becomes a necessity.

    What I suggest is cleanup options like the following:
    * A way to display orphaned /storage/ subfolders
    * A way to display entries with missing attachments (including the expected location of these attachments)
    * A way to display unlinked files in the /storage/ folders of (non-HTML) attachments (this would make visible accidentally duplicated PDFs)
  • I agree that it would be a great feature to have this built into zotero.

    I have problems with missing attachments (failed sync, currently not reproducable) as well and I found that in all these cases the entire storage directory for that attachment is missing.

    based on this observation, for those more technically minded, some hints toward a workaround: from the zotero.sqlite database, the SQL query "SELECT items.key FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" returns the names of the sub directories of the storage directory.

    with a scripting language of your choice (on one computer I used php, on another a strange Excel/batch file combination) check for the existence of all these directories to produce a list of missing attachments. Given that there is a chance (depending on how your attachments went missing) that the directory is present but the attachment missing, the list might not be complete. But certainly a start.



    P.S. This query "SELECT REPLACE(itemAttachments.path, "storage:", items.key || '/') FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" gets you combined path and filename, but (at least for me) struggles to properly return non ASCII characters in the filename
  • Ticket created, but there's no reason someone from the community couldn't create such a tool. If someone's interested, take the ticket.
    I have problems with missing attachments (failed sync, currently not reproducable) as well and I found that in all these cases the entire storage directory for that attachment is missing.
    Off-topic, but did you try Reset File Sync History?
  • In my case, I found the reason for the sync error (at least in one instance; I haven't had time yet to check the others). Turns out that the zip file created during the upload is corrupt on the WebDAV server. Not surprising then, that the sync failed.

    However, I think it would not be inappropriate to expect to receive some sort of error message/warning if an attachment sync fails for this sort of reason. I think this should be an inclusion on the sync error handling and receive a much higher priority than the tool discussed here.

    Still. It would be good to have a tool that would check for missing attachments. I, for example, change computers quite often and thus really rely on properly functioning syncronisation or else would have to make sure that I manually copy over zotero profiles rather than use the sync feature. Having a tool that can check for the success of attachment sync would thus be a great feature. (an error message still preferable, especially since it could provide more detailed information, e.g. hint towards corrupt zip file)

    I might create such a tool, but that would probably end up being a windows .exe since I haven't done any firefox extension programming yet... But maybe that's a good reason to start, as I have a bunch of other easy to implement and (I think) useful ideas for zotero extensions...
  • edited May 26, 2010
    Hey,

    Following up on this thread and offering a relatively straightforward work around, below is a relatively easy way to find the storage directories that should be there but (ie there are records for it in the zotero database), for some reason, are not (in my case that was mainly due to data corruption of the zip files on the WebDAV server, so zotero was really not able to download the files; although an error message would have been appreciated).

    Alright, the first thing you need is a command line sqlite client, which you can get form the download page of the sqlite project (scroll down to "Precompiled Binaries For Windows" and select the zip file with "A command-line program...."). Unzip that file to your zotero storage folder.

    The second step is to copy the zotero.sqlite database file to that same storage folder as well.

    Third step is to create a batch file in that same folder and copy the batch script at the bottom of this comment (or, if you can't be bothered, you can download it from my site and unzip it to your storage folder).

    Fourth and final step is to run that batch file (ideally from the command prompt), which will query (read only) your zotero database and list all storage folders that don't exist. It is then up to you to investigate and fix what happened.

    (I put a package containing the command line client from step 1 and the batch file from step 3 in a single zip archive which you can download from my site)

    [NB: the sqlite command line client that I refer to and the batch code below is obviously only for windows but can be adopted very easy for scripting languages for MacOS and Linux as well; just adjust the text bits in the SQL query to produce to appropriate syntax for your OS]

    Hope this might be useful for somebody,
    krueschan

    --------
    @echo off
    echo @echo off > test2.bat
    sqlite3.exe zotero.sqlite "SELECT ('@IF NOT EXIST ' || items.key || '\NUL ECHO ' || items.key || ' does not exist!') FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" >> test2.bat
    test2.bat
    del test2.bat
  • ...alright. I now have also done a script that checks for individual files.

    In order to use that, please follow steps 1 and 2 from above instructions, and for step 3 use the code from the bottom of this post here (or download the batch file).

    This should list all the missing attachment files that are supposed to be there according to zotero DB but are missing on your disk.

    There is one (and a half) problem though, but I give up at this point as this only affects my very very marginally: I cannot figure out how to pass non ASCII characters from the zotero DB to the batch file, with the result that all files with unicode file names are reported missing and thus require manual follow up.

    The other half problem is that filenames with special Windows characters (especially the % sign) will cause the program to terminate with an error message. If that happens, search for % in the test3.bat file that is created in your zotero storage folder, remove the % and run test3.bat again (ideally from the command line).

    Any feedback is welcome
    krueschan

    ----------
    @echo off
    echo @echo off > test3.bat
    sqlite3.exe zotero.sqlite "SELECT ('@IF NOT EXIST "^""' || REPLACE(itemAttachments.path, 'storage:', items.key || '/') || '"^"" ECHO ' || REPLACE(itemAttachments.path, 'storage:', items.key || '-') || ' does not exist!') FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" >> test3.bat
    test3.bat
    del test3.bat
  • Very clever. And it works!

    I find it easier to have text record of all the missing files so I adjusted your batch code as follows:


    @echo off
    echo @echo off > missingfiles.bat
    sqlite3.exe zotero.sqlite "SELECT ('@IF NOT EXIST "^""' || REPLACE(itemAttachments.path, 'storage:', items.key || '/') || '"^"" ECHO ' || REPLACE(itemAttachments.path, 'storage:', items.key || '-') || ' does not exist!') FROM itemAttachments LEFT JOIN items ON itemAttachments.itemID=items.itemID;" >> missingfiles.bat
    missingfiles.bat > missingfiles.txt
    del missingfiles.bat


    Unfortunately, the Unicode character issue is a problem for me — I got loads of German and French sources, and anything outside the ASCII range seems to go awry.
    Oh well, it's a great start!
  • https://github.com/friflaj/zotero-storage-scanner will scan for broken attachments and mark them with tag '#broken'
  • You can add Attachment as a column in the centre panel, then click the heading to sort on it. The missing attachment items (with an "empty" dot) will end up at one end of the sort. You can then tag them as a batch it you want quick access to them later.
  • edited October 3, 2013
    I'm extending the plugin to also scan for stray files in the storage directory.
  • In general that should never happen (stray files that is), but this could be a good troubleshooting tool. Do let us know if you find any stray files.
  • I guess my level of knowledge is just not good enough, but I would love to install the storage scanner add-on mentioned above. But when I follow the link, it may as well be in Chinese. There is no indication of what I have to do to install the plugin. I haven't been able to find any instructions online either. Can anybody offer a quick pointer to how it is done?

    Thanks in advance.
  • languageservicesco: there's a link on the page that emilia linked that says "Install by downloading the latest version." IF you click on that, you'll get an xpi file. Open that in firefox. It'll install a firefox extension.

    After that, I'm at a loss. I'm assuming you have to have the zotero firefox extension installed. But after installing the zotero firefox extension and pointing it to my zotero library, I don't see any "#broken" tags.

    Emilia, can you help?
  • (it's Emiliano)
    Do you have any indication that you have broken attachment links?
  • Hi, Adam.

    Sorry, Emiliano!! Yes, I know I have some missing links. I started using PaperShip on my iPad, but some of the syncing gets hung up on files-not-found. So I inspected my library and found hundreds of missing files. Not sure what happened; guessing I somehow screwed up a Mac->PC migration and hadn't been syncing regularly. Something. I don't want to hunt them down manually.

    Thanks!
  • If you're getting a sync error, you should report that separately.
    To just identify missing attachments, it's probably easiest to follow fbennett's advice from October 3 (i.e. sort by attachments and see the empty/hollow dot
  • Oh, brilliant. Thanks, Adam and fbennett!
  • Here is a python script that scans for missing attachments:

    https://gist.github.com/pkienzle/ea9c15362abe3f63773a

    It handles Unicode in filenames.

    Only tested on Mac. In principle it should work on any unix or windows box with python installed, but I didn't bother with the logic of searching the default locations for the zotero database on these systems.
Sign In or Register to comment.