Fix broken pdf links where underscores are taken as blanks

Hello,

At some point during an upgrade of zotero and a migration to a new computer, I ended up with a lot of broken pdf links. Specifically, my new zotero installation believes that the files should be stored under a name like this:

.../zotero/storage/JJVPWVTT/Slonczewski - 1968 - Induced anisotropy in Ni-Fe films.pdf

However, in reality, the file is stored like this:

.../zotero/storage/JJVPWVTT/Slonczewski_-_1968_-_Induced_anisotropy_in_Ni-Fe_films.pdf

Essentially, the stored filename has underscores while the database entry has blanks.

I have been using zotero for quite a while since this happened and manually re-created the links to the papers that I use most often. However, there are too many to fix them all manually. Now that I have both files with blanks and files with underscores, is there any way to batch-fix these broken links? Ideally with a batch process that checks, for every pdf in my database, whether the link is valid and, if not, replaces blanks with underscores in the database entry.

Thanks!
  • That's not something Zotero would've done, and there's no reason to change the database entries. Something external to Zotero presumably replaced the blanks with underscores. You can use an external file renaming tool to batch replace underscores with spaces, and then manually correct any that should actually have underscores.
  • ZotFile has an option to replace spaces with another character. Maybe that had a bug somewhere?
  • That's what I assume happened here. Just renaming the files on my disk without knowing the corresponding zotero database entry will not help at all, since I now have many files in each of these cases:

    * the filename has underscores and the database entry does too
    * the filename has underscores and the database entry has blanks
    * the filename has blanks and so does the database entry

    I'm happy to batch-rename with whichever software but it needs to read the zotero database to fix these links.

    Thanks.
  • It would certainly be possible to write a script, but I don't know that you'll find anyone to help you with that. That's why I suggest just renaming all the files in one direction and fixing missing files as you use Zotero going forward. It's as simple as double-clicking, clicking Locate, and double-clicking the only file that exists in the directory that pops up.

    Even if you were going to use a script, the easiest script to write would simply check all attachments and, for any that were missing, check for another PDF in the directory and update the filename in the database to match. Something that went back and forth comparing database entries to files would be much more complicated. So batch-renaming the files to the desired form would be the first step regardless.
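
For illustration, the "check for another PDF in the directory" step described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names `sole_pdf` and `suggest_filename` are made up for this example, and the code only reports which filename a missing attachment could point to. It does not touch the Zotero database; the actual update would still have to go through Zotero itself.

```python
from pathlib import Path


def sole_pdf(folder):
    """Return the only PDF in `folder`, or None if there are zero or several."""
    pdfs = sorted(p for p in Path(folder).glob("*.pdf") if p.is_file())
    return pdfs[0] if len(pdfs) == 1 else None


def suggest_filename(folder, expected_name):
    """Suggest the filename an attachment should point to.

    Returns None when the expected file exists (nothing to fix) or when
    the folder does not contain exactly one PDF (ambiguous or empty).
    """
    folder = Path(folder)
    if (folder / expected_name).exists():
        return None  # the link is already valid
    candidate = sole_pdf(folder)
    return candidate.name if candidate else None
```

Called on a storage folder that contains only the underscore-named PDF, `suggest_filename(folder, "Slonczewski - 1968 - Induced anisotropy in Ni-Fe films.pdf")` would return the underscore name as the candidate to link to.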
  • This is what I have been doing so far. The problem, however, is that I use zotero under linux and windows and only correctly linked files get synced. Very often I find myself trying to access a file under windows, only to realize that the file is missing because of the broken link under linux. Then I need to reboot, fix the link, reboot again... This is why I'd like to fix all links, but with several thousand entries there is no way to do it manually.

    Is there a way to read the zotero database (or at least the file links) with python?
  • It's an SQLite database — you can read it with anything. You absolutely shouldn't modify the database directly, though. If you're going to read the database directly, you'd want to just rename the files based on the DB, not the other way around.

    The alternative would be to do what I say above: 1) batch rename the files using an external tool and 2) use the JavaScript API in the Run JavaScript window to update missing stored-file attachments based on the single file in the directory. Writing that script without knowledge of Zotero internals would be tough, though.
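
As a rough sketch of the first option (renaming the files based on the DB, not the other way around), something like the following Python could work. It assumes you already have, for each attachment, the storage folder key (e.g. `JJVPWVTT`) and the filename the database expects. The function name `fix_underscore_variant` and its status strings are hypothetical, and it defaults to a dry run so nothing is renamed until you ask for it.

```python
from pathlib import Path


def fix_underscore_variant(storage_dir, key, expected_name, dry_run=True):
    """Make the file on disk match the name the database expects.

    Returns "ok" if the expected file exists, "fixable" if only the
    underscore variant exists (dry run), "renamed" if it was renamed,
    and "missing" if neither file is present.
    """
    folder = Path(storage_dir) / key
    expected = folder / expected_name
    if expected.exists():
        return "ok"
    # The variant seen in this thread: every blank replaced by an underscore.
    variant = folder / expected_name.replace(" ", "_")
    if variant != expected and variant.exists():
        if dry_run:
            return "fixable"
        variant.rename(expected)
        return "renamed"
    return "missing"
```

This covers the three cases listed earlier: when the database entry and the file agree (both with blanks or both with underscores) the result is "ok", and only the blanks-in-database, underscores-on-disk case is renamed. Make a backup of the storage directory before running it with `dry_run=False`.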
  • Yes, I definitely would use the database read-only. Would you know the query to extract all linked filenames?
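
A possible starting point for such a query, as a hedged sketch rather than a confirmed recipe: it assumes the attachment filenames live in an `itemAttachments.path` column with a `storage:` prefix for stored files, and that `items.key` is the name of the storage subfolder. Those table and column names are assumptions about the Zotero schema and should be checked against your own `zotero.sqlite`. The helper name `stored_attachment_names` is made up for this example. It is probably safest to run this on a copy of the database with Zotero closed.

```python
import sqlite3
from pathlib import Path

# Assumed schema: itemAttachments(itemID, path, ...) and items(itemID, key, ...).
# Stored (non-linked) files are assumed to have paths like "storage:<filename>".
QUERY = """
SELECT items.key, itemAttachments.path
FROM itemAttachments
JOIN items ON items.itemID = itemAttachments.itemID
WHERE itemAttachments.path LIKE 'storage:%'
"""


def stored_attachment_names(db_path):
    """Return (storage_key, filename) pairs, opening the database read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(QUERY).fetchall()
    finally:
        con.close()
    prefix = "storage:"
    return [(key, path[len(prefix):]) for key, path in rows]
```

The `?mode=ro` URI parameter asks SQLite to open the file read-only, so the script cannot modify the database even by accident.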
  • You could try to use the Zutilo add-on for this, see here and here. With "replace all instances" checked, the "Modify attachment paths" function should allow, e.g., replacing all spaces with underscores in Zotero's database. Some people have experienced issues with the function, so be cautious. But generally it seems to be working well. Remember to make a backup before using it.
  • Zutilo only modifies linked-file paths, so it won't help here.
  • Sorry for the oversight. Maybe replacing Zotero.Attachments.LINK_MODE_LINKED_FILE with Zotero.Attachments.LINK_MODE_IMPORTED_FILE in the relevant Zutilo code might work.
  • You could also try using Zutilo's "Copy attachment paths" function for copying the file paths in your Zotero database. It works on a selection of parent items in the middle pane of the user interface.
  • This "Copy attachment paths" worked very nicely! I only had to do it in chunks of a few hundred items; it's probably not made to be applied to the whole library at once. Anyway, I can now open every file under linux without the missing link warning. But the files are not synced. Is there a way to force syncing of the pdfs? (I clicked the green circular arrow, which then spun for a while but did not sync the newly linked pdfs.)
  • Preferences --> Sync --> Reset --> Reset File Sync history
    should work (absolutely do _not_ use the "Replace online library" option, though)
  • This worked! Thank you all very much!