How to find unlinked files

Hi,

Here is a solution for those of us who use the linked-files feature instead of stored (attached) files and want to know whether the files in a folder (mostly PDFs) are linked to the Zotero database or not.

1- Export the library to BibTeX format.
2- Open the .bib file in JabRef.
3- Click the "Quality" menu and then "Find unlinked files..."
4- Select the folder where your files are.
5- Click "Scan Directory".

This works because the BibTeX export saves the file paths of your Zotero items' attachments.
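For anyone without JabRef, here is a rough stand-alone equivalent of the same check: a minimal sketch, not JabRef's implementation. It assumes the exported .bib file stores attachments in a `file` field of the form `Description:path:mimetype`, with several attachments separated by `;`, and that paths are plain POSIX-style paths (backslash-escaped characters, as can appear in Windows paths, are not handled). `linked_paths` and `unlinked_files` are hypothetical helper names.

```python
from __future__ import annotations

import os
import re
from pathlib import Path

# Assumption: the BibTeX export writes attachments as
#   file = {Description:/absolute/path/file.pdf:application/pdf;...}
# Backslash-escaped characters (e.g. in Windows paths) are NOT handled.
FILE_FIELD = re.compile(r"\bfile\s*=\s*\{(.*?)\}\s*,?\s*$",
                        re.IGNORECASE | re.MULTILINE)


def linked_paths(bib_text: str) -> set[Path]:
    """Collect every attachment path mentioned in the .bib file's file fields."""
    paths: set[Path] = set()
    for value in FILE_FIELD.findall(bib_text):
        for entry in value.split(";"):
            parts = entry.split(":")
            if len(parts) >= 3:
                # Drop the description and MIME type; keep the path in between.
                paths.add(Path(":".join(parts[1:-1])).resolve())
    return paths


def unlinked_files(folder: str, bib_text: str,
                   suffixes: tuple[str, ...] = (".pdf",)) -> list[Path]:
    """List files under `folder` whose path never appears in the .bib file."""
    linked = linked_paths(bib_text)
    unlinked = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = Path(root, name).resolve()
            if path.suffix.lower() in suffixes and path not in linked:
                unlinked.append(path)
    return sorted(unlinked)
```

Usage would be something like `unlinked_files("/path/to/pdfs", Path("library.bib").read_text(encoding="utf-8"))`. Both sides are compared as resolved absolute paths, so symlinked folders do not produce false positives.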
  • https://github.com/retorquere/zotero-storage-scanner is now compatible with Zotero 5.0 and will tag broken attachments with "#broken" in Zotero.
  • @emilianoeheyns Is this solid, or are there any known issues that could potentially damage a library? Also, the description says the tool "live updates two smart-folders #duplicates and #broken", and I'm wondering whether that means it will create tags and mark affected attachments with them (like you say)?
  • Uh, that must be contributed text; it doesn't look like how I'd phrase it. It will just (un)tag attachments, nothing else.

    Normal people would say this is solid, that there's no potential for damage to your library, and certainly no known problems that could cause any. I see no way it could; it never has, and it uses only Zotero APIs to do its work. It doesn't touch the attachments in any way other than to add or remove these tags and then save the item to the DB, all using the same API Zotero itself would use to do this. If, however, I were to claim that there is no *potential* for damage, my autism (or my training in analytic epistemology; it's hard to distinguish the two) would assert itself and object that I can't state with absolute certainty that all potential for damage is excluded.

    But yeah, normal people would say zero risk.
  • Thanks, that's assurance enough. Almost all my attachments are of the "link to file" type, though there are a few webpage snapshots. So this tool will follow the path defined by each link and, if no file exists at that path, tag the attachment (or linked attachment) with #broken, right?

    I wonder whether it actually opens the file or verifies that the file type is correct. Probably not, and that would be overkill anyway.

    I'm not clear on what #duplicates does (or how it does it).
  • As a matter of fact, it doesn't even do that: it asks Zotero whether the path exists (which works for stored files, linked files and snapshots alike) and then tags (or untags, if the problem has been fixed) the attachment as appropriate. It does not open the file in any way; the request to Zotero to resolve the path just tells me whether it exists.

    I told you this doesn't do much :) It just automates what Zotero can already do.

    #duplicates just tags attachments where you have two or more of the same type (so two PDFs, two Word documents, whatnot) under the same reference. That's just the problem I needed fixed when I wrote the plugin.
  • Still, very useful :)

    Regarding "#duplicates just tags attachments where you have two or more of the same type": so it only looks at the file TYPE, not the file name or byte size? Thus, if a top-level item has two PDFs (with different file names and sizes), will both be tagged #duplicates?
  • One more thing: how fast is this? What's a ballpark execution time for a library with 3,000 top-level items, each with one linked attachment?
  • It doesn't look at file size or file names, just file types. I'm open to suggestions (but short on time, so no promises).

    It should be fairly fast because it does so little. I have no idea of the actual execution time; it should depend heavily on your system, but the work is mostly I/O-bound, so a slow disk would be a bigger bottleneck than a slow CPU. I think.
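To make the tagging rules discussed above concrete, here is a rough sketch of the logic as described in the thread. It is not the plugin's code: the plugin runs inside Zotero and uses Zotero's own APIs, whereas here `Attachment` and `update_tags` are hypothetical names, and `os.path.exists` stands in for asking Zotero whether the resolved path exists.

```python
from __future__ import annotations

import os
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Attachment:
    """Hypothetical stand-in for a Zotero attachment (not the Zotero API)."""
    key: str
    parent_key: str | None   # None for standalone attachments
    content_type: str        # e.g. "application/pdf"
    path: str                # resolved file path
    tags: set[str] = field(default_factory=set)


def update_tags(attachments: list[Attachment]) -> None:
    """Add or remove #broken and #duplicates following the rules in the thread."""
    # #broken: the file is missing; the tag is removed once the problem is fixed.
    for att in attachments:
        if os.path.exists(att.path):
            att.tags.discard("#broken")
        else:
            att.tags.add("#broken")

    # #duplicates: two or more attachments of the same content type under the
    # same parent item. File names and sizes are deliberately ignored.
    groups: defaultdict[tuple[str, str], list[Attachment]] = defaultdict(list)
    for att in attachments:
        if att.parent_key is not None:
            groups[(att.parent_key, att.content_type)].append(att)
    duplicate_keys = {a.key for g in groups.values() if len(g) > 1 for a in g}
    for att in attachments:
        if att.key in duplicate_keys:
            att.tags.add("#duplicates")
        else:
            att.tags.discard("#duplicates")
```

This also illustrates the answer about speed: apart from one existence check per attachment (the I/O-bound part), the work is a single grouping pass, so two PDFs with different names and sizes under the same item would both be tagged #duplicates.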