How to find unlinked files
Hi,
Here is a solution for those of us who use the linked files feature instead of the attached files feature and want to know whether the files (mostly PDFs) in a given folder are linked to the Zotero database.
1- Export the database to BibTeX format.
2- Open the .bib file in JabRef.
3- Click the "Quality" menu and then "Find unlinked files..."
4- Select the folder where your files are.
5- Then click "Scan Directory".
This works because the export saves the file paths of your Zotero items.
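For anyone who would rather script the check than open JabRef, the same idea can be roughly sketched in Python. This is only a minimal sketch under assumptions: it relies on the exported .bib file containing the file paths (as noted above) and on each file name appearing verbatim in the export, which may not hold if the exporter escapes special characters. The function name `find_unlinked_files` is made up for illustration, and this is not how JabRef does its scan.

```python
import os


def find_unlinked_files(bib_path, folder):
    """Return files under `folder` whose names never appear in the .bib export.

    Assumption: the export mentions each linked file by its plain file name.
    """
    with open(bib_path, encoding="utf-8") as handle:
        bib_text = handle.read()

    unlinked = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # A plain substring test: crude, but independent of the exact
            # layout of the file field in the export.
            if name not in bib_text:
                unlinked.append(os.path.join(root, name))
    return sorted(unlinked)
```

Calling `find_unlinked_files("library.bib", "/path/to/pdfs")` (both paths are placeholders) would return the candidates to review. Because it is a substring match, a short file name that happens to occur elsewhere in the export would be wrongly treated as linked.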
Normal people would say this is solid and there's no potential damage to your library, and certainly no known problems that would cause it. I see no way it could, it never has, and it uses only Zotero APIs to do its work. It doesn't touch the attachments in any way other than to add or remove these tags and then save the item into the DB, all using the same API Zotero would use to do this. If, however, I were to claim that there's no *potential* for damage, my autism (or the training in analytic epistemology, it's hard to distinguish the two) would assert itself and object that I can't state with absolute certainty that all potential for damage is excluded.
But yeah, normal people would say zero risk.
I wonder if it actually opens the file or verifies that the file type is correct? Probably not, and that would be almost overkill in any case.
I'm not clear what #duplicates does (or how it does the task).
I told you this doesn't do much :) it just automates what Zotero can already do.
#duplicates just tags attachments where you have two or more of the same type (so two PDFs, two Word documents, whatnot) under the same reference -- it's just the problem I needed fixing when I wrote the plugin.
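To make that rule concrete, here is a rough Python sketch of it. This is not the plugin's actual code and it does not use the Zotero API; the dictionary keys (`parent`, `content_type`, `tags`) and the function name `tag_duplicates` are hypothetical, chosen only to illustrate "same type under the same reference".

```python
from collections import defaultdict


def tag_duplicates(attachments):
    """Tag every attachment that shares its parent and type with another one.

    Each attachment is a dict with hypothetical keys "parent" and
    "content_type"; file name and size are deliberately ignored.
    """
    groups = defaultdict(list)
    for att in attachments:
        groups[(att["parent"], att["content_type"])].append(att)

    for group in groups.values():
        if len(group) > 1:  # two or more of the same type under one parent
            for att in group:
                att.setdefault("tags", set()).add("#duplicates")
    return attachments
```

Under this rule, two PDFs with different names under the same parent would both get the tag, while a single PDF next to a single Word document would not.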
Regarding the "#duplicates just tags attachments where you have two or more of the same type" - so, it only looks at file TYPE and not file name or byte size? Thus, if a top-level item has two PDFs (different file names and sizes), then they will be tagged with #duplicates?
It should be fairly fast because it does so little. No idea on execution time. It should depend heavily on your system, but it is highly I/O-bound, so a slow disk would be a bigger bottleneck than a slow CPU. I think.
The duplicates feature worked as described, though it creates a lot of false positives given the decision rule. I have many top-level items with multiple attachments of the same file type. For example, a journal article might have two PDFs, one of which is the main manuscript while the other is the supporting information. Another example is when I have two webpages attached to the same top-level item.
Once I've fixed the broken file paths I will run the latest version of the tool.
Regards.