Links broken following migration from Mendeley and attempt to clean up duplicates
Hi,
I just switched from Mendeley and I'm still struggling with the concepts of the base folder, linked attachments folder and shared folders. Zotero is a wonderful initiative and product and I'm planning to stick with it. I think though that I have royally screwed up my migration and have lost most if not all of my attachment links. I'm hoping there's a simple solution.
Right now my data directory location is %USERPROFILE%\Zotero and my base directory is "D:\Attention\Lit\_Zotero 2020\storage". Many files now have a link to my research assistant's folder "/Users/emily/Dropbox/ASDIT common/Lit/" on her Mac. This is because I had thousands of duplicate refs when I imported my records from Mendeley and I asked her to help me merge them.
I did this as follows:
1. I bought extra storage on Zotero so that I would have room for citation info + associated pdfs.
2. I synced my desktop library to my online Zotero account.
3. I asked her to install Zotero and log in with my credentials so that she could merge duplicates.
4. She did this and, happily, the duplicates are gone. However, when I look at refs on my computer, the PDF links point to her folder "/Users/emily/Dropbox/ASDIT common/Lit/", not to mine.
QUESTIONS
1. If she set her linked attachments folder to the base of the Zotero storage folder and I set mine to the same, e.g., C:\Users\John\Zotero, would that solve the problem?
2. On my computer, should I set my base directory to be the same as the data directory? Right now I have two Zotero storage folders and I'm sure that leads to no end of trouble.
Thanks for your help with this.
I'm not really sure what D:\Attention\Lit\_Zotero 2020\storage is in your case, or what you have in the folder above it. 'storage' is the hardcoded name of a folder within the data directory. There's no reason for anything else to be called 'storage'. I'm not sure if you previously pointed the data directory at "_Zotero 2020", such that you have a zotero.sqlite file there as well and the random 8-character folder names that go in the real 'storage' folder in "_Zotero 2020\storage". If so, that would be quite a mess.
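To make the distinction above concrete, here is a rough sketch of how one might describe a folder from its listing: does it contain zotero.sqlite, and does its 'storage' subfolder hold 8-character key folders or something else? The function name `describeFolder`, the sample listings, and the key pattern are all illustrative assumptions, not anything Zotero itself provides.

```javascript
// Assumption: Zotero's per-attachment folders have 8-character names made of
// uppercase letters and digits. This pattern is a heuristic, not a spec.
const KEY_FOLDER = /^[A-Z0-9]{8}$/;

// topLevelNames: entries in the candidate folder (e.g. "_Zotero 2020")
// storageNames: entries in its 'storage' subfolder
function describeFolder(topLevelNames, storageNames) {
  const keyFolders = storageNames.filter((n) => KEY_FOLDER.test(n));
  const other = storageNames.filter((n) => !KEY_FOLDER.test(n));
  return {
    hasDatabase: topLevelNames.includes('zotero.sqlite'),
    keyFolderCount: keyFolders.length,
    otherEntries: other,
  };
}

// Hypothetical listing: no database, only plain PDFs in 'storage'.
const report = describeFolder(
  ['storage', 'notes.txt'],
  ['Smith 2019.pdf', 'Jones 2020.pdf']
);
console.log(report);
```

A folder with no zotero.sqlite and no key folders was probably never a real data directory. One with both would be the messy case described above.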
The Linked Attachment Base Directory causes linked files under it to be stored with relative paths, so she needs to set it to the folder she's going to share with you that contains all the linked files. You'll then set it to the location of the same folder on your computer, and the files will be accessible. So if she can access a linked file at /Users/emily/Dropbox/ASDIT common/Lit/foo.pdf, and you can access the same file at D:\Dropbox\Lit\foo.pdf, you'll both be able to access the file as long as the base directory is set properly on both of your computers.
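As a rough illustration of the idea (not Zotero's actual code), the sketch below uses Node's path module to turn an absolute path into one relative to a base directory, then resolves that relative path against a different base on another machine. The helper names `toRelative` and `toAbsolute` are made up for this example. The paths are the ones from the example above.

```javascript
const path = require('path');

// Return the path of a file relative to baseDir (with forward slashes),
// or null if the file is not under baseDir. flavor is 'posix' or 'win32'.
function toRelative(absolutePath, baseDir, flavor) {
  const p = path[flavor];
  const rel = p.relative(baseDir, absolutePath);
  const outside =
    rel === '' || rel === '..' || rel.startsWith('..' + p.sep) || p.isAbsolute(rel);
  return outside ? null : rel.split(p.sep).join('/');
}

// Resolve a stored relative path against this machine's base directory.
function toAbsolute(relativePath, baseDir, flavor) {
  const p = path[flavor];
  return p.join(baseDir, ...relativePath.split('/'));
}

// On the assistant's Mac:
const stored = toRelative(
  '/Users/emily/Dropbox/ASDIT common/Lit/foo.pdf',
  '/Users/emily/Dropbox/ASDIT common/Lit',
  'posix'
);

// On the Windows PC, with the base directory set to the same shared folder:
const local = toAbsolute(stored, 'D:\\Dropbox\\Lit', 'win32');
console.log(stored, '->', local);
```

The point is that only the relative part travels with the library. A file outside the base directory has no relative form (`toRelative` returns null here), which corresponds to a link that keeps its absolute, machine-specific path.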
I think when I first set up Zotero, I intended to sync attachments with Dropbox, so I pointed the storage folder to one that Dropbox would sync. That folder on the D: drive is only 300 MB vs. 1,300 MB at the default %userprofile% location. That's probably why I have two 'storage' folders. Fortunately, I only have one zotero.sqlite db and it's in the proper location @ %userprofile%\zotero. It was also updated today and it passed the db
So how can I figure out if the storage folder on drive D:, to which the linked attachment base directory (LABD) is pointing, contains valid data or not? Is it just a bunch of references that aren't linked to anything or are they references from other locations on my hard drive that I somehow linked to Zotero rather than copying to the storage folder?
Is it safe to delete it? If it helps in my Zotero recovery, I'm willing to sacrifice that folder so I have a single Zotero storage folder even if I lose some refs.
You'd have to say what kinds of folders and files you have in '_Zotero 2020' and 'storage' on D: for us to tell you what to do. If there are just regular PDFs within there, they may be linked to attachments in your database. If all the PDFs that matter are on your assistant's computer and will be synced to your computer either via Zotero or via a cloud storage folder, then you can probably delete it.
re: 'Zotero 2020\Storage'
They are regular PDFs that exist elsewhere on my PC. Therefore, my inclination is to delete them and start from scratch.
I'm more worried about salvaging my main library in %userprofile%\Zotero\storage with thousands of refs. It looks like when the duplicates were merged, the Zotero metadata (if that's the right word) got merged but also all of the attachments, even though identical, were attached to the merged record. See this link for example that shows one Zotero ref, 4 valid attachments and 3 that are greyed out. Output from 'Duplicate cleaner Pro' (part of the screenshot) shows that these 7 pdfs exist in different sub-folders of storage:
https://bit.ly/2S8J5NG
Going through thousands of files and manually deleting duplicates is not worth my time. I see two ways forward and would appreciate your advice:
1. Use a program like duplicate cleaner to delete all of the duplicates. This is easy but what will happen in Zotero? I guess there's a risk that some links will be lost if the deleted files were the only ones linked to a particular Zotero record. Is that right?
2. Start from scratch and re-import all of my pdfs to Zotero. This would be easy since the majority of refs I care about are on my hard drive organized into hierarchical folders. I guess it would mean I would lose all of my tags though.
Thanks for your advice.
If you don't have any groups where the same files exist, then yes, you could use a duplicate cleaner to delete duplicate files. You'd still then want to delete all the attachments in Zotero with missing files (indicated by an empty blue circle), but we could give you a short script to run that would delete all attachments without files.
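For anyone curious what an exact-duplicate cleaner does in principle, here is a minimal sketch: hash each file's contents and keep only the first file seen per hash. It works on in-memory buffers so it can be run safely. The function name `planDuplicateRemoval`, the folder names, and the file contents are invented for illustration, and MD5 is used only because that is what the duplicate tool in this thread reports.

```javascript
const crypto = require('crypto');

// MD5 of a file's contents, as a hex string.
function md5(buffer) {
  return crypto.createHash('md5').update(buffer).digest('hex');
}

// files: array of { path, content }. Keeps the first file seen for each
// distinct content hash and marks every later identical file for removal.
function planDuplicateRemoval(files) {
  const firstSeen = new Map();
  const keep = [];
  const remove = [];
  for (const file of files) {
    const hash = md5(file.content);
    if (firstSeen.has(hash)) {
      remove.push(file.path);
    } else {
      firstSeen.set(hash, file.path);
      keep.push(file.path);
    }
  }
  return { keep, remove };
}

// Hypothetical storage folders holding two identical PDFs and one unique one.
const plan = planDuplicateRemoval([
  { path: 'storage/AAAA1111/paper.pdf', content: Buffer.from('same bytes') },
  { path: 'storage/BBBB2222/PDF.pdf', content: Buffer.from('same bytes') },
  { path: 'storage/CCCC3333/other.pdf', content: Buffer.from('different') },
]);
console.log(plan);
```

Note that the plan only says which files go. Zotero would still list an attachment item for each removed file, which is why the missing-file attachments need to be cleaned up afterwards.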
(While doing it by hand would still be annoying, note that you can click an item and press + to expand all items, which would make it quicker to select ranges of attachments with Ctrl and Shift.)
Thanks!
Here's a script you can run from Tools → Developer → Run JavaScript to add the tag "_missing file" to any attachments in My Library with a missing file. You can then click in the items list and do a Select All (Ctrl-A/Cmd-A), delete them, and empty the trash.
// Find all attachment items in My Library
var s = new Zotero.Search();
s.libraryID = Zotero.Libraries.userLibraryID;
s.addCondition('itemType', 'is', 'attachment');
var ids = await s.search();
await Zotero.DB.executeTransaction(async function () {
    for (let id of ids) {
        let item = Zotero.Items.get(id);
        // Only check stored files, not linked files
        if (item.isFileAttachment() && !item.isLinkedFileAttachment()) {
            if (!await item.fileExists()) {
                // Tag the attachment without changing its modification date
                item.addTag('_missing file');
                await item.save({
                    skipDateModifiedUpdate: true
                });
            }
        }
    }
});
I ran it with "Run as async fn" ticked. I received a return value of ===>undefined<===
Here's a screenshot of the results when I filter on _missing file: https://www.dropbox.com/s/8zdezjwyxwiyzip/2020-04-28 213500 zotero _missing file script results.png?dl=0
There are only 4 attachments with the _missing file tag and all four are indented under an appropriate Zotero file. Doesn't this mean that they are attached?
Is the script okay?
[JavaScript Warning: "unreachable code after return statement" {file: "resource://zotero/loader.jsm -> resource://zotero/bluebird/util.js" line: 201 column: 4 source: " eval(obj);
"}]
[JavaScript Error: "addon.getResourceURI is not a function" {file: "chrome://zotero/content/xpcom/prefs.js" line: 390}]
[JavaScript Error: "Error connecting to server. Check your Internet connection."]
(not sure what the last one's about; I'm able to sync)
Remember, the point of this script is to let you quickly delete useless attachment items after you use an external duplicate cleaner in the 'storage' directory to delete duplicate PDFs. Have you done that yet?
1. delete all entries that say _missing file (open blue circle). This implies that all the pdfs that remain in Zotero storage are connected to a parent item. (I've done this)
2. Use the external software to delete all duplicates. This would be safe since the duplicates that remain are attached to a parent item. Is that correct? I will go ahead and delete the duplicates if you agree that this is the case.
The script doesn't do anything to find PDFs in 'storage' that aren't linked to items, but there's no reason to think those exist. This directory is managed by Zotero, so unless you force-quit Zotero in the middle of an import or copied in other files outside of Zotero, there's no reason there'd be orphaned files in there.
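If someone did want to check for orphaned folders, the comparison itself is simple: take the folder names found in 'storage' and subtract the keys of the attachment items Zotero knows about. The sketch below shows only that set difference on made-up data. The function name `findOrphanFolders` and the keys are hypothetical, and how you would obtain the two lists is left out.

```javascript
// storageFolders: names of the subfolders found in 'storage'
// attachmentKeys: keys of the attachment items in the library
// Returns the folders that no attachment item refers to.
function findOrphanFolders(storageFolders, attachmentKeys) {
  const known = new Set(attachmentKeys);
  return storageFolders.filter((name) => !known.has(name));
}

// Hypothetical data: one folder has no matching attachment item.
const orphans = findOrphanFolders(
  ['AAAA1111', 'BBBB2222', 'CCCC3333'],
  ['AAAA1111', 'CCCC3333']
);
console.log(orphans);
```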
You should obviously make a backup of the entire directory before proceeding. If you notice that all attachments under an item are tagged with "_missing file", you can untag one of them and relocate the file manually by searching for it within the backup folder.
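The check described above (an item whose attachments are all tagged "_missing file") can be sketched as a small filter. This runs on plain objects rather than real Zotero items. The data shape, the function name `itemsWithNoRemainingFile`, and the titles are assumptions made for illustration.

```javascript
const MISSING_TAG = '_missing file';

// items: array of { title, attachments: [{ title, tags: [...] }] }
// Returns the titles of items that have attachments, all of which are
// tagged as missing, so deleting the tagged ones would leave no file.
function itemsWithNoRemainingFile(items) {
  return items
    .filter(
      (item) =>
        item.attachments.length > 0 &&
        item.attachments.every((att) => att.tags.includes(MISSING_TAG))
    )
    .map((item) => item.title);
}

// Hypothetical library: the second item would lose its only PDF.
const atRisk = itemsWithNoRemainingFile([
  {
    title: 'Item A',
    attachments: [
      { title: 'PDF', tags: [] },
      { title: 'Full Text', tags: [MISSING_TAG] },
    ],
  },
  {
    title: 'Item B',
    attachments: [{ title: 'PDF', tags: [MISSING_TAG] }],
  },
  { title: 'Item C', attachments: [] },
]);
console.log(atRisk);
```

Items with no attachments at all are skipped on purpose, since there is nothing to relocate for them.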
I just ran the duplicates software and got rid of exact duplicates (files sharing the same MD5 hash). I then ran your script. I now have a helpful view where I can see all Zotero items and associated files. When I right-click on one with the open blue circle and select "Show File", I see that it doesn't exist in a storage folder. However, if I double-click it, Zotero is able to open the PDF and the icon goes from open to solid blue. It then creates a new storage sub-folder and puts the PDF there. Is it taking a copy from the PDF that is also linked to the same Zotero item? Does it do that so I can select the _missing file attachment I wish? (I notice that some of my missing files have full file names, whereas the ones with the solid blue circle are just called "PDF".)
I'm using this feature to make sure that Zotero keeps the most useful filename associated with its item and removing the ones with the _missing file tag.
The script really is for the very specific purpose I explained: selecting the attachment items associated with the files you deleted so that you can delete those attachments. That's just the attachment title. The underlying files should all already be named based on the parent metadata. There's no reason to duplicate the filename in the attachment title, which is why files saved from translators don't do that. I'd really encourage you just to delete the selected attachments and not worry about this further.