Mendeley import for a large library

mjthoraval · July 13, 2022

Hello,

I am trying to import my Mendeley library to Zotero. I have around 15,000 entries in Mendeley, most of them with attached pdf(s), for a total library size around 30 GB. I am still mostly using Mendeley Desktop, so all my files are stored locally. All my files are also synced to the cloud with Mendeley sync.

I have tried on a Windows 10 laptop connected by Ethernet cable. I left it running overnight, but in the morning Zotero was closed and only 9,000 entries had been imported. Can I restart the import from Mendeley to complete it, or do I need to start again from scratch?

Thank you.
MJ

dstillman · July 13, 2022

Yes, you can just run the importer again — it won't duplicate anything it's already downloaded.

You may want to try with the Zotero beta — we made some improvements to the importer the other day that may help with large libraries. You can switch back to the release version immediately after importing if you don't want to stay on the beta.

mjthoraval · July 13, 2022

Thank you for your reply.
I have installed the beta. This time, Zotero stopped at 7221 items.
I have restarted the import process and it is now running again.
Is this crash related to some maximum size of import set by Zotero or Mendeley?

If I start working on Zotero or Mendeley before the import finishes, will this affect the import process?
Related question: Can I still use the import from Mendeley after starting working on Zotero? For example, I finish the import in 3 days. Then I make changes in Zotero and in Mendeley (add new items, modify the metadata of existing files, annotate files, ...). What will happen if I import again after these changes? How will the conflicts be handled?

Thank you,
MJ

tnajdek · July 13, 2022

Was there an error displayed when the import stopped or did Zotero crash entirely?
One thing worth checking is that you have enough disk space for the import.

If you still have problems, next time could you please run the import with debug logging and paste a debug ID here?

tnajdek · July 13, 2022

If you re-import, Zotero will update existing items to match the imported data, i.e. changes made in Zotero will be lost and changes made in Mendeley will be imported. This only applies to items that have been previously imported; other items are not affected.

mjthoraval · July 14, 2022

Zotero has crashed again during the second part of the import. This time, I have 14591 items in my Zotero library after the crash, with only a few hundred still missing.

I was able to confirm that the import restarted smoothly after I put the computer to sleep at the office and reopened it at home in the evening. The import continued smoothly at least until 3 am, but then Zotero was gone again when I woke up. No error, no warning, nothing left at all of Zotero. Just gone. Same as the previous time.

I still have 120 GB of free space on my 941 GB hard drive. So there is plenty of space for the import.

I am now running the import with debug logging. If it (hopefully) finishes the import this time, I can try to import again from another Zotero account in a spare computer at the office. I guess the paid Zotero sync option should not change anything for that test account?
I will come back here after that.

Do I have any way to verify if it has created duplicates in my library?

I think it would be nice to get some kind of information in the tutorial or in the import window about what is happening during the import process. I found it a bit worrying at first to see the status "importing..." without seeing anything happen. It took maybe 2 hours or more before the first PDF files appeared in my library. There may be other steps after that, but I would find it useful to get some information about the import process during that waiting time. What will happen if the import stops and I restart it (you gave me the answer above)? Will it go through the same initial "stuck" phase all over again, or should it be faster?

Thank you again for your help on this.

dstillman · July 14, 2022

If it (hopefully) finishes the import this time, I can try to import again from another Zotero account in a spare computer at the office.

What do you mean by this? If it finishes, why would you import again on another computer?

I'd guess the disappearing is Zotero running out of memory. On Windows, Zotero is currently much more limited in how much memory it can use. It may help to restart your computer before the import attempt and make sure nothing else is running.

mjthoraval · July 14, 2022

I have just had another crash. Zotero disappeared with a RAM usage of around 2 GB. The total RAM on my laptop is 32 GB.

I have restarted Zotero and am trying to send you the debug log.
I guess I need to follow these instructions after the crash:
"If submitting output fails, you can return to the Debug Output Logging menu and select View Output, go to File → “Save…”, choose Format: “Text Files”, and save the output to a file, which you can email to support@zotero.org with a link to your forum thread. It can be helpful to ZIP the file before emailing it."

But it seems that the View Output is just the live output of Zotero, not the log of the crash. I don't see any way to get a Debug ID after the crash.

mjthoraval · July 14, 2022

A few remarks:
- To reply to your question, I could try again from another account just for testing if it is useful for you to get the debug log.
- Zotero was typically using around 1 GB of RAM during the import while I was not using it. I did not see the memory use go higher while Zotero was idle.
- When I opened PDF files in Zotero, Zotero was very slow. Its RAM usage went up to 2.3 GB, but Zotero did not crash at that peak.
- The previous disappearance happened when I was away from the computer. Zotero had been stable for a whole day alongside my work, without any problem during heavy RAM and CPU use. I was not monitoring it when it disappeared, so I don't know whether something affected the memory use. It seems that something different happened in the Zotero import process at that time.
- Another thing that I can think of: the new Mendeley login system disconnects me every day, from the browser and the Reference Manager. I don't know if such a login time limit from Mendeley could play a role.
- I also have a 252 GB RAM workstation running Ubuntu 20.04.4 LTS. Would it be faster / more stable to do the import from the Ubuntu side?
- Can I sync the current state of the import to Ubuntu with the paid Zotero sync, and restart the import from Ubuntu?
- I saw the available space on my computer go down drastically. So I went into the Zotero storage folder "C:\Users\\Zotero". There is a "tmp" folder containing 29 GB of data. The "storage" folder has 32 GB of data, larger than the estimated Mendeley library size of 31.3 GB. Another computer with the synced Mendeley library only has 30.5 GB. It is surprising that the "storage" folder is already larger than my whole Mendeley library. My Zotero was completely empty before starting the import. It is also very strange to see the size of a "tmp" folder grow so large.
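
For anyone wanting to repeat the folder-size comparison above, here is a minimal sketch in Python. The function names are made up for illustration, and the Zotero data directory is passed in by the caller rather than assumed, since its location differs between machines. It only adds up file sizes and says nothing about why the folders differ.

```python
import os
import sys


def folder_size_bytes(path):
    """Add up the sizes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def report(data_dir, subfolders=("storage", "tmp")):
    """Return {subfolder name: size in GB} for the subfolders that exist."""
    sizes = {}
    for sub in subfolders:
        sub_path = os.path.join(data_dir, sub)
        if os.path.isdir(sub_path):
            sizes[sub] = folder_size_bytes(sub_path) / 1e9
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python folder_sizes.py <path to the Zotero data directory>
    for name, gigabytes in report(sys.argv[1]).items():
        print(f"{name}: {gigabytes:.2f} GB")
```

Running it a few times during an import would show whether "tmp" is still growing or has stalled.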

mjthoraval · July 14, 2022

-

tnajdek · July 14, 2022

- I also have a 252 GB RAM workstation running Ubuntu 20.04.4 LTS. Would it be faster / more stable to do the import from the Ubuntu side?

I'd give it a go and import on this machine using the beta build for Linux 64-bit (the last part is important; I assume this machine is running 64-bit Linux). Also, on Linux it's easy to redirect logging output to a file, so if it crashes during import, you can just zip this file and send it to support@zotero.org with a link to this thread.
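
On Linux this is normally done with plain shell redirection. The same idea as a minimal Python sketch: the child process writes straight into the log file, so whatever was logged before a crash stays on disk. The helper names are hypothetical, and the command to run is supplied by the caller, because the exact Zotero launch command and debug flags are not spelled out in this thread.

```python
import os
import subprocess
import zipfile


def run_and_log(command, log_path):
    """Run command with stdout and stderr sent to log_path; return exit code."""
    with open(log_path, "wb") as log:
        # The child writes directly to the file, so output survives a crash.
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def zip_log(log_path, zip_path):
    """Compress the log file into zip_path, ready to attach to an email."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(log_path, arcname=os.path.basename(log_path))
    return zip_path
```

For example, `run_and_log(["/path/to/zotero"], "import.log")` followed by `zip_log("import.log", "import.zip")`, where the path is a placeholder for wherever the beta was unpacked.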

- I saw the available space on my computer go down drastically...

The import process downloads everything first, so the "tmp" folder is expected to grow to your library size on every import attempt.

- Can I sync the current state of the import to Ubuntu with the paid Zotero sync, and restart the import from Ubuntu?

Yes, once the data is on Zotero's end, it will sync between all machines. Just make sure you configure sync on both devices.

mjthoraval · July 14, 2022

I have installed the following beta version from the instructions here:
zotero-beta_6.0.10.7+acba90f27_amd64.deb
https://github.com/retorquere/zotero-deb
It seems to be the same version number as the one provided in the link you gave, but amd64 instead of the one in your link:
Zotero-6.0.10-beta.7+acba90f27_linux-x86_64.tar.bz2
installed with:
sudo apt install zotero-beta
The other installation I had done from the Zotero page did not activate the Zotero links in Ubuntu (zotero://...). The retorquere github page installation worked. I am not sure what is the difference.

lscpu returns the following on my ubuntu workstation:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper PRO 3995WX 64-Cores

The sync has uploaded the library from the Windows laptop side. I will now wait for the sync download on the Ubuntu workstation before trying the import again.

I now have Zotero and Zotero-beta installed together on Ubuntu. Can I keep them installed in parallel?

mjthoraval · July 15, 2022

The import from Zotero-beta in Ubuntu worked perfectly this time, after syncing from the partial import in Windows 10. I still have the log file, but I guess you do not need it, as it was successful.
The "tmp" folder in Windows 10 has been emptied automatically now.
Thank you very much for the help.

mjthoraval · July 30, 2022

A small update on this thread.
I have noticed that importing from Mendeley created duplicates probably due to a conflict with ZotFile: PDF files duplicates produced when importing from Mendeley.

I have finally decided that it was worth importing again from Mendeley without ZotFile to make sure my library is clear of these duplicates.
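
One rough way to look for such duplicate PDFs on disk is to hash every PDF under the storage folder and group the byte-identical ones. This is a minimal sketch with hypothetical function names; the folder to scan is passed in by the caller. It only detects files that are exactly identical, so it can miss copies of the same paper with different bytes, and it does not show how Zotero itself links those files to items.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large PDFs are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(root, suffix=".pdf"):
    """Return sorted groups of paths under root whose contents are identical."""
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(suffix):
                full = os.path.join(dirpath, name)
                by_hash[file_sha256(full)].append(full)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Each returned group lists files with the same content; an empty list means no byte-identical PDFs were found.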

1) I deleted my library by selecting all the items and pressing the Delete key. This crashed my Windows 10 laptop (32 GB RAM), so I had to go to my Ubuntu workstation (252 GB RAM). It worked, but required a long time and a huge amount of RAM for my ~32 GB library (I stopped tracking at around 15 GB of RAM usage).

2) When I imported my Mendeley library again from scratch, it crashed again after about 1 day. The computer was completely idle. Again, the RAM usage went above 15 GB. The second attempt was successful without crashing.

3) I realized that the shared group libraries were not imported, consistent with the description in the documentation. But the folder "My Publications" was also not imported to Zotero.
It would be nice if that folder could also be imported in Zotero. Otherwise, adding a warning in the documentation could help the user anticipate and possibly use a tagging system to recover his "My Publications" folder easily.

dstillman · July 31, 2022

Otherwise, adding a warning in the documentation could help the user anticipate and possibly use a tagging system to recover his "My Publications" folder easily.

Type your name into the search bar and drag them to My Publications? Files in My Publications in Zotero are shared publicly, so adding items requires a special process where you choose whether to share them and attest that you have the right to do so, but this seems like something that would take about 30 seconds to redo after the import if you actually want to share your publications on your Zotero profile page.

mjthoraval · July 31, 2022

I understand your point that there is a reason behind this limitation, due to the features offered by the "My Publications" folder.
But I would like to emphasize that building a publications list may not be a 30-second job in many cases:
1) A publications list including all conference reports can easily extend to tens or hundreds of entries. Searching for them again manually can take a lot of time.
2) Searching by name clearly does not work for many authors, for the many reasons that led to the development of ORCID. Unless Zotero provides some kind of author identification linked to ORCID, building a publications list will always remain very time-consuming for authors with a significant publications list.

Considering these limitations, giving the user the ability to easily transfer his publications list from Mendeley to Zotero seems to be a fairly easy improvement to implement in Zotero, with clear added value for the users:
1) Either directly by Zotero, by adding an automatic tag "MyPublications" to the entries which were in the folder "My Publications". The final step of transferring them to the "My Publications" folder in Zotero would then be trivial, user-controlled, and compliant with the special process in Zotero.
2) Or at least by providing a clear warning in the documentation, with suggested steps to overcome the limitations. I do not see any argument against doing this.