RIS import options

internationaled · November 6, 2018

First, I want to start by saying THANK YOU for Zotero. I love this tool and am so thankful for all the work that has gone into it. I can't imagine my workflow without it!

So here's why I'm writing today. My team and I are working on a systematic literature review. Part of that process involves importing RIS files with 1,000 records each. The import process is very, very slow -- about an hour per file. Since we will likely end up with about 40,000-50,000 records at the end of this process, I'm looking for some alternatives.

Since I was able to import nine RIS files totaling around 8,300 records into Mendeley 18 in about 45-60 seconds, my plan is to do all the importing of individual files in Mendeley and then, once they're all there, import everything into Zotero from Mendeley in one long import. But that's kind of a hack. Is there some kind of an importing option I can set in Zotero to speed this process up?

Thank you!

internationaled · November 6, 2018

Actually, in looking at the imported references in Mendeley, I see that I am losing content during the import there (e.g., the Archive field--a very important field for doing systematic reviews) and that we really need to stick with a native Zotero import.

If it were possible to select more than one RIS file for import, that would take care of our problem. We could just start the import process and walk away.

Another note: I tried disabling both automatic tagging and syncing. However, that didn't really make a difference in the import speed.

bwiernik · November 6, 2018

1000 records really shouldn’t be that slow. What sort of hardware setup do you have (how much RAM, how old a computer, hard drive or SSD) and what operating system?

Do you have the RIS file or your Zotero data folder in an unusual place like a network drive or an external hard drive?

Can you submit a debug log ID (from the Help menu) showing an attempt to import a RIS file that takes a very long time?

bwiernik · November 6, 2018

Also, can you run Check Database Integrity in the Advanced pane of Zotero preferences?

adamsmith · November 6, 2018

To start: I do think it's a problem that Zotero import is this slow. I haven't tested whether this is normal, and I know that large imports can be slow, but like bwiernik says, your experience seems unreasonably slow.

But there is an answer to your question:
You can simply paste the RIS files into a single document and then import that. On Linux and Mac that's trivial from the terminal:

cat file1.ris file2.ris file3.ris > combined.ris

On Windows I'm sure there's a PowerShell way to do this, or you can install Cygwin or even the Git Bash client.
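
For anyone without a Unix shell, here is a minimal Python sketch of the same concatenation; it should run the same on Windows, macOS, and Linux. It is not a Zotero feature, and the function name `combine_ris` and the file names are hypothetical. It assumes the exports are UTF-8 and that every record ends with an `ER` line. Unlike plain `cat`, it adds a newline after any file that lacks a trailing one, so that one file's final `ER  -` line cannot run into the next file's `TY  -` line, and it returns a rough record count to compare against what ends up in the library.

```python
from pathlib import Path


def combine_ris(sources, destination):
    """Concatenate RIS files into one file and return a rough record count."""
    records = 0
    with open(destination, "w", encoding="utf-8", newline="\n") as out:
        for source in sources:
            # Assumption: exports are UTF-8; utf-8-sig also drops a leading BOM.
            text = Path(source).read_text(encoding="utf-8-sig")
            # Make sure the last line of this file is terminated.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
            # Assumption: each record is closed by a line starting with "ER  -".
            records += sum(
                1 for line in text.splitlines() if line.startswith("ER  -")
            )
    return records
```

Calling `combine_ris(["file1.ris", "file2.ris", "file3.ris"], "combined.ris")` writes `combined.ris` and returns the number of `ER` lines seen.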

internationaled · November 6, 2018

An update and response:

1. I combined all the files together in Notepad, and they're importing as we speak.
2. I tried importing to a group library and to a local library--no difference. I disabled sync for both. Both libraries are starting with empty Recycle Bins and fewer than 1,000 records, I believe.
3. These are RIS exports from ProQuest.
4. My computer is a 2014 MacBook Air, Core i5, 4 GB RAM, 128 GB SSD with very snappy response time. Behavior is the same regardless of whether I run it under macOS or Windows 10 Pro (dual boot). Since my MacBook runs a lot faster in general under Windows, I decided to do the combined 8,300-record RIS import in Windows instead. But the import speed was about the same under Mac.
5. Latest version of Zotero.
6. Submitted the debug output (D577213053).
7. I disabled automatic tagging. That didn't seem to make much of a difference in speed. Then I hid the tag pane, and that greatly sped things up. Interesting.

Import speed with the above configuration is about two records per second. This is still 16.5 times slower than the speed I was getting under Mendeley, but it's greatly improved over the speed I was getting with the tag pane visible. In general, any idea why the two apps have such drastically different speeds for RIS imports, even with Zotero's tag pane hidden? Could it be the way Zotero handles attachments--i.e., by creating a separate attachment for each note?
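
For planning purposes, the rate above can be turned into a rough time estimate. This is a back-of-the-envelope sketch that assumes the rate stays constant at two records per second, which may not hold as a library grows; the function name `import_hours` is hypothetical.

```python
def import_hours(records, records_per_second=2.0):
    """Estimated import time in hours, assuming a constant import rate."""
    return records / records_per_second / 3600


# Record counts mentioned in this thread.
for records in (8_300, 40_000, 50_000):
    print(f"{records:>6} records: about {import_hours(records):.1f} hours")
```

Under that assumption, the combined 8,300-record file takes a little over an hour, and 40,000-50,000 records take roughly 5.6 to 6.9 hours.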

adamsmith · November 6, 2018

Speed differences are hard to track down and I'd guess it's multiple things.
The fact that Zotero does do a better job has significant performance costs -- since RIS isn't super well specified (or rather: not everyone follows the specifications), we do a lot of pattern matching and trying out things to get it right. Multiplied by 1000, that takes time. (I'm also guessing the import script itself, having been gradually amended to take account of such special cases, isn't optimized for performance.)
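
To illustrate the kind of pattern matching involved, here is a minimal Python sketch of parsing a single RIS line. It is not Zotero's import code; the regular expression and the name `parse_ris_line` are assumptions made for illustration, and real exports deviate in more ways than this tolerates (for example, a wrapped line that happens to start with two capitals and a hyphen would be misread as a tag).

```python
import re

# Nominal RIS layout: two-character tag, two spaces, a hyphen, one space.
# Accepting variable whitespace tolerates exporters that deviate slightly.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris_line(line):
    """Return (tag, value) for a tagged line, or None for any other line."""
    match = RIS_LINE.match(line.rstrip("\r\n"))
    if match is None:
        # Likely a continuation of the previous field's value.
        return None
    tag, value = match.groups()
    return tag, value.strip()
```

For example, `parse_ris_line("TY  - JOUR")` gives `("TY", "JOUR")`, while a wrapped line of abstract text gives `None`. Running even a simple check like this, plus fallbacks for each special case, on every line of every record is where the per-record cost adds up.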

Zotero's database structure, programming language, etc. probably also play a role, some of which could probably be improved, some of it not.

dstillman · November 6, 2018

(I'm working on import speed (and write speed in general) at the moment.)

internationaled · November 6, 2018

Thank you to everyone. Much appreciated.