Firefox freezes after downloading multiple citations from PubMed

kt297 · January 18, 2011

Hi, I have recently started using Zotero and I am very impressed so far.
I have noticed that selecting multiple boxes when downloading citations from PubMed can take a long time and can even crash Firefox! It's OK for up to 14 citations at a time (that takes 21 seconds). I would like to be able to download up to 200-300 citations from PubMed in one go, more quickly and without crashes. I've installed the stable Zotero version (2.0.9).

dstillman · January 18, 2011

By crash do you mean "freeze" or an actual "crash"? If the latter, go to about:crashes, generate a report for a relevant crash, and provide a link to the Mozilla crash report.

But Zotero is designed to automate what you might do by hand—it isn't really intended to be a bot downloading 200-300 references at a time. Doing so could easily freeze the UI, at least for a long time. Also, bear in mind that Zotero does more than screen scraping of what you've already loaded—for many sites it needs to make multiple web requests for each item, and it does no throttling or delaying, so with that many items you're essentially carrying out a denial of service attack against the site, and a site would be well justified in banning your IP address temporarily after such an access pattern. (Google Scholar will do just that, and other sites could do the same.)

kt297 · January 18, 2011

Hi Dan,
I think I mean Firefox freezes (it says the browser is not responding and greys out). I close the browser and start Firefox again, and Zotero seems to have saved some citations but not all.
Comparing Zotero to EndNote X, which I also have (I am thinking of moving over to Zotero completely, as I like its philosophy and its software): EndNote can connect to the PubMed website and download 100 references in about 10 seconds without freezes.
Sorry if I sound like a newbie, but I am just an end user wanting to use a free, efficient referencing tool. The reason I would like to download large numbers of citations is that I need to use them when I'm offline. When writing a paper on a particular topic, it's useful to download all the citations for a particular search in PubMed (www.ncbi.nlm.nih.gov/pubmed) so that I can use them (and search them) when I do not have internet access.
I suspect that EndNote and Zotero are quite different in the backend and do some things differently. As a non-technical end user (orthopaedic surgeon), all I want is a quick download of a large number of citations for later offline use when writing and referencing papers in a word processor.

ajlyon · January 18, 2011

Try disabling PDF indexing in the Zotero preferences.

dstillman · January 18, 2011

What ajlyon is suggesting is to set the max indexed pages to 0 in the Search pane of the prefs. That might help.

kt297 · January 18, 2011

I've tried disabling PDF indexing and changing max indexed pages to 0 but no change.
Any other suggestions?

ajlyon · January 19, 2011

Yes-- that wouldn't do anything, since the Pubmed translator doesn't even save PDFs. I just tried on my computer and I can't replicate the issue.

The translator only makes one request, since it concatenates the PMIDs and requests a file with all the corresponding records, which is pretty much what NCBI recommends (http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchlit_help.html#RecordIdentifier).
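
For anyone following along, here is a rough sketch of the single-request idea in Python (not the translator's actual JavaScript). The EFetch endpoint and its db, id and retmode parameters come from NCBI's E-utilities; the helper name build_efetch_url and the sample PMIDs are made up for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Return one EFetch URL covering every PMID in `pmids`.

    Hypothetical helper: it only builds the URL and performs no network access.
    """
    if not pmids:
        raise ValueError("need at least one PMID")
    params = {
        "db": "pubmed",
        # All identifiers are joined into a single comma-separated value,
        # so one request returns all of the records.
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    # Placeholder PMIDs, not real search results.
    print(build_efetch_url([111, 222, 333]))
```

The point is that the number of web requests stays at one no matter how many boxes are ticked, so the request itself is unlikely to be the slow part.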

kt297 · January 19, 2011

Ajlyon - what operating system are you using? I use Windows 7 with the latest stable Firefox. Should I remove Zotero and reinstall the plugin in Firefox? I've even turned off my antivirus and firewall, with no change. How long does it take you to download 100 citations from PubMed? Any other suggestions?

adamsmith · January 19, 2011

100 citations take about 5 minutes for me on a relatively slow computer (but under Linux - I think translator performance tends to be worst on Windows) - Firefox freezes during that time, but unfreezes if you're patient and wait until it's done.

Medium-sized imports tend to be pretty slow in Zotero, no? Any particular reason why? This is just data, no attachments of any kind.

kt297 · January 19, 2011

It seems that EndNote has an alternative way of downloading citations very quickly (100 in 10 seconds). Perhaps I'll use EndNote to download citations and then import them into Zotero?

ajlyon · January 19, 2011

I think that we could improve the performance of the translator by having it save each citation in parallel-- right now the translator saves each item completely (meaning creating indices and whatnot) before moving on to the next item. I'll look into this some more. I'm not sure what options are available from within the translator sandbox, but something should be possible.

kt297 · January 19, 2011

ajlyon - Thanks. It would be nice to have a checkbox in the preferences for just getting the citations "as is", with no other links, to speed things up. I look forward to a speedy translator for the PubMed site.

adamsmith · January 19, 2011

but for Pubmed you aren't getting any other link, are you?

kt297 · January 19, 2011

Adamsmith - yes, but occasionally the selected citation has a link to a full-text article, which is normally accessible only via subscription or password (e.g. ScienceDirect). This is more common with recent citations (from 2000 onwards). But the citations that appear on a page of PubMed search results are text only, held in one database.

adamsmith · January 19, 2011

yeah - what I meant was that Zotero is not importing anything but the citation data - no links, no snapshots etc.

kt297 · January 19, 2011

I think so? I'm not sure how the Zotero translator works technically compared to EndNote's PubMed citation downloader.

dstillman · January 19, 2011

I have no idea if it's still the case, but I believe EndNote used to use Z39.50 to access PubMed, which could easily be faster (if less full-featured than the current website), since it's designed for data retrieval.

Saving 50 items via Zotero takes 40 seconds for me. That would be a reasonable speed for a translator that made multiple requests (in which case, again, it would also be somewhat abusive to the site), but PubMed should be faster since there's only a single request. Judging by the debug output, we could speed this up by performing the import in a single transaction—right now it's using multiple transactions, which greatly increases disk access. A single transaction would have the side effect of making the save all-or-nothing, so an error with one would result in a failure of all. I believe regular imports use a single transaction already.
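
To make the trade-off concrete, here is a minimal sketch using Python's sqlite3 module. It is not Zotero's code: the one-column items table and the helper names (make_db, save_per_item, save_all_or_nothing, count_items) are invented for the example, and a missing title stands in for "an error with one item".

```python
import sqlite3


def make_db():
    """Create an in-memory database with a made-up one-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (title TEXT NOT NULL)")
    conn.commit()
    return conn


def count_items(conn):
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


def save_per_item(conn, titles):
    """One transaction per item: a bad item is skipped, the rest are kept."""
    saved = 0
    for title in titles:
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute("INSERT INTO items (title) VALUES (?)", (title,))
            saved += 1
        except sqlite3.IntegrityError:
            pass  # only this item is lost
    return saved


def save_all_or_nothing(conn, titles):
    """One transaction for the whole batch: a bad item discards everything."""
    try:
        with conn:  # a single commit for all rows
            conn.executemany(
                "INSERT INTO items (title) VALUES (?)",
                [(t,) for t in titles],
            )
    except sqlite3.IntegrityError:
        return 0  # the whole batch was rolled back
    return len(titles)


if __name__ == "__main__":
    batch = ["first", None, "third"]  # None violates NOT NULL
    a, b = make_db(), make_db()
    print(save_per_item(a, batch), count_items(a))
    print(save_all_or_nothing(b, batch), count_items(b))
```

With a faulty item in the batch, the per-item version keeps the good rows while the single-transaction version keeps none. On a disk-backed database the single commit is also where the speed-up would come from, since each commit normally forces a write to disk.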

ajlyon · January 19, 2011

Is there a way for translators to coerce Zotero into saving all the items as a single transaction? I don't know what in the sandbox would be of use, apart from Item.complete(..).

It would also be pretty easy (very easy!) to make the Pubmed translator into an import translator as well, so it could just call itself to import the result XML from the website, if that would give us the mentioned performance advantage. That would also let us tick off one more cell in Wikipedia's table comparing reference management software.

dstillman · January 19, 2011

"Is there a way for translators to coerce Zotero into saving all the items as a single transaction?"

No. It would have to be changed in the translator architecture.

adamsmith · January 19, 2011

"It would also be pretty easy (very easy!) to make the Pubmed translator into an import translator as well, so it could just call itself to import the result XML from the website"

I think that would be very nice to have - I'm thinking less of the WP boasting here (though that's nice, of course) - than of a scenario where Zotero isn't installed on a lab computer and someone wants to grab a bunch of references. Unfortunately Pubmed doesn't offer any other useful export format and given its prominence and the seeming popularity of Zotero among medical researchers this would seem like a neat feature to have.

kt297 · January 19, 2011

Is there a way I can give you any more information about my problems? Can I produce some sort of file or debug output, as you put it? I will need step-by-step instructions.

adamsmith · January 19, 2011

no, you're good - people are seeing more or less what you're seeing and are trying to figure out how to speed things up. It might not get quite as fast as EndNote, but faster seems possible - though nothing will happen anytime soon.

ajlyon · January 19, 2011

"No. It would have to be changed in the translator architecture."

So implementing an import translator won't let us get the same behavior?

dstillman · January 19, 2011

I'm not sure about that, but possibly not. An actual import starts the single transaction outside the translator architecture.

If we decide to change this for translator-based saving, though, it'd likely be a trivial change.

DWL-SDCA · January 19, 2011

I recommend looking further into ajlyon's XML idea.

Several times a week, using the PubMed interface (not Zotero), I select several hundred article records and download them as an XML file. This takes about 2-3 seconds. I then import the XML into my own MySQL database. Even though my translator does extra processing (converting all-caps titles and authors to lower case and marking them for extra attention, stripping the leading and trailing square brackets from titles translated from non-English and dealing with the end-of-sentence punctuation that may arise when there are irregularities, checking for duplicates in the database, linking the comment in / comment on records, and a few other things), a 200-item import to MySQL requires less than 30 seconds.

I doubt if my own PubMed xml translator would be useful but I'm willing to share it if any of you developers think that it might be helpful.

dstillman · January 19, 2011

DWL-SDCA: As explained above, adding a PubMed XML translator is unrelated to the performance issue.

We don't really need further posts on this until someone gets around to testing translator-based saving within a single transaction.

Discussion of a possible PubMed XML import translator can take place elsewhere.

dstillman · January 19, 2011

Ticket created.

(Simon reminded me that there's a technical reason for this—compatibility with faulty translators—but we're going to try to come up with a workaround.)

ajlyon · January 24, 2011

I just finished up work on a version of the Pubmed translator that just saved 200 entries in about a minute on my puny netbook. This version implements the import interface, so now you can import piles of Pubmed's XML output, wherever you may find it. For example, I tested the 200-record import by selecting for Pubmed to show 200 results in XML format, copying, and importing from the clipboard. Not sure if this is considered respectable performance, but it at least might make saving in one transaction possible in the future.
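
As a rough illustration of what importing PubMed's XML output involves, here is a small Python sketch (the translator itself is JavaScript and handles many more fields). The element names PubmedArticleSet, PubmedArticle, MedlineCitation, PMID, Article and ArticleTitle are the ones PubMed's XML uses; the function name parse_pubmed_xml and the two sample records are invented.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the shape of PubMed's XML output.
# The PMIDs and titles are placeholders, not real records.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>111</PMID>
      <Article><ArticleTitle>First sample title.</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>222</PMID>
      <Article><ArticleTitle>Second sample title.</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def parse_pubmed_xml(text):
    """Return a list of {'pmid', 'title'} dicts, one per PubmedArticle.

    Hypothetical helper covering only two fields.
    """
    root = ET.fromstring(text)
    records = []
    for article in root.iter("PubmedArticle"):
        records.append({
            "pmid": article.findtext("MedlineCitation/PMID"),
            "title": article.findtext("MedlineCitation/Article/ArticleTitle"),
        })
    return records


if __name__ == "__main__":
    for record in parse_pubmed_xml(SAMPLE_XML):
        print(record["pmid"], record["title"])
```

Parsing is the cheap part; as discussed above, the time goes into saving each item.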

The version for 2.1 betas and standalone: https://github.com/ajlyon/zotero-bits/raw/master/NCBI PubMed.js
For 2.0.9: https://github.com/ajlyon/zotero-bits/raw/master/NCBI PubMed.js.2.0.9 (for 2.0.9, you'll need to delete the .2.0.9 suffix)

These can be saved in the translators directory of your Zotero data directory.

Please test, so we can get these into the next release of Zotero. Oh, and check off a box on Wikipedia. :)

kt297 · January 24, 2011

ajlyon - sorry, but there seems to be no difference. In fact, downloading is taking a lot longer.

ajlyon · January 24, 2011

Hmm. That doesn't make much sense to me, since the code is essentially the same. The difference for saving from Pubmed search results is one function call.

The real performance boost will come if and when the shift to transactions happens; this is a change more for flexibility than anything else.

ajlyon · January 27, 2011

After some more testing, I've committed the revised translator to the repository. As Dan notes, performance gains are only expected with the resolution of ticket #1762.