Firefox freezes after downloading multiple citations from PubMed
Hi, I have recently started using Zotero and I am very impressed so far.
I have noticed that when I select multiple checkboxes to download citations from PubMed, it can take a long time and can crash Firefox! It's fine for up to 14 citations at a time (that takes 21 seconds). I would like to be able to download 200-300 citations from PubMed in one go, more quickly and without crashes. I've installed the stable Zotero version (2.0.9).
But Zotero is designed to automate what you might do by hand; it isn't really intended to be a bot downloading 200-300 references at a time. Doing so could easily freeze the UI, at least for a long time. Also, bear in mind that Zotero does more than scrape what you've already loaded: for many sites it needs to make multiple web requests per item, and it does no throttling or delaying, so with that many items you're essentially carrying out a denial-of-service attack against the site. A site would be well justified in temporarily banning your IP address after such an access pattern. (Google Scholar will do just that, and other sites could do the same.)
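The post above notes that Zotero "does no throttling or delaying" between requests. As a rough illustration of what throttling means, here is a minimal Python sketch (not Zotero code) that enforces a minimum gap between successive requests to one site. The `Throttle` class, the `fetch` callable, and the interval value are all hypothetical; a real translator would pick an interval based on the site's own usage policy.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive requests to one site.

    `min_interval` (seconds) is an illustrative parameter, not a value
    taken from Zotero or from any site's published policy.
    """

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # monotonic time of the previous request

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def fetch_all(urls, fetch, throttle):
    """Call the caller-supplied `fetch` for each URL, pausing between calls."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

With 300 items and several requests per item, even a short interval adds up to minutes, which is why batching requests (as the PubMed translator does) matters more than raw speed.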
I think I mean Firefox freezes (it says the browser is not responding and greys out). I close the browser, start Firefox again, and Zotero seems to have saved some of the citations but not all.
By comparison, EndNote X, which I also have (I'm thinking of moving over to Zotero completely, as I like its philosophy and its software), can connect to the PubMed website and download 100 references in about 10 seconds without freezing.
Sorry if I sound like a newbie, but I am just an end user wanting a free, efficient referencing tool. The reason I want to download large numbers of citations is that I need to use them when I'm offline. When writing a paper on a particular topic, it's useful to run a search in PubMed (www.ncbi.nlm.nih.gov/pubmed) and download all of its citations, so that I can use (and search) them offline when I don't have internet access.
I suspect that EndNote and Zotero are quite different in the back end and do some things differently. As a non-technical end user (an orthopaedic surgeon), all I want is a quick download of large numbers of citations for later offline use when writing and referencing papers in a word processor.
Any other suggestions?
The translator only makes one request, since it concatenates the PMIDs and requests a file with all the corresponding records, which is pretty much what NCBI recommends (http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchlit_help.html#RecordIdentifier).
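To make the "one request" point concrete, here is a minimal Python sketch (not the translator's actual JavaScript) of building a single EFetch URL that covers every selected PMID. It assumes the current E-utilities EFetch endpoint and its `db`, `id`, and `retmode` parameters; the function name and the sample IDs are hypothetical.

```python
from urllib.parse import urlencode

# Assumed current E-utilities EFetch endpoint (the help page linked above
# uses an older path on the same host).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build one EFetch request covering every selected PMID.

    The IDs are joined with commas into a single `id` parameter, so N
    selected citations cost one HTTP request rather than N.
    """
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }
    # Keep commas unescaped so the ID list stays readable.
    return EFETCH_URL + "?" + urlencode(params, safe=",")
```

For example, `build_efetch_url([11111, 22222])` yields a URL ending in `?db=pubmed&id=11111,22222&retmode=xml`. Because the network cost is a single request, any remaining slowness is on the saving side rather than the download side.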
Medium-sized imports tend to be pretty slow in Zotero, no? Is there any particular reason why? This is just data, no attachments of any kind.
Saving 50 items via Zotero takes 40 seconds for me. That would be a reasonable speed for a translator that made multiple requests (in which case, again, it would also be somewhat abusive to the site), but PubMed should be faster since there's only a single request. Judging by the debug output, we could speed this up by performing the import in a single transaction; right now it uses multiple transactions, which greatly increases disk access. A single transaction would have the side effect of making the save all-or-nothing, so an error with one item would cause the whole save to fail. I believe regular imports already use a single transaction.
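The trade-off described above can be seen in a small Python `sqlite3` sketch. Zotero stores its data in SQLite, but this is not Zotero's code: the `items` table, its `NOT NULL` constraint (used here to simulate one bad record), and the function names are hypothetical. It contrasts committing each item separately with wrapping the whole batch in one transaction.

```python
import sqlite3


def make_db():
    """In-memory stand-in for a reference database (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    return conn


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


def save_per_item(conn, titles):
    """One transaction per item: a commit (and, on disk, a sync) per record."""
    saved = 0
    for title in titles:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO items (title) VALUES (?)", (title,))
            saved += 1
        except sqlite3.IntegrityError:
            pass  # only the bad item is lost; earlier commits stay
    return saved


def save_single_transaction(conn, titles):
    """One transaction for the whole batch: one commit, but all-or-nothing."""
    try:
        with conn:
            conn.executemany(
                "INSERT INTO items (title) VALUES (?)", [(t,) for t in titles]
            )
    except sqlite3.IntegrityError:
        return 0  # the failing item rolls back every item in the batch
    return len(titles)
```

With a batch of `["x", None, "y"]`, the per-item version keeps two rows while the single-transaction version keeps none, which is exactly the all-or-nothing side effect mentioned in the post.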
It would also be pretty easy (very easy!) to make the PubMed translator into an import translator as well, so it could just call itself to import the resulting XML from the website, if that would give us the mentioned performance advantage. That would also let us tick off one more cell in Wikipedia's table comparing reference management software.
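The idea of a translator that "calls itself" amounts to one parsing routine shared by the web path and the file-import path. Below is a minimal Python sketch of that design, not the Zotero translator API. It assumes PubMed's XML layout of `PubmedArticleSet` > `PubmedArticle` > `MedlineCitation` (with `PMID` and `Article/ArticleTitle`) and reads only two fields; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_pubmed_xml(xml_data):
    """Turn a PubmedArticleSet document (str or bytes) into simple item dicts.

    Only PMID and title are read here; a real translator maps many more fields.
    """
    root = ET.fromstring(xml_data)
    items = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is None:
            continue  # skip records without a MedlineCitation block
        items.append({
            "pmid": citation.findtext("PMID"),
            "title": citation.findtext("Article/ArticleTitle"),
        })
    return items


def import_file(path):
    """Import path: read a saved PubMed XML file and reuse the same parser."""
    with open(path, "rb") as f:
        return parse_pubmed_xml(f.read())
```

The web path would fetch the XML for the selected PMIDs and pass the response body to the same `parse_pubmed_xml`, so both entry points share one mapping and any fix to it benefits both.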
If we decide to change this for translator-based saving, though, it'd likely be a trivial change.
Several times a week, using the PubMed interface (not Zotero), I select several hundred article records and download them as an XML file. This takes about 2-3 seconds. I then import the XML into my own MySQL database. My translator does extra processing: converting all-caps titles and authors to lower case and marking them for extra attention, stripping the leading and trailing square brackets from titles translated from non-English sources and fixing the end-of-sentence punctuation that can go wrong when there are irregularities, checking for duplicates in the database, linking the "comment in" / "comment on" records, and a few other things. Even so, a 200-item import into MySQL takes less than 30 seconds.
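Two of the clean-up steps listed above (unwrapping bracketed translated titles and lowering all-caps titles) can be sketched in Python as follows. This is a hypothetical helper written for illustration, not the poster's translator; it assumes a translated title looks like `[Translated title].` and treats a title with no lowercase letters as all caps.

```python
def normalize_title(title):
    """Clean up a PubMed article title; returns (title, needs_review).

    Hypothetical helper illustrating two of the steps described above.
    """
    needs_review = False
    text = title.strip()
    # Translated titles are wrapped in square brackets, e.g.
    # "[Some translated title]." -> "Some translated title."
    if text.startswith("["):
        end = text.rfind("]")
        if end != -1:
            inner = text[1:end].rstrip()
            trailing = text[end + 1:].strip()  # punctuation after the bracket
            # Avoid doubled end punctuation such as "...effective?."
            if inner and inner[-1] in ".?!" and trailing == ".":
                trailing = ""
            text = inner + trailing
    # All-caps titles are lowered and flagged, since acronyms and proper
    # nouns may need to be restored by hand.
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        text = text.lower()
        needs_review = True
    return text, needs_review
```

For example, `normalize_title("[Is it effective?].")` returns `("Is it effective?", False)`, and an all-caps title comes back lowered with `needs_review` set to `True`.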
I doubt my own PubMed XML translator would be useful, but I'm willing to share it if any of you developers think it might help.
We don't really need further posts on this until someone gets around to testing translator-based saving within a single transaction.
Discussion of a possible PubMed XML import translator can take place elsewhere.
(Simon reminded me that there's a technical reason for this, namely compatibility with faulty translators, but we're going to try to come up with a workaround.)
The version for 2.1 betas and standalone: https://github.com/ajlyon/zotero-bits/raw/master/NCBI PubMed.js
For 2.0.9: https://github.com/ajlyon/zotero-bits/raw/master/NCBI PubMed.js.2.0.9 (for 2.0.9, you'll need to delete the .2.0.9 suffix)
These can be saved in the translators directory of your Zotero data directory.
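For 2.0.9 the downloaded file must be renamed before it goes into the translators directory. Below is a minimal Python sketch of that copy-and-rename step. The function name is hypothetical, and the location of your Zotero data directory is not given here, so `translators_dir` has to be supplied by you. Note that `pathlib`'s `suffix` would only see `.9`, so the sketch strips the full `.2.0.9` string instead.

```python
import shutil
from pathlib import Path

SUFFIX_2_0_9 = ".2.0.9"


def install_translator(downloaded, translators_dir):
    """Copy a downloaded translator into the translators directory.

    A filename ending in ".2.0.9" has that suffix stripped, as required
    for Zotero 2.0.9; `translators_dir` must be supplied by the user.
    """
    downloaded = Path(downloaded)
    name = downloaded.name
    if name.endswith(SUFFIX_2_0_9):
        name = name[: -len(SUFFIX_2_0_9)]  # Path.suffix would only see ".9"
    target = Path(translators_dir) / name
    shutil.copyfile(downloaded, target)
    return target
```

Renaming the file by hand in a file manager does the same job; the point is only that the final name must be `NCBI PubMed.js`.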
Please test, so we can get these into the next release of Zotero. Oh, and check off a box on Wikipedia. :)
The real performance boost will come if and when the shift to transactions happens; this is a change more for flexibility than anything else.