Crash when loading really large author list

Hi,

Not sure if this is a beta issue or not. I tried adding the paper here:
http://arxiv.org/abs/0901.0512

When adding it, Firefox seemed to freeze. After a long time, I force-quit it. However, the entry had been entered into my database. Any time I click on it, the program seems to freeze again. I'm guessing the problem is that there are over 2,000 authors on the paper, and that this is causing something to overflow.

Maybe it isn't crashing. Maybe clicking on it just starts a load, and it takes a very long time. However, either way it's unusable.

I can read the author list online, from my synced account information. It seems to have uploaded at least a large portion of the author list.

I'm running on the latest Firefox and Ubuntu 9.04.

Any ideas what to do? Can I at least delete the item somehow from my database so I don't accidentally click it and freeze everything? I don't know how to delete it without clicking on it.
  • Your system isn't frozen, it's just busy. I clicked on the icon and went to bed, the item was in the library when I woke up. I clicked on the item and the system went to 100% CPU, so I went away and had my morning coffee. When I returned 20 minutes later, the item was open. Right click, delete, item gone, back to normal.
  • Okay, thanks, I can do that. But is there any way to add this item in a useful way? Is this considered a bug, or will large author lists always be a problem? As someone in High Energy Physics, we often have large author lists on our papers, since we work in very large collaborations.
  • You'll have to work that out with the Zotero developers. Presumably there would either need to be a means of clobbering author names beyond a certain limit, or (if you need to maintain that info somewhere) store them separately, but supply only a limited set to the UI and the CSL processor.
  • I've created a ticket for the UI display. (At least on my machine, the actual saving of that item to the database only takes about a minute.)

    What's the maximum number of authors that would ever need to be displayed in a bibliography?
  • For my fairly fast computer at home it takes about 10 minutes.

    By displayed in a bibliography, do you mean what would show up on a web site, or what I'd want to get out of Zotero when making a bibliography on a paper of mine? I'm not sure about the first. As for the second, I believe when citing such papers as this it's usually okay to just cite as "The ATLAS Collaboration" with a single author name and "et al". On the SPIRES website, the BibTeX entry it provides for this paper is:
    @Article{Aad:2009wy,
    author = "Aad, G. and others",
    collaboration = "The ATLAS",
    title = "{Expected Performance of the ATLAS Experiment - Detector,
    Trigger and Physics}",
    year = "2009",
    eprint = "0901.0512",
    archivePrefix = "arXiv",
    primaryClass = "Unknown",
    SLACcitation = "%%CITATION = 0901.0512;%%"
    }

    I think this information would be fine. I don't personally really need all the author info in Zotero. I just would like to be able to automatically add the paper to Zotero, without the fuss it's currently causing.

    Thanks
  • For my fairly fast computer at home it takes about 10 minutes.
    An item is selected as soon as it is saved, so you can't necessarily tell how long saving alone takes without looking at debug output (on a non-Windows system).
    By displayed in a bibliography, do you mean what would show up on a web site, or what I'd want to get out of zotero when making a bibliography on a paper of mine?
    The latter.
    As for the second, I believe when citing such papers as this it's usually okay to just cite as "The ATLAS Collaboration" with a single author name and "et al".
    OK. But then the question would be a reasonable limit to send to the CSL processor for groups of authors that can't be summarized in such a way. (I'm not saying that you need to answer this personally—it's just a question that will need to be answered before we can implement a solution.)

    Your example, however, suggests that there also needs to be a way to quickly remove all of an item's creators.
  • With 50+ author papers becoming more common, I think it would be preferable to store all the data (people who do paper metadata analysis might see this as a requirement). If the problem is mainly a GUI one, maybe the author listing could also expand and collapse, roughly similar to the current abstract field, e.g.:

    Author: Author 1
    Author: Author 2
    Author: Author 3
    (...) Authors: 97 more...

    or, if the last author is deemed of particular significance:

    Author: Author 1
    Author: Author 2
    Author: Author 3
    (...) Authors: 96 more...
    Author: Author 100

    Not entirely sure though how you'd tell Zotero to collapse the author-list again (maybe right-clicking an author will collapse starting with the next author?).
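
    The truncation described above could be sketched roughly like this (the function name, signature, and threshold values are invented for illustration and are not Zotero code):

```javascript
// Hypothetical sketch: collapse a long creator list for display,
// keeping the first few entries visible (and optionally the last one).
function collapseAuthors(authors, shown, keepLast) {
    if (authors.length <= shown + (keepLast ? 1 : 0)) {
        return authors.slice();  // short list: show everything
    }
    var rows = authors.slice(0, shown);
    var hidden = authors.length - shown - (keepLast ? 1 : 0);
    rows.push("(...) Authors: " + hidden + " more...");
    if (keepLast) {
        rows.push(authors[authors.length - 1]);
    }
    return rows;
}
```

    With 100 authors, `collapseAuthors(list, 3, false)` would yield the "97 more..." layout above, and `collapseAuthors(list, 3, true)` the "96 more..." variant that still shows Author 100.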
  • Yes, this is just about GUI and CSL—if the data is there, Zotero will continue to grab it.

    An expand/collapse mechanism similar to the Abstract field is probably necessary for the GUI, though some heavy optimization would be required to support the expanded view with thousands of creators.
  • edited May 13, 2009
    But then the question would be a reasonable limit to send to the CSL processor for groups of authors that can't be summarized in such a way.
    What exactly is the problem here? Is it too time-consuming for Zotero to supply all the authors to the CSL processor, or for the CSL processor to digest the incoming data? I thought it was deemed desirable to attach (complete?) item metadata to the document in the case of OOXML/ODF. Wouldn't that require support to transfer such long author lists from Zotero's database to those documents?
  • edited May 13, 2009
    One of the things I've contemplated adding to the processor is an API command that lists all of the fields that might possibly be required to render any citation in any form for the current style, to permit optimization of database fetches (if we're not going to render the "extra" or the "abstract" field, there's no reason to boost that data into memory). If that were implemented, the response could also indicate the maximum number of authors needed (i.e. max value of et-al-min + 1, so the processor can determine whether the et-al form is needed from the names it does receive).

    Would this be a useful command to implement? (If so, it would be helpful to have a sample of the response object that the processor should deliver.)
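
    For what it's worth, one possible shape for such a response object might be something like the following (all field names here are invented for illustration, not an actual citeproc-js API):

```javascript
// Hypothetical response to a "what does the current style need?" query.
var styleRequirements = {
    // item fields the style could ever render, so the caller can skip
    // fetching anything else (e.g. "abstract" or "extra") from the database
    fields: ["title", "container-title", "issued", "DOI"],
    // et-al-min + 1: enough names for the processor to decide
    // whether the et-al form applies without receiving the full list
    maxAuthors: 7
};
```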

    Re transferring all of the author names, that sounds like it really wouldn't scale very well. Zotero currently has a speed issue with large documents, which I have ambitions of seeing solved with the new processor. It would be a shame to trade up complaints from the historians for complaints from the folks in high energy physics. Surely there must be some more efficient way of coping? Can "real" bib data be (transferred only once and) stored in a separate segment from mangled data used for citation rendering, or something?
  • I'll probably back out of this conversation for the most part, since I don't know too much about the internal Zotero workings (although I'll follow it in case I have something useful to add). I just wanted to say thanks for the attention this is getting. You guys have a great product, and great support.

    Cheers,
    Caleb
  • As a note on this, as of December 8 last year, citeproc-js processes only the maximum number of authors set in the style's et-al parameters, plus a buffer value of 2. This has lifted performance of the processor to an acceptable level with 2,000 authors on the input stream. The change was made in response to a fault report from Carles Pina at Mendeley, where one of their users hit this same reference, or one with very similar characteristics.
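
    If I understand the change correctly, the effect is roughly the following (a sketch only; the exact parameter names and buffer handling inside citeproc-js may differ):

```javascript
// Rough sketch of the described behavior: keep only as many names as the
// style's et-al settings could ever require, plus a small safety buffer.
function namesToProcess(creators, etAlUseFirst, buffer) {
    var limit = etAlUseFirst + buffer;
    return creators.length > limit ? creators.slice(0, limit) : creators;
}
```

    So with a style showing three names before "et al." and a buffer of 2, a 2,000-name list shrinks to five names before the processor ever touches it.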
  • edited August 23, 2011
    Has there been any movement on improving this? (I'll assume that whatever citeproc-js does isn't the problem, because the problem is still there.) This is a huge problem for me; it takes a minute to switch to items by my collaboration of 711 people. And since I need to look at those papers frequently, my work slows to a crawl. For short-author-list papers, everything's as zippy as could be.

    I saved my debug output to id D379914512 (if that helps; it says it uploaded, but I don't know where to find that publicly). I'm not sure how to read the debug output, but I suspect the second number is the milliseconds. If so, then the 711 lines of "Switching lastCreatorFieldMode to 0" add up to 58.24 seconds, which is about the total hang time; so presumably all those switches are causing the problem. Also, the time each one takes increases as you go down the list.(*)

    What really confuses me is that lastCreatorFieldMode is apparently a hidden preference set in about:config. Why does it need to be reset to the same thing 711 times? Is that what's actually slowing it down, or is that just in the loop that's slow?

    Googling the debug message, it looks like maybe this is all coming from the Zotero GUI changing the format of the display on each author's name. Is that right? Maybe a preference for maximum number of authors displayed, as Rintze suggested, would fix it?

    Running Zotero 2.1.8 under Firefox 6.0 with Mac OS X 10.6.8 on a recent MacBook Pro.


    (*) Geek alert: time taken per switch = (10 + 0.086*i + 0.00025*i^2) ms, where i is the number of times the switch has been made.
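
    (Summing that fit over all 711 switches does land near the observed hang time; the fit itself comes from the debug output above.)

```javascript
// Sum the fitted per-switch cost over all 711 mode switches.
var totalMs = 0;
for (var i = 1; i <= 711; i++) {
    totalMs += 10 + 0.086 * i + 0.00025 * i * i;
}
// totalMs comes out around 58.9 seconds, close to the ~58 s hang observed.
```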
  • We'll see what we can do about this before 3.0 Final.
    Thanks! (And Zotero is awesome, by the way.)

    It might help that I've traced the problem to the 'getPropertyValue' calls in 'chrome/content/zotero/bindings/itembox.xml'. I gather that defaulting to single-field creator names would be a workaround. I don't know how to do that.

    Still, the more I think about it, the more I think a shortened (but expandable) creator list would be best. I tested out limiting the loop over creators and it worked, so that would be alright.


    [To be specific, I replaced

    for (var i = 0, len=this.item.numCreators(); i<len; i++)

    with

    for (var i = 0, len=Math.min(this.item.numCreators(),10); i<len; i++)

    and items that took a minute are just as fast as any other item. Also, that loop could be optimized significantly.]
  • I've implemented creator list limiting (to the first 10 creators, unless there are fewer than 5 more) on the trunk.

    https://www.zotero.org/trac/changeset/10412
  • > I've implemented creator list limiting (to the first 10 creators, unless there are fewer than 5 more) on the trunk.

    Thanks Dan, this was very helpful. However, note that there are still problems with the most recent version of Zotero. If I delete one of the authors on an item with ~2,000 authors (particle physics, of course), Zotero hangs. (I waited about 10 minutes before killing it.) Presumably it was busy trying to load the very long author list so it could find the next author to display in the list of 10.

    Also, when Zotero exports this to a BibTeX file, the very long author list makes BibTeX choke. I know this might be BibTeX's fault, but it would be nice to have an option (e.g., alongside "Include notes") on the export function that truncates very large author lists. Actually, you might just want to do it automatically with a warning given to the user the first time. After all, if the standard version of BibTeX always chokes on lists above a certain size, there's no use in ever exporting to a .bib file that is doomed.
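
    Something along these lines could do the truncation at export time (hypothetical, not Zotero's actual translator code):

```javascript
// Hypothetical export-time truncation: if the author field would exceed a
// limit, keep the first name and fall back to BibTeX's "and others" form.
function truncateBibtexAuthors(names, maxNames) {
    if (names.length <= maxNames) {
        return names.join(" and ");
    }
    return names[0] + " and others";
}
```

    That would reproduce exactly what SPIRES does for this paper: `author = "Aad, G. and others"`.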

    Thanks again for the priceless tool!