Syncing large collections

  • edited March 6, 2010
    [edit: I misread erazlogo's comments!]

    Hold on-- it seems that the server side is getting a lot of attention these days, and the Zotero team is slowly adding support for larger and larger databases.
  • That's correct. We're closing in on getting erazlogo's library synced.
  • Good news! The thing is, I have fewer than 4000 items. The problem, I'm guessing, is that almost every item has a PDF or a screenshot attached, as well as multiple tags. I tried to hit + to expand the items in order to count everything, and 30 minutes of little swirling pie later, I gave up. So is it possible that the "Zotero can't sync libraries this large" error is due to the PDFs?

    (Also, I bought a lot of extra storage from Zotero, thinking that would help with this problem, but I think Dan told me that the storage has nothing to do with that...so I'm not sure what the storage is for? Should I be saving my PDFs there instead of on my hard drive?)
    Thanks,
    MM
  • I have a large library as well, with a storage folder of 1.5GB and 38,000 files...
    It hasn't synced since 22 Feb 2010. Same issue?
    Should I go back to JungleDisk WebDAV for the moment?
    Debug output under D1190164621
  • One reset and half a day later, I reconciled a few hundred conflicts, and the sync completed! No further problems. Patience with your overburdened server was the key...Zotero rocks!
  • No luck here. Report ID 1870795938

    [JavaScript Error: "[Exception... "'Error processing uploaded data (Report ID: e10b1d18)' when calling method: [nsIDOMEventListener::handleEvent]" nsresult: "0x8057001e (NS_ERROR_XPC_JS_THREW_STRING)" location: "" data: no]"]
  • breckenr: Your library is, unfortunately, in the very small percentage of very large libraries that are larger than we currently support.

    (The reason the error message changed for you is that we worked around the previous third-party bug that had been preventing large uploads, but your DB is still too large to process. I've now restored a proper error message for such databases.)

    Most people who posted to this thread should currently be able to sync, though your upload might remain queued on the server for a while. A few people will need to wait a bit longer while we work to support very large libraries (and reduce queuing for everybody). Thanks for your patience.
  • Any news on syncing very large collections?
  • Here's what finally worked for me:

    It's been several months since I was last able to completely sync my database (2.0 didn't magically work for me), so now that I've finally hacked through a solution, I'd like to share it. I'm posting here since the last error I got was "Databases of this size cannot yet be synced. Please check back soon." I have 3446 articles in my personal library, and 3836 in my primary group library.

    Whenever I tried to sync, I had a major problem: After reconciling the conflicts (usually 40 to 100), I got an unending stream of tag notices--not errors, but notices. I described the problem here: http://forums.zotero.org/discussion/10169/sync-error-report-id-2082003221-unending-tag-message-boxes/

    I did not think that the two issues (endless tags and inability to successfully sync) were related until I read this thread, and noted Dan's comment from above: "The error message is a bit misleading—it can also happen if you have a lot of tags, authors, etc."

    Aha! I knew that I had a lot of tags, which were almost all useless junk imported from gazillions of database keywords. So, I searched around and found this thread about removing tags: http://forums.zotero.org/discussion/4051/remove-all-tags/

    Unfortunately, there's no easy way in Zotero to remove multiple tags; the functionality is not built in (yet). Thus, I went the hack way as described by lmullen in the thread. First, I disabled the automatic addition of tags (described by Rintze in the thread). Then I backed up Zotero, downloaded an SQLite browser (I obviously could not use the Firefox SQLite Manager extension), and deleted all tags in the Zotero database (for those who want to save user-generated tags, there are some ideas for this in the thread; I didn't bother--I zapped 'em all). I saved the database and exited, restarted Firefox, and synced. Bingo! First successful sync in several months.

    In summary, the culprit was having too many tags for sync to work properly; they had piled up because tags were added automatically from every article I saved to the database. The solution was to delete the tags directly from the SQLite database, and then disable automatic tag addition so that it doesn't happen again.

    I hope this helps someone here who was as desperate as I was--I love Zotero, but it had reached the point of being unusable. Now I can love it again :-) If only the duplicate handling could be completed, then Zotero would be perfect for me :-)
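
    A minimal sketch of the tag cleanup described above, written in Python with the standard sqlite3 module. The table names `tags` and `itemTags` are assumptions about the Zotero schema and should be checked in an SQLite browser first; the demo database built here is purely hypothetical. Anything like this should only be run on a backup copy, with Firefox closed.

```python
import sqlite3


def delete_all_tags(conn):
    """Delete every tag and every item-tag link; return how many tags were removed.

    Assumes tables named `tags` and `itemTags` (an assumption, not verified
    against any particular Zotero version).
    """
    cur = conn.cursor()
    removed = cur.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
    cur.execute("DELETE FROM itemTags")  # remove the item-to-tag links first
    cur.execute("DELETE FROM tags")      # then remove the tags themselves
    conn.commit()
    return removed


def make_demo_db():
    """Build a tiny in-memory stand-in database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (itemID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tags (tagID INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE itemTags (itemID INTEGER, tagID INTEGER);
        INSERT INTO items VALUES (1, 'Paper A');
        INSERT INTO items VALUES (2, 'Paper B');
        INSERT INTO tags VALUES (1, 'keyword-junk');
        INSERT INTO tags VALUES (2, 'more-junk');
        INSERT INTO tags VALUES (3, 'imported');
        INSERT INTO itemTags VALUES (1, 1);
        INSERT INTO itemTags VALUES (1, 2);
        INSERT INTO itemTags VALUES (2, 3);
        """
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print("tags removed:", delete_all_tags(demo))
```

    To try it on a real file, the in-memory connection would be replaced with sqlite3.connect() pointed at a copy of the database (the path is yours to supply). The items table is left untouched; only tags and their links go.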
  • My problem is probably many (unstandardised) authors -- not a simple thing to fix, even with something like SQLite Manager.
  • edited April 26, 2010
    I am in a similar spot as breckenr - I have not been able to sync my database for several months. I used to be able to sync to a non-Zotero WebDAV, but now I receive the error "Databases of this size cannot yet be synced. Please check back soon. (Report ID: 1fa434d2)" Report Error 366338323.

    My database is only about 6,000 citations (3.2GB). I have deleted all tags, but there are likely a few references with numerous and non-standard authors. If I thought it would help, I could manually delete the references with large numbers of authors, but it doesn't seem worth the effort - unless I knew for sure that was the problem...
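
    One way to check the "too many authors" guess before deleting anything is to count creators per item. This is only a sketch: the table name `itemCreators` and its `itemID` column are assumptions about the Zotero schema, and the demo data is made up. It should be run against a backup copy, not the live database.

```python
import sqlite3


def items_with_most_creators(conn, top=10):
    """Return (itemID, creator_count) pairs, largest counts first.

    Assumes a link table `itemCreators` with an `itemID` column
    (hypothetical; verify in an SQLite browser).
    """
    return conn.execute(
        "SELECT itemID, COUNT(*) AS n FROM itemCreators "
        "GROUP BY itemID ORDER BY n DESC, itemID ASC LIMIT ?",
        (top,),
    ).fetchall()


def make_creator_demo_db():
    """Tiny in-memory stand-in: item 1 has 3 creators, item 2 has 1, item 3 has 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE itemCreators (itemID INTEGER, creatorID INTEGER)")
    rows = [(1, 10), (1, 11), (1, 12), (2, 10), (3, 11), (3, 13)]
    conn.executemany("INSERT INTO itemCreators VALUES (?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    for item_id, count in items_with_most_creators(make_creator_demo_db(), top=5):
        print(item_id, count)
```

    If a handful of items turn out to carry dozens of creators each, those are the ones worth inspecting by hand first.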
  • I have syncing working now. What I did was to split away a big chunk of the material, mostly derived from an earlier Access database. This left me with some 5500 records (including lots of long notes). On syncing this database I found that the new SQLite file (on the synced machines) was dramatically smaller than the original database, some 17 MB compared to over 80 MB. The rump data is also a lot smaller. This means that I have two Zotero databases, but with a separate Firefox profile for each of them it works quite efficiently. Zotero is also _much_ faster this way.
  • I have been unable to sync my database -- "unable to sync databases this size yet..." (report ID 427343708). Any thoughts on when this may be resolved?
  • Since my database is not that large (6,000 citations, zotero.sqlite = 140kb), and based on what Dan said (Feb 5, 2010), I suspect that there could be some troublesome citations in my database that are causing my "Database of this size cannot yet be synced" error.

    Any suggestions on how I could troubleshoot this? I have already deleted all tags, and I am not keen on breaking my database across multiple Firefox profiles, as it would be impractical to go back and forth between them.

    I have tried generating an error log (preferences>advanced>debug output logging) when trying to sync, but the browser locks up briefly and the log file jumps to >350,000 lines written when it unfreezes (making it too large to open...).

    Does it make any difference where my WebDAV is located (i.e. Zotero WebDAV vs. another)?
    I have tested the integrity of the database (preferences>advanced>Database maintenance), and no errors were found.
    Firefox 3.6.3, Zotero 2.0.3
    Any suggestions appreciated.
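
    For a debug log that is too large to open in an editor, one option is to stream it line by line and keep only the last few lines that mention an error. A minimal sketch; the search term and the file path are placeholders, and nothing here depends on the actual format of Zotero's debug output.

```python
from collections import deque


def last_matching_lines(path, needle, limit=20):
    """Stream a large text file and return the last `limit` lines containing `needle`.

    The match is case-insensitive. The file is never loaded into memory whole;
    only the most recent matches are kept.
    """
    needle = needle.lower()
    matches = deque(maxlen=limit)  # older matches fall off automatically
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if needle in line.lower():
                matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py <path-to-log> [search-term]
    log_path = sys.argv[1]
    term = sys.argv[2] if len(sys.argv) > 2 else "error"
    for hit in last_matching_lines(log_path, term):
        print(hit)
```

    The last matches before the sync gives up are usually the most useful ones to paste into a report.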
  • Is there a thread or wishlist forum to follow the status of "large databases" becoming usable? This is my show stopper to getting my whole lab using Z. I would like to know as soon as the problem is resolved.

    Thanks.
  • Anyone get any further with this? I am still stuck without being able to sync (although I am paying for Zotero server space). What are the keys to keeping the database clean enough to sync? I don't think it is the overall size that is the problem, and I have removed all tags. (Report ID: 6908422)
    Thanks for any help.
  • sjimon: Your database size itself isn't the problem, but rather that you're uploading a large chunk of data (> 2,000 items with creators and tags) at once. There's not much you can do at the moment, since that upload is much larger than we currently allow, but we're in the process of rewriting the sync architecture in a way that will enable us to allow such large uploads. (Currently, due to locking problems we've run into, large uploads can essentially block all others.) We should be rolling that out in the very near future.

    And since this is a hard-coded limit that's keeping you from syncing, once you get it working, send a message to storage@digitalscholar.org and your storage subscription can be extended.
  • @Dan: is it possible to do just a partial sync, i.e. syncing only certain sections of the database each time?
  • Not once you have a library in place. Be patient a little bit longer—improvements really are forthcoming.
  • Sure, I will wait for your updates! Zotero is still the best. I just tried Mendeley and removed it from my desktop after 10 minutes...
  • Dan, is this version 2.0.7 going to help syncing large libraries?
  • Does it make a difference for syncing whether I have one big group library with, e.g., 100 collections or, e.g., 5 group libraries with only 20 collections each, if the total number of references is the same?
  • Still not able to sync :-(
    But happy to know you are working on it :-)

    Report ID: 1542289418

    [JavaScript Error: "[Exception... "'Databases of this size cannot yet be synced. Please check back soon. (Report ID: 2b919136)' when calling method: [nsIDOMEventListener::handleEvent]" nsresult: "0x8057001e (NS_ERROR_XPC_JS_THREW_STRING)" location: "<unknown>" data: no]"]
  • Here's a temporary workaround if your database is less than 2GB: sign up for a free account at https://www.dropbox.com/; set your data directory to a 'Zotero' folder you created inside your Dropbox folder; disable the native Zotero syncing. Voila: as long as your database stays under 2GB and you work on your own computers (with Dropbox installed on each one), syncing is completely transparent.
  • Dropbox is an option, but there is a very serious risk of data corruption if you use it or any other synchronization program on the entire data folder. If you do use Dropbox, you must make sure that Firefox has been closed and the folder fully synchronized before opening Firefox on another computer that is trying to use the same database. (Essentially the same thing applies for putting the data directory on a shared/network drive.) Simultaneous access by multiple Firefox instances can break your database.
  • As ajlyon explained, Zotero on Dropbox is not an option for me and my research team. We do use a Dropbox shared folder, but this is absolutely a non-solution for multiple users. The problem is not merely a matter of avoiding using Zotero at the same time; we would have to avoid using Firefox at the same time (since it loads Zotero, whether we're using it or not). This is obviously a non-solution for us.

    That said, for people who are the single users of Dropbox and only need asynchronous access from multiple computers, I agree with gerhard221 that Dropbox is an excellent solution, probably better than the Zotero sync mechanism. Unfortunately, that's not my situation.
  • Hi Dan, as recently as September 8 you said:

    > Be patient a little bit longer—improvements really are forthcoming.

    I posted earlier how deleting tags had solved my sync problems, but I have since added another large project, and now I can't sync again (I have at least three distinct libraries with over 5,000 citations each); maybe my problem now is too many authors.

    Any update on when large library syncs will be fixed? This is a real show-stopper for me, since the nature of my research (systematic literature reviews) necessarily involves huge numbers of citations.
  • Any update on syncing large databases? I'm moving to a new project and am considering archiving most records from my previous project to trim down the DB. I currently have 41809 items; zotero.sqlite is 666.4 MB; my storage folder is 21.56 GB. I could probably cut the space/number of items by 1/3. If I do that, any chance of syncing right now?
  • Try it. The limit on uploads was removed a couple months ago, though there could still be issues in various places with very large databases.