sync issues, sync information, sync remedies?

reveley · August 17, 2010

The fact is: your server is overloaded. It takes hours to sync even small databases with no files, barely kilobytes of data.

Each internal SQL query is apparently turned into a post request which is then sent. Each query apparently relates to one item in the database (i.e. one article). I'm guessing here, but it looks like that to me.

Now, each of these is queued for longer and longer times as the server gets slower, to the point where adding an item of a few thousand bytes might get through in 120000 ms, or it might get queued for that long again. (that's two minutes per item to you and me)

The queue time seems to increase.

Now, here are some things I need to know:

1) what happens if my sync is interrupted?
2) why can you not simply advertise how busy your server is?
3) why is "auto sync" on automatically?

That last one seems *nuts* to me. It looks like a DOS situation. People are adults here. They can sync when they want. If they can't figure that out, they wouldn't be using this software. Presumably.

At least have it "off" by default.

Also: I have of course chosen the WebDAV option for storing my files. I don't have the option of my own Zotero server (although I don't see why not).

Is my mandatory use of the Zotero sync server for metadata being stymied by other people using your bandwidth for large files?

That seems unfair.

What is otherwise an *almost* perfect product (in concept certainly) is being rendered unusable by what may be avoidable network congestion, and by the fact that I have to guess what it's doing by looking at its debugging output.

Why not just display the data properly to the user? Most of us are scientists.

reveley · August 17, 2010

UPDATE: after one hour, it sent over a block of XML to zotero.org, and it's easy to see that that is one of my folders in "my library".

I have ten. They hold about 15 articles each on average, so about 150 articles. Just the metadata, mind, no files. Could it amount to one megabyte of data? I doubt it. There are abstracts. But 1 MB of 16-bit characters is a lot of text.

And ten hours appears to be my projected upload time for that.
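The estimate above can be checked with a rough back-of-envelope sketch. It assumes a decimal megabyte (10^6 bytes) and exactly 2 bytes per character; the folder and article counts are the ones quoted in the post, and the one-hour figure is only what was observed for the first folder.

```python
# Rough size and time estimate using the figures quoted in the post.
BYTES_PER_MB = 1_000_000      # assumption: decimal megabyte
BYTES_PER_CHAR = 2            # 16-bit characters

folders = 10
articles_per_folder = 15      # approximate mean from the post
hours_per_folder = 1          # observed for the first folder only

articles = folders * articles_per_folder
chars_in_1mb = BYTES_PER_MB // BYTES_PER_CHAR
chars_per_article = chars_in_1mb // articles
projected_hours = folders * hours_per_folder

print(articles, chars_in_1mb, chars_per_article, projected_hours)
# prints: 150 500000 3333 10
```

On these assumptions, 1 MB would leave room for over 3,000 characters of metadata per article, which is consistent with the doubt that the library is anywhere near that large.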

noksagt · August 17, 2010

So syncing does not apparently work well for you. This is frustrating, but there's little reason to make accusations (especially when it does work for some of the rest of us).

Zotero syncing does not work in the way you described. Metadata syncing is atomic: it is all-or-nothing. Auto-syncing is of great benefit to those users where syncing consistently works. The Zotero server code is released & others have set it up to run. It is just not supported. I personally think it is fair that the Zotero developers get to choose how they prioritize their time and resources (among other things, to fulfill promises to funding agencies).

ajlyon · August 17, 2010

Please note as well that syncing is working. Really. It does not keep multiple Zotero clients in perfect sync instantaneously, and sometimes it can take some time for changes to propagate. That said, they do propagate. Changes that occur while a sync is in progress are collected locally and sent to the metadata server as a single operation. In many cases, I feel like people would be better off if there was no sync indicator, since people become unduly concerned with whether it is spinning. If the last sync was less than an hour ago, don't worry. All is working fine. Queued syncs don't slow you down, worrying about them does.

reveley · August 17, 2010

No that isn't right. It *does* work for me - perfectly.

But only when the server isn't busy (which I'm measuring by the server queue re-try responses).

I *haven't* made any accusations; I've made feature requests. In fact I used the word "perfect" w.r.t. the software. The primary request I made is that it advertise the server load, and advertise what is happening during a sync more clearly (not necessarily techno-speak).

I think the software should be clearer about what it is doing, so that people feel in control of it (without opening the source code)

Auto-syncing is not a feature I suggested being removed; I suggested that it be off by *default* to avoid congestion issues.

That's not a silly suggestion.

Very many small queries to a server look to me like a DOS attack. Like I said.

ajlyon · August 17, 2010

The debug output and time since last sync are quite informative. Most people do not need to know which items are being synced. Perhaps the size of the queue could be published-- maybe the Zotero.org team would be amenable to offering such information if someone worked out how to do it using the released server code.

reveley · August 17, 2010

ajlyon - I'm not actually syncing. I deleted from the server, and I'm uploading de novo from my local repository.

Absolutely we need an indicator, and more information about what's happening.

Why? Because I need to access this data from another part of the country. And I'm going to miss my train because I expected it to work the way it did in the past!

It's not a complaint, still less an "accusation" as some would have it (???)

It's a *problem*. If it's network congestion, I'll pay for bandwidth happily (rather than storage).

If it's an issue of design, I'd respectfully suggest it be thought about (I agree certainly that it should be assembled locally and sent atomically however).

But not worrying about it isn't always an option for busy people. And *no* indicator and *no* choice about whether and when to sync to a repository would leave me, and I think many of my colleagues, feeling helpless as lambs, and maybe looking for other off-campus solutions.

Zotero remains the best I've tracked down so far.

kathryn.duffy · August 17, 2010

I have to agree with reveley. My research lab is currently investigating possible software packages for a collaborative reference database. Zotero, in theory, has ALL of the features we were hoping to get. We'd certainly be willing to pay for the service in terms of storage and/or bandwidth.

I presented Zotero this afternoon to the lab, showed them how to set up an account and *tried* to demo the group library. Unfortunately, the one metadata-only item I had in our new group library still hasn't uploaded to the server (2 hours later). Perhaps we just had the misfortune of trying it out today, but I'm now going to have a very hard time proposing we use Zotero instead of using another software package or developing one in-house that we release to other research institutes, with reveley's suggestion of providing the option of setting up your own Zotero server.

Please don't take this as an attack on Zotero - I would LOVE to use and promote Zotero. However, like reveley has stated, we either need information about what is going on with sync in order to make an informed decision (right now it's that Zotero's server is too bogged down in general or is simply unreliable) or approve a request to remedy the sync server issues.

noksagt · August 17, 2010

"I *haven't* made any accusations"

By "accusations", I merely meant that you seemed to be mis-diagnosing the issue & then describing design decisions as "nuts" and "unfair".

"Auto-syncing is not a feature I suggested being removed; I suggested that it be off by *default* to avoid congestion issues."

But not very many users would want it off. Those who did can easily turn it off.

"I'm not actually syncing. I deleted from the server, and I'm up-loading de-novo from my local repository."

Why did you do this? Unless you are debugging something in particular, this is a waste of time. It gives you a much larger sync operation than if you just had a few records to add or update on the server. Since syncs are all-or-nothing, your sync should be of no better quality this way (and, factoring in the time, leaves you in a worse position overall).

ajlyon · August 17, 2010

The delays in sync processing are certainly a problem in some cases, and you have mentioned the big ones. In light of its limited resources, Zotero.org has chosen to sacrifice immediacy in return for increased scalability-- they've chosen a good-enough solution that should in most cases mean that libraries are no more than a few items out of sync.

This good-enough solution may not be good enough for (1) first-time and post-reset synchronization, (2) demonstrations of syncing and (3) close collaboration by users who share a group or personal library. All three of these are fairly rare use cases, but they are high-visibility ones. I don't know what Zotero can do to help here; a fast lane for paid users might be possible, but the team almost certainly doesn't have the developer time to create such a system, and it might be a little contentious for the user community.

I hope that institutions and individuals with an interest in improving the sync experience will start to work with the server code and explore avenues for improvement.

ajlyon · August 17, 2010

Would it help if sync status information were available through Zotero.org? I know that Dan can often tell people what has been recently synced and whether they have syncs queued-- if such information were available to users directly, it might help.

Another question is when it is safe to disconnect from the network, which reveley raised above. Can one of the Zotero developers explain? Specifically, if Zotero tells me that it is waiting for updated data or is otherwise queued, can I disconnect without the sync failing?

reveley · August 17, 2010

noksagt - you cannot assert that most users would or would not want a feature on or off.

You need *data*. You cite none.

My descriptions of design choices were based on guesses and I *explicitly said so*.

I also said *explicitly* that I would like the software to include this information, so I do not have to guess.

That was my purpose in corresponding.

I'm a busy scientist. I don't like to guess. I don't like to spend time not doing my job. I just don't have time.

And I don't make accusations in public.

If you have time on your hands, why not write a Python script to count the posts on this forum that contain the word "sync", and determine their ratio to the total?
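A minimal sketch of such a count might look like the following. It only covers the counting step: how the posts would be fetched from the forum is not shown, `sync_ratio` is a hypothetical name, and the sample posts are made up for illustration.

```python
def sync_ratio(posts, word="sync"):
    """Return (matching, total, ratio) for posts containing `word`, ignoring case."""
    total = len(posts)
    matching = sum(1 for text in posts if word.lower() in text.lower())
    ratio = matching / total if total else 0.0
    return matching, total, ratio

# Hypothetical sample data, not real forum posts.
sample_posts = [
    "Sync has been queued for two hours",
    "How do I change the citation style?",
    "Auto-sync keeps spinning",
    "Importing from BibTeX",
]
print(sync_ratio(sample_posts))  # prints: (2, 4, 0.5)
```

Note that this is a plain substring match, so it also counts words such as "syncing" or "resync", and a high ratio would say nothing about whether those posts report problems.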

dstillman · August 17, 2010

reveley: So, for what it's worth, and as has been more or less pointed out above, basically all of your assumptions for how syncing works (SQL statements, increasing retry times, auto-sync, "network congestion") are mistaken. There's of course no reason or need for you to know how it works, but there's also not much reason to suggest remedies based on quite erroneous assumptions.

SQL statements in debug output are local operations and have nothing to do with the server side of syncing.

The increasing retry times you see in the debug output are by design. They follow a server-dictated back-off schedule and don't indicate that the sync servers are slowing down—they're what prevent DOSing of the web servers when the sync servers are busy.
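The idea of a server-dictated back-off can be sketched roughly as follows. This is not Zotero's actual client code or protocol: `check_status`, `fake_server`, and the millisecond values are assumptions made up for illustration. The point is only that the client waits for however long the server asks, so longer waits reflect the schedule rather than a slowing server.

```python
import time

def wait_for_sync(check_status, sleep=time.sleep, max_polls=20):
    """Poll until the server reports completion, waiting as long as it asks each time.

    check_status is a hypothetical callable returning (done, retry_after_ms).
    Returns the list of waits (in ms) that were honoured.
    """
    waits = []
    for _ in range(max_polls):
        done, retry_after_ms = check_status()
        if done:
            return waits
        waits.append(retry_after_ms)
        sleep(retry_after_ms / 1000)
    raise TimeoutError("sync still queued after max_polls checks")

def fake_server(schedule_ms):
    """Stand-in server: hands out the waits in schedule_ms, then reports done."""
    remaining = list(schedule_ms)

    def check_status():
        if remaining:
            return False, remaining.pop(0)
        return True, 0

    return check_status

# A no-op sleep keeps the demonstration instant.
print(wait_for_sync(fake_server([1000, 2000, 4000]), sleep=lambda s: None))
# prints: [1000, 2000, 4000]
```

Because the waits come from the server, a busy server can spread out client requests instead of being hit by tight retry loops.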

Auto-sync is in fact extremely beneficial to sync performance. If syncing consisted solely of small auto-synced operations, it would be more or less instantaneous for everyone. Large operations are what slow it down, since, as noted above, sync transactions are atomic, and transactional lock contention is currently by far the biggest detriment to sync performance.
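To illustrate what "atomic" means here, the following is a minimal sketch using Python's built-in sqlite3 module. It is not the Zotero server's code; the table layout and the `open_library`, `apply_batch`, and `count_items` helpers are assumptions for illustration. A batch is applied inside one transaction, so a failure partway through leaves nothing behind.

```python
import sqlite3

def open_library():
    """Create an in-memory database with a hypothetical items table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (key TEXT PRIMARY KEY, title TEXT)")
    return conn

def apply_batch(conn, items):
    """Insert all items in one transaction; on any error, none are kept."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO items (key, title) VALUES (?, ?)", items)
    except sqlite3.IntegrityError:
        return False
    return True

def count_items(conn):
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

conn = open_library()
# The second row reuses key "A", so the whole batch is rolled back.
print(apply_batch(conn, [("A", "first"), ("A", "clash")]), count_items(conn))   # prints: False 0
print(apply_batch(conn, [("A", "first"), ("B", "second")]), count_items(conn))  # prints: True 2
```

The larger the batch, the longer such a transaction holds its locks, which is the kind of contention the post describes for large sync operations.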

Sync performance has nothing to do with network congestion. We have plenty of bandwidth.

And as noksagt noted, you would've made things much worse for yourself by clearing server data and resyncing, if that's in fact what you did.

We've made more or less constant improvements to the sync architecture to handle increasing server loads. As I've said before, on busy days queue times peak at about one hour for the longest (and largest) sync operations in the middle of the day U.S. time. Most syncs take much less time, and for most of the day the vast majority of operations should take only seconds.

Overall queue time aside, I would say the main practical problem at the moment is that, while a sync is queued, any local changes since that sync was sent up aren't sent to the server, which is a real problem for people who want to, say, catch a train. So while it would be somewhat detrimental to server performance and cause an overall delay in a person's data showing up on the server, it may be preferable to have new local changes revoke queued uploads that haven't yet entered the processing queue. That way a user could know that all changes they've made so far have been sent up and shut down their computer and head to the train station. (It's not 100% certain that queued uploads will go through successfully, but nearly all potential errors are caught within a few seconds by an error preprocessor and returned to the user.)
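The queuing change floated above could be modelled with a toy sketch like this one. Everything in it is hypothetical (the `UploadQueue` class, its method names, and the list-of-changes representation); it only illustrates the idea that a new local change replaces an upload that is still waiting, so the waiting upload always contains every change made so far.

```python
class UploadQueue:
    """Toy model of one client's uploads (all names hypothetical)."""

    def __init__(self):
        self.pending = None      # upload sent up but not yet being processed
        self.processing = None   # upload the server has started working on

    def upload(self, changes):
        """Send changes; revoke any upload still waiting and resend it with these."""
        if self.pending is not None:
            changes = self.pending + list(changes)
        self.pending = list(changes)

    def start_processing(self):
        """Server picks up the waiting upload; it can no longer be revoked."""
        self.processing, self.pending = self.pending, None

queue = UploadQueue()
queue.upload(["add item 1"])
queue.upload(["edit item 1"])   # replaces the waiting upload with both changes
print(queue.pending)            # prints: ['add item 1', 'edit item 1']
```

In this model a user whose latest upload has been accepted knows that all changes made so far are on their way, at the cost of the upload re-entering the queue each time it is replaced.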

To be entirely clear, we would still like to (and intend to) improve sync performance dramatically, particularly for the high-visibility cases that ajlyon describes, and I'm hoping to do so in the next few weeks (particularly since the school year is starting).

In the meantime, we might be able to provide some additional sync status information, but it's not entirely clear to me what would be useful. Users can already see what has synced by checking the website. Individual queuing state might be helpful, but only for the fairly rare cases where the sync icon is spinning without a queued request on the server—which is a bug if it happens. Current size of the queue and maximum wait time would give some indication as to overall server load, but they could also be misleading, as the queue is processed from a number of directions, which is why Zotero doesn't display any sort of ETC for syncs. (We could try to, but I'm not sure how accurate we could get it.) And there's not a whole lot a user can do with that information.

But some architecture improvements to reduce lock contention and potentially the queuing change I describe above are the things that would make the biggest difference for users.

reveley · August 18, 2010

Dan, I made guesses, not assumptions. Not every conjecture I made was incorrect; the sync was taking a long time because the server was busy (due to locks, traffic - does it matter to users?).

The sync status information you suggest would be useful. It's never bad to give information to users, but that does not mean it has to be on by *default*. At the moment there's a choice between debugging output or *nothing*. There is no sense in that.

Finally, as I say, syncing is the deal breaker for Zotero.

Contrary to what you say, it is a system of client-side SQLite, some bigger relational system on the server side, and XML.

How *exactly* you've chosen to do it is a problem, because performance is poor for tiny tiny amounts of data, and no one knows why.

Syncing things is hard. I had to make a computer/Palm Pilot sync system many years ago for a job interview. I didn't get the job.

But you need to make this work.

It doesn't work very well Dan. Sorry. The forum is chock full of users saying the same thing.

That's what people say about your software product, in a variety of timezones.

I cleared the system because it's buggy as well as slow, btw. It repeats items on sync.

noksagt · August 18, 2010

"I cleared the system because it's buggy, as well as slow btw. it repeats items on sync."

Syncing should not create duplicates. As far as I know, nearly all reports of duplicates being generated on sync have been because there really were two separate records, with two different keys, corresponding to one reference. This may happen if you have duplicates in a single Zotero library, or it could happen if you sync from two or more machines that have different keys for the same items. The latter can happen if you were merging two or more somewhat different libraries that were gathered manually, or if you had been using some manual method to sync (particularly import/export), or if you had copied the Zotero 1 data directory to multiple machines, upgraded to Zotero 2, and then synced. If none of these describes your situation, it is very unusual & should be looked into.
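A small sketch shows how different keys for the same reference end up as apparent duplicates. This is not Zotero's merge logic; `merge_by_key`, the keys, and the titles are made up for illustration, and the only assumption is that records are matched by key rather than by content.

```python
def merge_by_key(*libraries):
    """Merge libraries keyed by item key; later libraries win on the same key."""
    merged = {}
    for library in libraries:
        merged.update(library)
    return merged

# Hypothetical keys and titles, for illustration only.
laptop = {"KEY1": "Smith 2009", "KEY2": "Jones 2010"}
desktop = {"KEY1": "Smith 2009", "KEY9": "Jones 2010"}  # same paper, different key

merged = merge_by_key(laptop, desktop)
print(sorted(merged.values()))
# prints: ['Jones 2010', 'Jones 2010', 'Smith 2009']
```

"Smith 2009" shares a key on both machines and appears once, while "Jones 2010" has two keys and so appears twice, even though nothing was duplicated by the merge itself.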