sync issues, sync information, sync remedies?
The fact is: your server is overloaded. It takes hours to sync even small databases with no files, barely kilobytes of data.
Each internal SQL query is apparently turned into a POST request which is then sent. Each query apparently relates to one item in the database (i.e. one article). I'm guessing here, but it looks like that to me.
Now, each of these is queued for longer and longer times as the server gets slower, to the point where adding an item of a few thousand bytes might get through in 120000 ms, or it might get queued for that long again. (that's two minutes per item to you and me)
The queue time seems to increase.
Now, here are some things I need to know:
1) what happens if my sync is interrupted?
2) why can you not simply advertise how busy your server is?
3) why is "auto sync" on automatically?
That last one seems *nuts* to me. It looks like a DoS situation. People are adults here. They can sync when they want. If they can't figure that out, they wouldn't be using this software. Presumably.
At least have it "off" by default.
Also: I have of course chosen the WebDAV option for storing my files. I don't have the option of my own Zotero server (although I don't see why not).
Is my mandatory use of the Zotero sync server for metadata being stymied by other people using your bandwidth for large files?
That seems unfair.
What is otherwise an *almost* perfect product (in concept certainly) is being rendered unusable by what may be avoidable network congestion, and by the fact that I have to guess what it's doing by looking at its debugging output.
Why not just display the data properly to the user? Most of us are scientists.
I have ten. They hold about 15 articles each on average, so 150 articles. Just the metadata, mind, no files. Could it amount to one megabyte of data? I doubt it. There are abstracts. But 1MB of 16-bit characters is a lot of text.
And ten hours appears to be my projected upload time for that.
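For what it's worth, the back-of-envelope numbers in this post can be checked with a few lines of Python. This is only a rough sketch: it assumes 1MB means 10**6 bytes and two bytes per character, and the per-article time is simply the ten-hour projection divided evenly, not a measurement.

```python
# Figures taken from the post above; the per-article time is derived, not measured.
BYTES_PER_CHAR = 2            # "16-bit characters"
total_articles = 10 * 15      # ten sets of about 15 articles each
budget_bytes = 1_000_000      # the 1MB upper bound, assumed to be 10**6 bytes
projected_hours = 10          # the projected upload time

chars_total = budget_bytes // BYTES_PER_CHAR
chars_per_article = chars_total // total_articles
seconds_per_article = projected_hours * 3600 / total_articles

print(chars_total, chars_per_article, seconds_per_article)
```

So 1MB would allow roughly 3,300 characters of metadata per article, and ten hours spread over 150 articles works out to about four minutes each.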
Zotero syncing does not work in the way you described. Metadata syncing is atomic: it is all-or-nothing. Auto-syncing is of great benefit to those users where syncing consistently works. The Zotero server code is released & others have set it up to run. It is just not supported. I personally think it is fair that the Zotero developers get to choose how they prioritize their time and resources (among other things, to fulfill promises to funding agencies).
But only when the server isn't busy (which I'm measuring by the server queue re-try responses).
I *haven't* made any accusations; I've made feature requests. In fact I used the word "perfect" w.r.t. the software. The primary request I made is that it advertise the server load, and advertise what is happening during a sync more clearly (not necessarily techno-speak).
I think the software should be clearer about what it is doing, so that people feel in control of it (without opening the source code).
Auto-syncing is not a feature I suggested being removed; I suggested that it be off by *default* to avoid congestion issues.
That's not a silly suggestion.
Very many small queries to a server look to me like a DoS attack. Like I said.
Absolutely we need an indicator, and more information about what's happening.
Why? Because I need to access this data from another part of the country. And I'm going to miss my train because I expected it to work the way it did in the past!
It's not a complaint, still less an "accusation" as some would have it (???)
It's a *problem*. If it's network congestion, I'll pay for bandwidth happily (rather than storage).
If it's an issue of design, I'd respectfully suggest it be thought about (I agree certainly that it should be assembled locally and sent atomically however).
But not worrying about it isn't always an option for busy people. And *no* indicator and *no* choice about whether and when to sync to a repository would leave me, and I think many of my colleagues, feeling helpless as lambs, and maybe looking for other off-campus solutions.
Zotero remains the best I've tracked down so far.
I presented Zotero this afternoon to the lab, showed them how to set up an account and *tried* to demo the group library. Unfortunately, the one metadata-only item I had in our new group library still hasn't uploaded to the server (2 hours later). Perhaps we just had the misfortune of trying it out today, but I'm now going to have a very hard time proposing we use Zotero instead of using another software package or developing one in-house that we release to other research institutes, with reveley's suggestion of providing the option of setting up your own Zotero server.
Please don't take this as an attack on Zotero - I would LOVE to use and promote Zotero. However, like reveley has stated, we either need information about what is going on with sync in order to make an informed decision (right now it's that Zotero's server is too bogged down in general or is simply unreliable) or approve a request to remedy the sync server issues.
This good-enough solution may not be good enough for (1) first-time and post-reset synchronization, (2) demonstrations of syncing and (3) close collaboration by users who share a group or personal library. All three of these are fairly rare use cases, but they are high-visibility ones. I don't know what Zotero can do to help here; a fast lane for paid users might be possible, but the team almost certainly doesn't have the developer time to create such a system, and it might be a little contentious for the user community.
I hope that institutions and individuals with an interest in improving the sync experience will start to work with the server code and explore avenues for improvement.
Another question is when it is safe to disconnect from the network, which reveley raised above. Can one of the Zotero developers explain? Specifically, if Zotero tells me that it is waiting for updated data or is otherwise queued, can I disconnect without the sync failing?
You need *data*. You cite none.
My descriptions of design choices were based on guesses and I *explicitly said so*.
I also said *explicitly* that I would like the software to include this information, so I do not have to guess.
That was my purpose in corresponding.
I'm a busy scientist. I don't like to guess. I don't like to spend time not doing my job. I just don't have time.
And I don't make accusations in public.
If you have time on your hands, why not write a Python script to search this forum for a count of posts with the word "sync" in them, and determine the ratio to the total?
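The counting part of such a script is short. Below is a minimal sketch that assumes the post texts have already been collected into a list of strings; fetching them from the forum is left out, and the function name and sample posts are made up for illustration.

```python
import re
from typing import Iterable

# Case-insensitive match on "sync" anywhere in a post,
# so "Sync", "syncing" and "resync" all count.
SYNC_PATTERN = re.compile(r"sync", re.IGNORECASE)

def sync_ratio(posts: Iterable[str]) -> float:
    """Return the fraction of posts that mention 'sync' (0.0 for no posts)."""
    posts = list(posts)
    if not posts:
        return 0.0
    mentions = sum(1 for text in posts if SYNC_PATTERN.search(text))
    return mentions / len(posts)

# Hypothetical sample data, not real forum posts.
sample_posts = [
    "Sync is stuck for two hours",
    "How do I change the citation style?",
    "Tried resyncing after clearing server data",
    "Thanks, that worked",
]
print(sync_ratio(sample_posts))
```

A ratio like this only says how often the word comes up, not how many of those posts report a problem.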
SQL statements in debug output are local operations and have nothing to do with the server side of syncing.
The increasing retry times you see in the debug output are by design. They follow a server-dictated back-off schedule and don't indicate that the sync servers are slowing down—they're what prevent DOSing of the web servers when the sync servers are busy.
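To illustrate the general idea of a server-dictated back-off (not Zotero's actual protocol or code), here is a minimal sketch. The client does not pick its own retry interval; it waits for however long the server's last response told it to. The state names, the response shape and the first two wait values are assumptions made for the example; only the 120000 ms figure appears in this thread.

```python
from typing import Callable, Iterator, List, Tuple

Response = Tuple[str, int]  # (state, wait_ms): hypothetical response shape

def poll_with_server_backoff(
    check_status: Callable[[], Response],
    sleep: Callable[[float], None],
    max_attempts: int = 10,
) -> List[int]:
    """Poll until the server reports 'done'; return the waits it dictated."""
    waits: List[int] = []
    for _ in range(max_attempts):
        state, wait_ms = check_status()
        if state == "done":
            return waits
        # The server, not the client, decides how long to stay away.
        sleep(wait_ms / 1000)
        waits.append(wait_ms)
    raise TimeoutError("still queued after max_attempts polls")

# Simulated server: a longer wait each time the request is still queued.
fake_responses: Iterator[Response] = iter(
    [("queued", 15000), ("queued", 30000), ("queued", 120000), ("done", 0)]
)
observed = poll_with_server_backoff(
    check_status=lambda: next(fake_responses),
    sleep=lambda seconds: None,  # no real sleeping in the demo
)
print(observed)
```

Seen from the client's debug output, growing waits like these look like a server getting slower, but in this scheme they are simply the schedule the server hands out while a request is queued.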
Auto-sync is in fact extremely beneficial to sync performance. If syncing consisted solely of small auto-synced operations, it would be more or less instantaneous for everyone. Large operations are what slow it down, since, as noted above, sync transactions are atomic, and transactional lock contention is currently by far the biggest detriment to sync performance.
Sync performance has nothing to do with network congestion. We have plenty of bandwidth.
And as noksagt noted, you would've made things much worse for yourself by clearing server data and resyncing, if that's in fact what you did.
We've made more or less constant improvements to the sync architecture to handle increasing server loads. As I've said before, on busy days queue times peak at about one hour for the longest (and largest) sync operations in the middle of the day U.S. time. Most syncs take much less time, and for most of the day the vast majority of operations should take only seconds.
Overall queue time aside, I would say the main practical problem at the moment is that, while a sync is queued, any local changes since that sync was sent up aren't sent to the server, which is a real problem for people who want to, say, catch a train. So while it would be somewhat detrimental to server performance and cause an overall delay in a person's data showing up on the server, it may be preferable to have new local changes revoke queued uploads that haven't yet entered the processing queue. That way a user could know that all changes they've made so far have been sent up and shut down their computer and head to the train station. (It's not 100% certain that queued uploads will go through successfully, but nearly all potential errors are caught within a few seconds by an error preprocessor and returned to the user.)
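The queuing change described above could look roughly like the following sketch. It is a toy model under stated assumptions, not the sync server's code: the class, method names and data shapes are all hypothetical. An upload that is still waiting gets revoked and replaced by one that also carries the newer changes, while an upload that has already entered processing is left alone.

```python
from typing import Dict, List, Tuple

class UploadQueue:
    """Toy model: one waiting upload per user, plus a processing list."""

    def __init__(self) -> None:
        self.waiting: Dict[str, List[str]] = {}
        self.processing: List[Tuple[str, List[str]]] = []

    def submit(self, user: str, changes: List[str]) -> List[str]:
        # Revoke a waiting upload (if any) and resend it with the new changes.
        earlier = self.waiting.pop(user, [])
        self.waiting[user] = earlier + list(changes)
        return self.waiting[user]

    def start_processing(self, user: str) -> None:
        # Once processing starts, the upload can no longer be revoked.
        self.processing.append((user, self.waiting.pop(user)))

queue = UploadQueue()
queue.submit("alice", ["item1"])
queue.submit("alice", ["item2"])   # replaces the waiting upload
queue.start_processing("alice")
queue.submit("alice", ["item3"])   # too late to merge; waits separately
print(queue.processing, queue.waiting)
```

The trade-off is the one noted above: every revocation sends the user's request back to the end of the line, but at any moment the waiting upload contains everything changed so far.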
To be entirely clear, we would still like to (and intend to) improve sync performance dramatically, particularly for the high-visibility cases that ajlyon describes, and I'm hoping to do so in the next few weeks (particularly since the school year is starting).
In the meantime, we might be able to provide some additional sync status information, but it's not entirely clear to me what would be useful. Users can already see what has synced by checking the website. Individual queuing state might be helpful, but only for the fairly rare cases where the sync icon is spinning without a queued request on the server—which is a bug if it happens. Current size of the queue and maximum wait time would give some indication as to overall server load, but they could also be misleading, as the queue is processed from a number of directions, which is why Zotero doesn't display any sort of ETC for syncs. (We could try to, but I'm not sure how accurate we could get it.) And there's not a whole lot a user can do with that information.
But some architecture improvements to reduce lock contention and potentially the queuing change I describe above are the things that would make the biggest difference for users.
The sync status information you suggest would be useful. It's never bad to give information to users, but that does not mean it has to be on by *default*. At the moment there's a choice between debugging output or *nothing*. There's no sense in that.
Finally, as I say, syncing is the deal breaker for Zotero.
Contrary to what you say, it is a system of client-side SQLite, some bigger relational system on the server side, and XML.
How *exactly* you've chosen to do it is a problem, because performance is poor for tiny tiny amounts of data, and no one knows why.
Syncing things is hard. I had to make a computer/Palm Pilot sync system many years ago for a job interview. I didn't get the job.
But you need to make this work.
It doesn't work very well, Dan. Sorry. The forum is chock full of users saying the same thing.
That's what people say about your software product, in a variety of timezones.
I cleared the system because it's buggy as well as slow, btw. It repeats items on sync.