D1485439288: Zotero integration server locks up when interacting with a rogue client

kjambunathan · August 5, 2020

The zip file below contains:

1. Debug log on the Zotero side.
2. Debug log on the LibreOffice Plugin side.
3. Wireshark packet capture.

[D1485439288.zip](https://github.com/zotero/zotero/files/5026453/D1485439288.zip)

If you look at the Wireshark capture, you will see that the last TCP stream shows the client sending a Zotero refresh and the server not responding.

```
Transmission Control Protocol, Src Port: 23116, Dst Port: 39806, Seq: 1, Ack: 18, Len: 0

....... "refresh"
```

The server sees the refresh but refuses to respond (this is a bug), as the server-side logs confirm:

```
(3)(+0000038): LibreOfficePlugin: Read 0 "refresh"

(3)(+0000009): Integration: Request already in progress; not executing OpenOffice refresh
```

Now, look at the preceding TCP stream. You will see that the LibreOffice-side client didn't return a document ID for the server's request. (Agreed, the LibreOffice client is acting rogue here.) But that shouldn't mean that Zotero locks up for all new integration requests.

```
Transmission Control Protocol, Src Port: 39800, Dst Port: 23116, Seq: 18, Ack: 46, Len: 0
....... "refresh".......%["Application_getActiveDocument",[3]]
```
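
For anyone reading the capture by hand: the `%` byte in front of the second payload is 0x25 (37), which is exactly the length of the JSON text that follows, so the stream looks length-prefixed. The sketch below decodes such a stream under the assumption that each frame is a 4-byte big-endian transaction ID, a 4-byte big-endian payload length, and then UTF-8 JSON. That layout is inferred from the capture, not taken from the Zotero source, and `encode_frame`/`decode_frames` are hypothetical helper names.

```python
import json
import struct

HEADER = struct.Struct(">II")  # ASSUMED layout: transaction ID, payload length


def encode_frame(txn_id, payload):
    """Build one frame: 8-byte header followed by compact JSON."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(txn_id, len(body)) + body


def decode_frames(data):
    """Split a captured byte stream into (transaction ID, payload) pairs."""
    frames, offset = [], 0
    while offset + HEADER.size <= len(data):
        txn_id, length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        body = data[start:start + length]
        if len(body) < length:
            break  # trailing partial frame: stop rather than guess
        frames.append((txn_id, json.loads(body.decode("utf-8"))))
        offset = start + length
    return frames


# Rebuild the two messages visible in the capture (transaction ID 0 is illustrative).
stream = encode_frame(0, "refresh") + encode_frame(0, ["Application_getActiveDocument", [3]])
for txn_id, payload in decode_frames(stream):
    print(txn_id, payload)
```

Read this way, the stream contains the client's `"refresh"` and the server's `Application_getActiveDocument` request, and no frame from the client answering it.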

I wonder if there is a way for the Zotero server to recover from the lock-up so that it can serve future client requests. (I tried resetting the TCP connection on the LibreOffice side. You can see that the handshake happens on the new TCP client port, but that doesn't seem to help.)

The client is acting in a rogue way, and I will take this issue up separately with the Zotero LibreOffice team, but I would expect the server to be able to recover from the client's fault and start over from a clean state.

-----------------------------

```
$ uname -a
Linux debian 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux
```

```
Zotero-5.0.88_linux-x86_64.tar
```

```
Zotero_OpenOffice_Integration.oxt is 5.0.23.
```

```

/Downloads$ dpkg -l | grep libreoffice
ii liblibreoffice-java 1:7.0.0~rc2-1 all LibreOffice UNO runtime environment -- Java library
ii libobasis7.0-libreofficekit-data 7.0.0.3-3 amd64 Libreofficekit data files for LibreOffice 7.0 .0.3
ii libreoffice7.0 7.0.0.3-3 amd64 Brand module for LibreOffice 7.0 .0.3
ii libreoffice7.0-base 7.0.0.3-3 amd64 Base brand module for LibreOffice 7.0 .0.3
ii libreoffice7.0-calc 7.0.0.3-3 amd64 Calc brand module for LibreOffice 7.0 .0.3
ii libreoffice7.0-debian-menus 7.0.0-3 all LibreOffice 7.0 desktop integration
ii libreoffice7.0-dict-en 7.0.0.3-3 amd64 En dictionary for LibreOffice 7.0 .0.3
ii libreoffice7.0-dict-es 7.0.0.3-3 amd64 Es dictionary for LibreOffice 7.0 .0.3
ii libreoffice7.0-dict-fr 7.0.0.3-3 amd64 Fr dictionary for LibreOffice 7.0 .0.3
ii libreoffice7.0-draw 7.0.0.3-3 amd64 Draw brand module for LibreOffice 7.0 .0.3
ii libreoffice7.0-en-us 7.0.0.3-3 amd64 Brand language module for LibreOffice 7.0 .0.3
ii libreoffice7.0-impress 7.0.0.3-3 amd64 Impress brand module for LibreOffice 7.0 .0.3
ii libreoffice7.0-math 7.0.0.3-3 amd64 Math brand module for LibreOffice 7.0 .0.3
ii libreoffice7.0-ure 7.0.0.3-3 amd64 UNO Runtime Environment .0.3
ii libreoffice7.0-writer 7.0.0.3-3 amd64 Writer brand module for LibreOffice 7.0 .0.3
```

The LibreOffice integration client that I am using is _dirty_, in the sense that I have some experimental changes on top of the official release. I can reproduce this issue at will.

(They were all downloaded within the last week)

kjambunathan · August 5, 2020

If Zotero support folks consider this a bug, they can leave a review note at https://github.com/zotero/zotero/issues/1866.

adomasven · August 5, 2020

This is by design. We do not want users to begin new Zotero integration transactions in parallel on the same document, and most of the time when we receive a plugin request from a word processor while an integration transaction is already in motion, it is because a user had initiated a citation edit, then forgot about it, in which case we try to focus the citation dialog (although not always successfully due to OS differences and limitations). It is true that Zotero won't accept any new integration actions if an integration client does not properly finalize the transaction, in LO's case by failing to answer any of the TCP requests, or in the more general case, if the integration client crashes or similar. We find this a reasonable compromise. In that case, integration functionality is restored by restarting Zotero.

I'm not sure why you went out of your way to investigate this with Wireshark. Both Zotero and the LibreOffice plugin are open-source; you can just read the code and see what they do. Moreover, it is not the TCP socket that locks up. Zotero receives messages on the TCP socket fine and simply ignores them in the Zotero.Integration logic. No resetting of TCP sockets will help, since that's not where the action is being prevented.
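
To illustrate the behaviour described here, a minimal sketch of a one-transaction-at-a-time guard follows. It is a toy model and not the actual Zotero.Integration code; `IntegrationGuard`, `handle`, and `finalize` are hypothetical names, and only the log wording is borrowed from the debug output earlier in the thread.

```python
class IntegrationGuard:
    """Toy model of a one-transaction-at-a-time guard. Not Zotero's code."""

    def __init__(self):
        self.in_progress = False
        self.log = []

    def handle(self, command):
        # The message has already been read from the socket at this point;
        # the decision to ignore it is made here, above the transport layer.
        if self.in_progress:
            self.log.append(f"Request already in progress; not executing {command}")
            return False
        self.in_progress = True
        self.log.append(f"Executing {command}")
        return True

    def finalize(self):
        # Only reached if the client completes the transaction.
        self.in_progress = False


guard = IntegrationGuard()
guard.handle("refresh")   # accepted; the client then goes silent
guard.handle("refresh")   # arrives on a fresh TCP connection, still ignored
print("\n".join(guard.log))
```

Because the flag lives above the socket layer, reconnecting changes nothing; in this model only `finalize()` (or a restart, which rebuilds the object) clears it.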

> I will take this issue separately with Zotero LibreOffice team

I am the primary/sole developer of the Zotero LibreOffice plugin, as well as the primary person working on the integration code.

If you want to have a technical discussion about Zotero development feel free to post on the zotero-dev mailing list. Otherwise the forums are the place for potential bug reports, discussion over features, etc.

kjambunathan · August 5, 2020

> This is by design. We do not want users to begin new Zotero integration transactions in parallel on the same document, and most of the time when we receive a plugin request from a Word processor while an integration transaction is already in motion, it is because a user had initiated a citation edit, then forgot about it, in which case we try to focus the citation dialog (although not always successfully due to OS differences and limitations).

_Not recovering_ is really a bug. In my current case, the server hasn't received a documentID, so how can it even assume that there are parallel operations going on in a document (which it hasn't even heard about)? The server shouldn't be so opinionated!

> This is by design.

You (and the Zotero team) may want to review the design.

A server shouldn't make _blind_ assumptions about the state; rather, it should make an effort to identify the _true state_ and take appropriate decisions. A good server doesn't deny service to a new (good?) client just because it saw a rogue client a few days before.

(If you have a public Zotero server, the scenario that I outline in this post can easily be exploited to do a DoS attack on it.)

> I will take this issue separately with Zotero LibreOffice team

>> I am the primary/sole developer of the Zotero LibreOffice plugin, as well as the primary person working on the integration code.

This makes things easy for me. I will provide a proper bug report on the GitHub page. (Typing out the details will take time.)

To give you a heads-up,

I want to run `Zotero.Zotero.ZoteroRefresh` on a live document from the command line, like so:

```
/opt/libreoffice7.0/program/soffice --norestore --invisible --headless 'macro:///Zotero.Zotero.MyZoteroRefresh(file:///home/kjambunathan/src/zotero-better-bibtex/pandoc/test-regular.odt)'
```

The problem is that `ZoteroRefresh` is asynchronous. When the macro call returns, _most likely_ the Zotero refresh isn't over yet, i.e., a `Document_complete` hasn't been seen yet.

I want the above refresh to be synchronous, i.e., when the CLI finishes, the document should have been updated (and also _saved_). A synchronous call is essential for command-line-driven document production.

So, I have introduced a `ZoteroRefreshAndWait` ....

I achieve this by forcing the thread that queues the `refresh` `CommCommand` to _wait_ (specifically, to do a thread join) for the server thread to finish. (I also make the `serverThread` return once the `CommMessage` for document complete is processed.)
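
The general pattern, sketched in Python rather than in the plugin's own code: a server thread processes messages and returns when it sees `Document_complete`, and the caller joins on that thread. `server_loop` and `refresh_and_wait` are hypothetical names, the queue stands in for the socket, and the timeout on the join is my own addition for the sketch.

```python
import queue
import threading


def server_loop(inbox, seen):
    """Process incoming messages and return once the transaction is complete."""
    while True:
        message = inbox.get()
        seen.append(message)
        if message == "Document_complete":
            return


def refresh_and_wait(inbox, incoming, timeout):
    """Start the server thread, feed it `incoming`, and block until it finishes."""
    seen = []
    server = threading.Thread(target=server_loop, args=(inbox, seen), daemon=True)
    server.start()
    for message in incoming:  # stands in for traffic arriving from Zotero
        inbox.put(message)
    server.join(timeout)  # the synchronous part: wait for the server thread
    return not server.is_alive(), seen


done, seen = refresh_and_wait(
    queue.Queue(), ["Application_getActiveDocument", "Document_complete"], timeout=5.0
)
print(done, seen)
```

Without a timeout, the join would block forever whenever `Document_complete` never arrives, so a bounded wait at least lets the caller notice that the transaction did not finish.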

My changes work well when I do a ZoteroRefresh by hand, i.e., the normal way.

However, when I do it via a CLI macro call, for some reason the server thread vanishes: the documentID gets generated but doesn't get written to the output stream. The server thread locking up is why you don't see the documentID on the wire.

The only difference between the "by hand" refresh and the CLI refresh is that soffice is headless and the frame loading the target document is "hidden", i.e., I do a

```
StarDesktop.loadComponentFromURL(inFileURL, "_blank", 0, Array())
```

My gut feeling is that when the GUI frames are hidden, the "activation context" for unoservice misses an essential piece, that makes the whole situation a bit confused and broken.

I have to share my experimental code with you (in order for the previous paragraphs to make any sense to you). So, I will open a PR, so that you can take a look at my experimental changes and offer your suggestions for improvement.

Typing up the details is going to take time ...

I will post an issue on GitHub as well as on the forum. (I see no provision for attaching a file to forum posts, and even if there is such a provision, the experimental patch has to be seen _within_ the context of the repo's HEAD.) So, I will take some liberty and post a PR-cum-issue on the GitHub page. I hope that is OK with you.

kjambunathan · August 5, 2020

> https://groups.google.com/forum/#!forum/zotero-dev

Thanks for this link. I didn't realize the project had a separate forum for developers ...

adomasven · August 5, 2020

> _Not recovering_ is really a bug. In my current case, the server hasn't received a documentID so how can it even assume that there are some parallel operations going on a document (which it doesn't know about)

As soon as Zotero receives a request from a doc processor it initiates an Integration transaction. Sure, we could wait for a response to getDocument calls, but that does not make it a bug. The integration server has fail states if the integration client interacts with it incorrectly, but that's up to the integration client to not mess up.

> You may want to review the design. Server shouldn't make _blind_ assumptions about the state, rather it should make an effort to identify the _true state_

Once again, this is for the designers of the software to decide. We do not have to abide by any "standards" for the sake of standards, we design things based on our needs and use case. This is by design, because:
1. There are meaningful usability reasons to not allow simultaneous integration transactions.
2. We could perhaps delay entering the integration transaction, but that would require additional code complexity for a difference which has never been relevant, except for the issue you're having now.
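
To make the trade-off in point 2 concrete, here is a minimal sketch of what delayed entry could look like. It is purely illustrative and not how Zotero is written; `DelayedEntryGuard` and the `ask_client_for_document` callback are hypothetical, and the callback is assumed to return `None` when the client never answers.

```python
class DelayedEntryGuard:
    """Toy model: enter the transaction only after the client names a document."""

    def __init__(self):
        self.in_progress = False

    def handle(self, command, ask_client_for_document):
        if self.in_progress:
            return "ignored: request already in progress"
        # Hypothetical callback: returns a document ID, or None if the client
        # never answers (for example after a timeout).
        document_id = ask_client_for_document()
        if document_id is None:
            return "aborted: no document ID received"  # guard was never set
        self.in_progress = True
        return f"started {command} on document {document_id}"


guard = DelayedEntryGuard()
print(guard.handle("refresh", lambda: None))  # silent client, nothing locks
print(guard.handle("refresh", lambda: 3))     # well-behaved client proceeds
print(guard.handle("refresh", lambda: 3))     # a second request is still refused
```

The extra complexity is hidden in the callback: something has to decide how long to wait for the client before giving up.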

> [it should] take appropriate decisions which will ensure that it doesn't deny service to a good client, just because it saw rogue client a few days before. (If you have a public Zotero server the scenario that I outline in this post can easily be exploited to do a DoS attack on it.)

The Zotero integration server is only exposed on the local machine. If we were exposing this online we would definitely design it with such things in mind. If a user chooses to run rogue integration clients on their machine or has malware that exploits this, then it's up to the user to not do that.

> My changes work well when I do a ZoteroRefresh by hand--i.e., the normal way. However when I do it via macro, for some reason the server thread vanishes--the documentID gets generated but doesn't get written to the output stream--This thread locking up is why you don't see the documentID on the wire.

> My gut feeling is that when the GUI frames are hidden, the "activation context" for unoservice misses an essential piece, that makes the whole situation a bit confused.

This seems like an issue related to headless LibreOffice specifically, not Zotero. I assume it's annoying for you to have to restart Zotero every time it locks up due to this, but we will not be fixing it, since this is not a breaking bug for any standard Zotero use cases, or standard plugin interactions that follow the integration protocol correctly.

It certainly seems like this discussion is more suited for our dev mailing list. Please create a thread there about it and we'll try to give you technical help (although you might need to seek it at Document Foundation too). Also please understand that we have limited time and resources to spend on esoteric Zotero use cases and if the problem/new code is overly-complex we might not be able to suggest a solution.

kjambunathan · August 6, 2020

For some reason, I wasn't comfortable with the manner in which you responded. I will let this issue pass.