Does Zotero guarantee data privacy?

Hi,

I was wondering whether Zotero is (and will remain) off limits to web crawlers used by generative AI (such as ChatGPT)?
The reason I ask is that Google's policies in this regard are softening every day, and I expect the day when private documents are somehow integrated into their training dataset is not far off...
  • https://www.zotero.org/support/privacy

    If you make your library public, it can be crawled (though I wonder for what use) -- not much Zotero can do about that.

    I think you can pretty safely assume that Zotero won't allow any 3rd party AI access to non-public data (and it definitely doesn't now).
  • Hi, @adamsmith, I was searching the forums for this topic and came across this thread.

    I have to imagine Zotero/Digital Scholar has been approached by corporations looking for data to train LLMs on.

    Has Digital Scholar/Zotero put out a statement about this? I personally would like to know that Zotero would never allow user data to be used to train LLMs. (I realize that public libraries would be crawled, as you note above. I'm concerned that things like my personal library and notes would be made accessible to genAI corporations.)
  • I actually doubt Zotero has been approached about that, though I wouldn't know. Metadata for academic works is mostly in the public domain. Full text of academic works can't be legally obtained from Zotero, and if you want it illegally, libgen et al. are better sources. That leaves notes, which are a tiny amount of text and not well structured -- this just isn't high-value training data. Maybe the composition of libraries would be of interest for recommendation engines, but I'm skeptical.

    Anyway, you can read the first two paragraphs in the linked privacy policy -- for 20 years, Zotero has been consistent about having no interest in selling user data.
  • Excellent, thank you!