Altmetrics Grant
First of all, congratulations on the altmetrics grant:
https://www.zotero.org/blog/funding-for-altmetrics-research-and-expanded-api/
this has been much anticipated.
A couple of questions:
1. Privacy/data ownership. I realize the data is going to be anonymized, but still, Zotero's privacy policy, which to me is one of its core strengths, is to "not share your data with third parties." So the question is how exactly you'll be doing this. The two obvious choices I see are to aggregate data only from public libraries or to offer an opt-in. I'd be happy with either option, but I'd feel queasy about you using all user data without explicit consent.
2. To what extent do you think the new API will be useful in handling retrieve metadata requests?
The page you quote also discusses the use of "anonymous and aggregated" data for "research and analysis", which, though it doesn't explicitly mention sharing, suggests this sort of usage.
While we therefore see this usage as covered by our existing policies, we're going to be rolling out updated terms of service and an updated privacy policy this week for a variety of reasons (mainly to consolidate all Zotero software and services into a single document), and the updated privacy policy makes this new usage more explicit. (We'll be notifying people of the changes via email, since our current ToS say we will.)
To be clear, though, this aggregate data isn't yet available, and if anyone is concerned with this usage, they can of course delete their data from the Zotero servers. As for your second question, metadata retrieval is separate, at least at this point. Currently for this grant we're only talking about providing readership counts for given identifiers (DOI, ISBN, etc.). A lookup service for metadata retrieval would be a separate API that took full-text queries and returned identifiers.
It would still be good if you could at least implement an opt-out that would allow users to use the Zotero servers without having their data shared, although it also depends on exactly how much you're going to make available. Item counts per identifier do indeed seem entirely unproblematic. The more closely released datasets resemble the full, anonymized contents of a user's library, the more problematic this is.
As you know, these issues are viewed a lot more critically in Europe than on this side of the pond, so treading as lightly as possible would be good. In Germany, e.g., there were even some complaints about full-text sync not requiring an explicit opt-in for new users (http://infobib.de/blog/2013/11/19/zotero-synchronisiert-texte/ ).
We would not expose metadata in private libraries, anonymized or not. If, for example, metadata were provided along with these readership counts, it would come solely from public sources, even if the aggregate readership counts themselves included items in private libraries.
The datasets described in the grant announcement will follow these same restrictions — we're not providing anything that won't eventually be available publicly via the Zotero API. (Part of the purpose of the grant is just to improve the accuracy of that data before it's released.) We'll essentially be providing identifier:count:public-user-items, where 'identifier' is a UUID (or, where there isn't one, a Zotero global item id), 'count' is a readership count (across all Zotero libraries), and 'public-user-items' is the set of constituent user/group items in public libraries. No data on private items will be included beyond the readership counts.
We're aware that "anonymized" is a bit of a scary word, particularly in tech circles. In this context we really just mean "aggregate", and anonymous simply to the extent that readership counts would be inherently anonymous.
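For readers trying to picture the identifier:count:public-user-items shape described above, here is a rough sketch in Python. It is only an illustration, not Zotero's implementation: the `ReadershipRecord` class, the `aggregate` function, the input tuple layout, and the sample identifiers and item keys are all made up, and it assumes that each library item counts as one reader.

```python
from dataclasses import dataclass, field


@dataclass
class ReadershipRecord:
    """Hypothetical record mirroring identifier:count:public-user-items."""
    identifier: str                 # e.g. a DOI or ISBN
    count: int = 0                  # readership across all libraries
    public_user_items: list = field(default_factory=list)  # public items only


def aggregate(items):
    """Build records from (identifier, item_key, is_public) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    records = {}
    for identifier, item_key, is_public in items:
        rec = records.setdefault(identifier, ReadershipRecord(identifier))
        rec.count += 1              # private items contribute to the count...
        if is_public:
            # ...but only items in public libraries are listed by key
            rec.public_user_items.append(item_key)
    return records


sample = [
    ("10.1000/example", "user1/ITEMA", True),
    ("10.1000/example", "user2/ITEMB", False),
    ("isbn:0000000000", "user3/ITEMC", False),
]
records = aggregate(sample)
for rec in records.values():
    print(rec.identifier, rec.count, rec.public_user_items)
```

The point of the sketch is that a private item never appears in the output except as an increment to an anonymous count.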