Altmetrics Grant

adamsmith · April 21, 2014

First of all, congratulations on the altmetrics grant:
https://www.zotero.org/blog/funding-for-altmetrics-research-and-expanded-api/
This has been much anticipated.
A couple of questions:
1. Privacy/data ownership. So I realize data is going to be anonymized, but still, Zotero's privacy policy, which to me is one of its core strengths, is to "not share your data with third parties." So the question is how you'll be doing this exactly. The two obvious choices I see are to just aggregate data from public libraries or to offer an opt-in. I'd be happy with either option, but I'd feel queasy about you using all user data without explicit consent.

2. To what extent do you think the new API will be useful in handling retrieve metadata requests?

dstillman · April 21, 2014

1. Privacy/data ownership. So I realize data is going to be anonymized, but still, Zotero's privacy policy, which to me is one of its core strengths, is to "not share your data with third parties."

We've given this a lot of thought over the years, and ultimately we don't consider sharing these sorts of aggregate statistics to constitute sharing private data. We don't feel there's an articulable privacy concern with these counts — and certainly not one that justifies the vastly less useful data that would result from limiting the counts to public libraries, which represent a tiny fraction of total Zotero libraries.

The page you quote also discusses the use of "anonymous and aggregated" data for "research and analysis", which, though it doesn't explicitly mention sharing, suggests this sort of usage.

While we therefore see this usage as covered by our existing policies, we're going to be rolling out updated terms of service and an updated privacy policy this week for a variety of reasons (mainly to consolidate all Zotero software and services into a single document), and the updated privacy policy makes this new usage more explicit. (We'll be notifying people of the changes via email, since our current ToS say we will.)

To be clear, though, this aggregate data isn't yet available, and if anyone is concerned with this usage, they can of course delete their data from the Zotero servers.

2. To what extent do you think the new API will be useful in handling retrieve metadata requests?

It's separate, at least at this point. Currently for this grant we're only talking about providing readership counts for given identifiers (DOI, ISBN, etc.). A lookup service for metadata retrieval would be a separate API that took full-text queries and returned identifiers.

adamsmith · April 21, 2014

thanks, that's helpful.
It would still be good if it were possible to at least implement an opt-out that would allow users to use the Zotero servers without having their data shared, although it also depends on how much exactly you're going to make available. Item counts per identifier do indeed seem entirely unproblematic. The more closely released datasets resemble the full, anonymized contents of a user's library, the more problematic this is.

As you know, these issues are viewed a lot more critically in Europe than on this side of the pond, so treading as lightly as possible would be good. In Germany, e.g., there were even some complaints about not requiring explicit opt-in for new users for full-text sync (http://infobib.de/blog/2013/11/19/zotero-synchronisiert-texte/).

dstillman · April 21, 2014

Item counts per identifier do indeed seem entirely unproblematic.

OK, I'm glad you agree on this — and that is indeed all we're talking about here.

We would not expose metadata in private libraries, anonymized or not. If, for example, metadata were provided along with these readership counts, it would come solely from public sources, even if the aggregate readership counts themselves included items in private libraries.

aurimas · April 21, 2014

The Zotero project’s first phase of participation will involve the aggregation and delivery of anonymized datasets to allow our research partners in Montreal and Bloomington to compare readership across a range of metrics, including commercial databases, social media, and reference management software.

It seems that for this initial dataset, one would need to deliver more information than just readership stats. Is this what you're referring to when you say

If, for example, metadata were provided along with these readership counts, it would come solely from public sources

dstillman · April 22, 2014

No, I was referring to the metadata of future "global items" made available from the Zotero API and zotero.org — that is, metadata accompanying these readership counts. I'm saying that, while the readership counts may include items in private libraries, any metadata displayed would come from either external sources, resolved using those items' UUIDs (ISBNs, DOIs, etc.), or, for items without UUIDs, matched items in public Zotero libraries. So metadata on private items will never be exposed.

The datasets described in the grant announcement will follow these same restrictions — we're not providing anything that won't eventually be available publicly via the Zotero API. (Part of the purpose of the grant is just to improve the accuracy of that data before it's released.) We'll essentially be providing identifier:count:public-user-items, where 'identifier' is a UUID (or, where there isn't one, a Zotero global item id), 'count' is a readership count (across all Zotero libraries), and 'public-user-items' is the set of constituent user/group items in public libraries. No data on private items will be included beyond the readership counts.
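To make the identifier:count:public-user-items shape concrete, here is a minimal Python sketch of how such records could be aggregated. It is not Zotero's actual implementation: the `ReadershipRecord` class, the `aggregate` function, the input tuple layout, and the sample identifiers and keys are all hypothetical, and it assumes each input tuple represents one library holding the item once. The point it illustrates is that private items contribute only to the count, while the listed constituent items come from public libraries alone.

```python
from dataclasses import dataclass, field


@dataclass
class ReadershipRecord:
    """Hypothetical record mirroring identifier:count:public-user-items."""
    identifier: str            # e.g. a DOI or ISBN, or a global item id
    count: int = 0             # readership across all libraries, public and private
    public_user_items: list = field(default_factory=list)  # items from public libraries only


def aggregate(items):
    """Aggregate (identifier, library_id, item_key, is_public) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    records = {}
    for identifier, library_id, item_key, is_public in items:
        rec = records.setdefault(identifier, ReadershipRecord(identifier))
        rec.count += 1                      # every library counts toward readership
        if is_public:                       # only public items are listed individually
            rec.public_user_items.append((library_id, item_key))
    return records


# Made-up sample data: one public and two private items.
sample = [
    ("doi:10.1000/example", "user-1", "ABCD1234", True),
    ("doi:10.1000/example", "user-2", "EFGH5678", False),
    ("isbn:9780000000002", "user-2", "IJKL9012", False),
]
records = aggregate(sample)
```

In this sketch the private item held by `user-2` raises the count for the DOI but never appears in `public_user_items`, and an identifier held only in private libraries ends up with a count and an empty list.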

We're aware that "anonymized" is a bit of a scary word, particularly in tech circles. In this context we really just mean "aggregate", and anonymous simply to the extent that readership counts would be inherently anonymous.

benhockenberry · November 25, 2014

Has there been motion forward on Zotero-based altmetrics since the April announcement?

adamsmith · November 25, 2014

they're working on it but nothing that's been made public, no.

wouterhendrickx · June 2, 2015

Any news on this feature? I would love to see these stats appear next to Mendeley's in altmetrics.

mark · March 22, 2016

Any news on this? Like wouterhendrickx above, I would love to see Zotero represented in altmetrics. (I've seen this: https://www.zotero.org/blog/studying-the-altmetrics-of-zotero-data/ — but haven't seen anything since either here or on the altmetrics/impactstory side.)

mark · October 21, 2016

Just want to bump this another time because I'm really curious to hear more. I'd love it if aggregated data from a widely used open source, not-for-profit project like Zotero could make a tangible contribution to altmetrics, especially if it benefits ImpactStory, which recently has made a commendable turn towards fully open data.

adamsmith · October 21, 2016

which recently has made a commendable turn towards fully open data.

and is now free again for researchers to use.