The Zotero Library dataset

jason · November 24, 2009

I'm an enthusiastic user, evangelist, and sometime hacker of Zotero. I'm also a doc student in information science, and I'm interested in looking at Zotero as part of my current research.

I'm particularly interested in using data from the social web to better measure the impact of scholarly writing. This is an area that people are starting to get excited about (a great article was just published in PLoS Biology last week), and I think it's going to really be significant over the next few years.

Two of the biggest players in this area, CiteULike and Connotea, are facilitating this kind of investigation by making their user-article-tag data available to researchers, either as an API (Connotea) or as a big ol' download (CiteULike). So far as I can see, though, Zotero hasn't done this for public libraries. Of course, you could scrape them, but that's a pain.

Zotero has really set itself apart by keeping its code free and open. I think it would be great to be the same way with the library data. So, my questions:
1) Is there any way to get a dump of the library data, in whatever format?
2) If not, is there any way for me to encourage Zotero folks to make that happen?

dstillman · November 24, 2009

Open data has been a core principle of Zotero from the beginning.

There's a REST API, which is used, among other things, for the feeds you see on public library pages. The API isn't yet complete and documentation hasn't been published, but we'll be providing documentation in the near future.

jason · November 24, 2009

That's great to hear, Dan. Like I said, I've always had tremendous respect for Zotero's open approach.

Do you have an estimate of when we might see a working API? I ask because I want to know if I should start working on scraping or not; I'm hoping to have data I can use in January or so, for an article I'll be submitting.

Along those lines, are you aware of any other scholars working with Zotero data this way? I've not seen anyone using actual Zotero data (as opposed to just suggesting its use) in the literature I'm familiar with, but maybe you'd know better. It would be great to compare notes if I can find someone who's done this sort of thing already.

jason · December 5, 2009

So no estimate on what "in the near future" means? I guess I should start scraping, then?

sean · December 6, 2009

Although not all planned methods of the API are fully implemented, the API has been open and available for several months. The main remaining task is documentation. I'll try to get something posted within the next couple of days, but in the interim, you can certainly start poking around with things like:


/users/<userID>/items

/users/<userID>/items/<itemID>

/users/<userID>/collections

/users/<userID>/collections/<collectionID>

Please note that userID ≠ username. Your userID, for example, is 3460.

etc.

The base URL for requests is

https://api.zotero.org/
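
For anyone who wants to try these, here is a minimal Python sketch that joins the base URL with the paths listed above. The helper names (`items_url`, `collections_url`, `fetch`) are made up for illustration and are not part of any Zotero library. The response format isn't documented yet, so `fetch` just returns the raw bytes, and a request may well fail while the API is incomplete.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Base URL as given in this thread.
BASE_URL = "https://api.zotero.org/"


def items_url(user_id, item_id=None):
    """Build /users/<userID>/items or /users/<userID>/items/<itemID>."""
    path = f"users/{user_id}/items"
    if item_id is not None:
        path += f"/{item_id}"
    return urljoin(BASE_URL, path)


def collections_url(user_id, collection_id=None):
    """Build /users/<userID>/collections or .../collections/<collectionID>."""
    path = f"users/{user_id}/collections"
    if collection_id is not None:
        path += f"/{collection_id}"
    return urljoin(BASE_URL, path)


def fetch(url, timeout=30):
    """Hypothetical helper: GET the URL and return the raw response body.

    Assumption: the library is public and no authentication is needed.
    Raises urllib.error.HTTPError / URLError if the request fails.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # 3460 is the userID mentioned in this thread; remember userID != username.
    print(items_url(3460))
    print(collections_url(3460))
```

Calling `fetch(items_url(3460))` would then return whatever the server sends back for that user's items, assuming the endpoint is live.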

jason · February 12, 2010

Sean, I wonder if you could give me an update on the status of the REST API? I'm still getting "this is not available" when I use the requests you listed. Is this still something that's in the works?

sean · February 12, 2010

There's some preliminary documentation available thanks to Raymond Yee.

jason · February 14, 2010

Thanks, Sean. That's useful, as is Jeremy Boggs' PHP library that Raymond mentions.

However, what I'm interested in is not so much an API for interacting with users as a structured way of getting information about the contents of users' publicly-available libraries--something along the lines of CiteULike's database dump or delicious' search feeds. Mendeley recently published a list of the ten articles most stored in users' libraries; I think it would be great to do these sorts of things with Zotero libraries, as well.

My goal is to publish this as part of my research into alternative metrics of scholarly impact. Of course, one could scrape all this public data off the website, but I'd like to avoid that brittle and time-consuming approach if at all possible.

I've been corresponding with folks from a number of these bookmarking/reference/recommendation type services over email, and so far everyone's been great about scaring up either an API or database dump. I understand Zotero is perhaps a smaller operation, so may not have the personnel to deal with these kinds of requests. If there's any way I can help out, I'm more than happy to.