The Zotero Library dataset
I'm an enthusiastic user, evangelist, and sometime hacker of Zotero. I'm also a doc student in information science, and I'm interested in looking at Zotero as part of my current research.
I'm particularly interested in using data from the social web to better measure the impact of scholarly writing. This is an area that people are starting to get excited about (a great article was just published in PLoS Biology last week), and I think it's really going to be significant over the next few years.
Two of the biggest players in this area, CiteULike and Connotea, are facilitating this kind of investigation by making their user-article-tag data available to researchers, either as an API (Connotea) or as a big ol' download (CiteULike). So far as I can see, though, Zotero hasn't done this for public libraries. Of course, you could scrape them, but that's a pain.
Zotero has really set itself apart by keeping its code free and open. I think it would be great to do the same with the library data. So, my questions:
1) Is there any way to get a dump of the library data, in whatever format?
2) If not, is there any way for me to encourage Zotero folks to make that happen?
There's a REST API, which is used, among other things, for the feeds you see on public library pages. The API isn't yet complete and documentation hasn't been published, but we'll be providing documentation in the near future.
Do you have an estimate of when we might see a working API? I ask because I want to know whether I should start working on scraping; I'm hoping to have usable data by January or so, for an article I'll be submitting.
Along those lines, are you aware of any other scholars working with Zotero data this way? I've not seen anyone using actual Zotero data (as opposed to just suggesting its use) in the literature I'm familiar with, but maybe you'd know better. It would be great to compare notes if I can find someone who's done this sort of thing already.
/users/<userID>/items
/users/<userID>/items/<itemID>
/users/<userID>/collections
/users/<userID>/collections/<collectionID>
etc.
Please note that userID ≠ username. Your userID, for example, is 3460.
The base URL for requests is
https://api.zotero.org/
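A minimal Python sketch of how the endpoint patterns above combine with the base URL. It assumes the paths are simply appended to https://api.zotero.org/; because the API wasn't documented yet, no query parameters, API key, or response format are assumed, and `endpoint_url` and `fetch` are hypothetical helper names, not part of any Zotero library.

```python
from typing import Optional
from urllib.parse import quote, urljoin
from urllib.request import urlopen

BASE_URL = "https://api.zotero.org/"


def endpoint_url(user_id: int, resource: str, resource_id: Optional[str] = None) -> str:
    """Build a URL following the /users/<userID>/<resource>[/<id>] pattern.

    user_id must be the numeric userID, not the username.
    Hypothetical helper: assumes paths are appended directly to BASE_URL.
    """
    if resource not in ("items", "collections"):
        raise ValueError(f"unsupported resource: {resource!r}")
    path = f"users/{int(user_id)}/{resource}"
    if resource_id is not None:
        # Escape the ID so it stays a single path segment.
        path += "/" + quote(str(resource_id), safe="")
    return urljoin(BASE_URL, path)


def fetch(url: str) -> bytes:
    """Plain GET of a public endpoint (assumption: no key or params needed)."""
    with urlopen(url) as resp:
        return resp.read()
```

For example, `endpoint_url(3460, "items")` gives the items URL for userID 3460; passing the result to `fetch` is left to the reader, since the response format was still undocumented.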
However, what I'm interested in is not so much an API for interacting with users as a structured way of getting information about the contents of users' publicly available libraries, something along the lines of CiteULike's database dump or Delicious's search feeds. Mendeley recently published a list of the ten articles most often stored in users' libraries; I think it would be great to do these sorts of things with Zotero libraries as well.
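Once library contents are in hand (via an API or a dump), a Mendeley-style "most stored" list is essentially a count of how many distinct libraries contain each article. The sketch below assumes each library has already been reduced to a list of normalized article identifiers such as DOIs. That matching step is the hard part in practice and isn't shown; `most_stored` and its input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def most_stored(libraries: Dict[str, Iterable[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Rank articles by the number of distinct libraries that contain them.

    libraries maps a user ID to that user's article identifiers
    (hypothetical format: already-normalized IDs such as DOIs).
    """
    counts = Counter()
    for items in libraries.values():
        # set() so an article saved twice in one library counts once.
        counts.update(set(items))
    return counts.most_common(n)
```

Counting distinct libraries rather than raw saves keeps one heavy user from inflating an article's rank.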
My goal is to publish this as part of my research into alternative metrics of scholarly impact. Of course, one could scrape all this public data off the website, but I'd like to avoid that brittle and time-consuming approach if at all possible.
I've been corresponding with folks from a number of these bookmarking/reference/recommendation services over email, and so far everyone's been great about scaring up either an API or a database dump. I understand Zotero is perhaps a smaller operation, so it may not have the personnel to deal with these kinds of requests. If there's any way I can help out, I'm more than happy to.