Integrating Zotero into the Semantic Web

Hi to the Zotero team,


A couple of days ago, Michael Bergman of the Linked-Open-Data community mailing list talked about Zotero and asked how it could be used in the semantic web. I then wrote an article describing how Zotero could be integrated into a semantic web environment:

http://fgiasson.com/blog/index.php/2007/04/12/integration-of-zotero-in-a-semantic-web-environment-to-find-search-and-browse-the-webs-citations/


I invite you to read this article and tell me whether such an initiative would interest you. Many people are quite interested in the possibilities that such an integration could create.

Please tell me what you think about this idea and if you would be interested in participating in it.

Take care,


Salutations,


Frederick Giasson
President - Zitgist LLC.

http://fgiasson.com/blog/
http://pingthesemanticweb.com/
  • I blogged about related issues a while back. The key is really the URI infrastructure, because beyond search and such, it can be important in document citations as well.
  • Hi bdarcus,

    Well, what you talk about in your blog post is tags and their consistency. Since that is not really what interests me, it is not a big issue here. I have no faith in unsupervised tagging (no semantics or consistency is possible between tags from different systems). So it is another problem entirely.

    It is true that citation URIs could be an issue, but I doubt it would be a big problem. We could certainly create a good schema without too much trouble.

    Anyway, the URIs will only become important when people start to use Zitgist as a citation provider. From there, when they refer to citations available on Zitgist, they will use these URIs, and consistency should then appear.

    Salutations,


    Fred
  • Well, what you talk about in your blog post is tags and their consistency. Since that is not really what interests me, it is not a big issue here. I have no faith in unsupervised tagging (no semantics or consistency is possible between tags from different systems). So it is another problem entirely.
    Yes, I was in a hurry, but just wanted to agree with the notion that the semweb potential of Zotero is large.
    It is true that citation URIs could be an issue, but I doubt it would be a big problem. We could certainly create a good schema without too much trouble.
    I just wanted to emphasize that the issues that Zotero needs to deal with go beyond search and sharing. It ties into document production itself, note-taking, etc.

    I'm on the OpenDocument TC and metadata subcommittee, for example, where we're just about done with a proposal to add RDF support to ODF. So citations in ODF will almost certainly be identified using URIs and encoded using RDF.

    By contrast, currently the Zotero Word plug-in identifies citations by local database ID.

    In an ideal world, you just add the URIs to your document and some service can grab the metadata and format the citations. Or, you copy-and-paste text from a web document into your ODF document window and the citations automatically render.

    This is definitely off your topic, but I just wanted to give a sense of how this could all tie together in ways that have big benefits for users.

    BTW, it's not that easy to design a schema that works well for different domains (physical sciences, humanities, law, etc.). It's a common fallacy that citations are simple ;-)

    But I do have a first draft of one here, and have more recently been working on revising it. The Zotero guys are using pieces of it, though it definitely needs improvement.
    Anyway, the URIs will only become important when people start to use Zitgist as a citation provider. From there, when they refer to citations available on Zitgist, they will use these URIs, and consistency should then appear.
    You do know that Zotero is planning server functionality, right? I actually prefer the notion of a more decentralized semantic web based approach, but just FYI ...
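
To make the "add the URIs to your document and let a service format the citations" idea above a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the example.org URIs, the in-memory METADATA table standing in for a remote metadata service, and the author-date output style. It does not reflect how Zotero or ODF actually work.

```python
# Stand-in for a metadata service: maps citation URIs to records.
# Both the URIs and the records are made up for illustration.
METADATA = {
    "http://example.org/citation/1": {
        "creator": "Doe, Jane",
        "date": "2007",
        "title": "An Example Article",
    },
}


def format_citation(uri, lookup=METADATA.get):
    """Resolve one citation URI and format it in a simple author-date style."""
    record = lookup(uri)
    if record is None:
        # Keep the URI visible so an unresolved citation is easy to spot.
        return "[unresolved: %s]" % uri
    return "%s (%s). %s." % (record["creator"], record["date"], record["title"])


def render_bibliography(uris):
    """Format every citation URI found in a document, in document order."""
    return [format_citation(uri) for uri in uris]


if __name__ == "__main__":
    document_uris = [
        "http://example.org/citation/1",
        "http://example.org/citation/999",
    ]
    for line in render_bibliography(document_uris):
        print(line)
```

The point of the sketch is only that the document stores URIs, not formatted text, so the same document can be re-rendered in another style by swapping `format_citation`.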
  • Hi again,
    I just wanted to emphasize that the issues that Zotero needs to deal with go beyond search and sharing. It ties into document production itself, note-taking, etc.
    Yeah, certainly not :)
    In an ideal world, you just add the URIs to your document and some service can grab the metadata and format the citations. Or, you copy-and-paste text from a web document into your ODF document window and the citations automatically render.
    Well yeah, like with the semweb clipboard: letting RDF metadata follow copy-and-pasted information from documents, etc.
    This is definitely off your topic, but I just wanted to give a sense of how this could all tie together in ways that have big benefits for users.
    Not that off topic. In an ideal world, it would be fantastic; however, we do not live in such a world at the moment... unfortunately :) The benefits to users would be fantastic: from portfolio aggregations, to document sharing, searching, automatic generation of bibliographic references, etc.

    However, I think that we have to start somewhere, and the project I am proposing is that starting point, for better or worse.
    But I do have a first draft of one here, and have more recently been working on revising it. The Zotero guys are using pieces of it, though it definitely needs improvement.
    It seems quite interesting; I will definitely take a deeper look at it in the next few days.

    Did you take a look at the music ontology I am developing with Yves Raimond and other people? (musicontology.com). About 40 people have gathered around this ontology to form an interesting development community. Why not do the same for such a bibliographic ontology? Contact me if you are interested; it could be a good starting point for the project.
    You do know that Zotero is planning server functionality, right? I actually prefer the notion of a more decentralized semantic web based approach, but just FYI ...

    No, I didn't; I was introduced to Zotero only yesterday, so I don't know everything about it :)

    Anyway, it is not a big problem. I have no idea what they are planning to do with it; however, if they do the same thing as MusicBrainz and distribute their database under an open licence, then this is not a problem. We would only have to grab an instance of the DB, describe it in RDF using some ontologies, and distribute the RDF instance of the Zotero database, freely as well, to anyone who wants it. It could be part of the Linked-Open-Data initiative: http://simile.mit.edu/mail/SummarizeList?listId=14

    This is not hard to do: we need the data (the Zotero database), one or more ontologies to describe it, and a conversion into RDF. After that, the data can live in any system. It has been done with MusicBrainz, and it can be done with Zotero.

    In such a case, instead of having each Zotero add-on instance ping PTSW (pingthesemanticweb.com), it could be better if the Zotero data server pinged it each time a new citation is discovered/updated. This method also works quite well and is already used by systems like geonames.org (a first version of the DB was uploaded to PTSW, and the updates are then sent to the system by pinging).

    Anyway, many things can be done on the semantic web and many methods can be used; this is just another example of its potential.


    take care,


    Fred
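
The three ingredients listed above (the data, one or more ontologies, and a conversion into RDF) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the example.org base URI are hypothetical, and the choice of foaf:Document plus the Dublin Core element set is just one possible vocabulary, not the schema the thread eventually settles on. The output is plain N-Triples written by hand, with no RDF library involved.

```python
# Hypothetical citation record, standing in for one row of an exported database.
RECORD = {
    "id": "42",
    "title": "An Example Article",
    "creator": "Doe, Jane",
    "date": "2007",
}

# Assumed base URI for minting citation URIs; not a real service.
BASE_URI = "http://example.org/citation/"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document"
DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record):
    """Return a list of N-Triples lines describing one citation record."""
    subject = "<%s%s>" % (BASE_URI, record["id"])
    lines = ["%s <%s> <%s> ." % (subject, RDF_TYPE, FOAF_DOCUMENT)]
    # Only fields present in the record are emitted, in a fixed order.
    for field in ("title", "creator", "date"):
        if field in record:
            lines.append(
                '%s <%s%s> "%s" .'
                % (subject, DC_ELEMENTS, field, escape_literal(record[field]))
            )
    return lines


if __name__ == "__main__":
    print("\n".join(record_to_ntriples(RECORD)))
```

Once records are in this form, any RDF-aware system can load them, which is the sense in which the data "can live in any system".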
  • Fred,

    Thanks so much for your thoughtful, interesting post on how Zotero might fit into and advance the semantic web. I'm all in favor of exploring how we might enable this; after all, one of the core missions of the project is to free up and exchange bibliographic information.

    I think it would be rather trivial to write a Zotero utility that pings PTSW every time a Zotero user with this utility installed saves a reference. (Your note #2 about pinging on each detection might be a little more difficult and produce serious overhead.) To round-trip the info back into individual Zotero collections, we could write a translator for grabbing RDF from your system. So yes, this is all very plausible and an exciting idea.

    As Bruce notes, since we focus on more than just citation info (e.g., notes), there might be some issues, and we would also need to think about how pings to our server would work. But anyway, just wanted to post a quick note of thanks and willingness to work with you on this.

    Dan
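
A utility of the kind Dan describes could be quite small. The Python sketch below shows the shape of it; it is not Zotero code. The endpoint URL and its `url` query parameter are assumptions about the PTSW ping interface and should be checked against the PTSW documentation, and the `opener` argument is a hypothetical hook added so the function can be exercised without touching the network.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed ping endpoint and parameter name; verify against the PTSW docs.
PING_ENDPOINT = "http://pingthesemanticweb.com/rest/"


def build_ping_url(document_url):
    """Build the GET URL that tells the ping service about one RDF document."""
    return PING_ENDPOINT + "?" + urlencode({"url": document_url})


def ping(document_url, opener=urlopen, timeout=10):
    """Send the ping and return the HTTP status code.

    `opener` defaults to urllib's urlopen; pass a stub to avoid network access.
    """
    with opener(build_ping_url(document_url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical URL of the RDF description of a freshly saved reference.
    print(build_ping_url("http://example.org/citation/42.rdf"))
```

The same `ping` function works whether it is called from each user's add-on on save or from a central server on each update; only the caller changes.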
  • Dan,

    Excellent; thanks for your response to Fred. I think you will find that the interest of Fred and others who are quite active in the Linking Open Data group (http://dbpedia.org and http://www.openlinksw.com) will be a massive value add to Zotero with few demands.

    OpenLink and Fred (via Zitgist) are helping to nucleate a rapid expansion of Web content structuring into RDF and its search and interoperability. I think Zotero and its citation strengths should be one of the natural pillars for such sources.

    I'm also seeing some natural synergies between the suite of Zotero "translators" and Simile Solvent (I know, a godfather of the Zotero approach), new Sifter innovations at MIT, the really cool Triplr stuff from Dan Connolly, and the Sponger stuff just announced by OpenLink and used by Fred.

    At any rate, I didn't mean to ramble, but I encourage you to continue to explore what I see to be real natural synergies with these groups.

    BTW, the new Zotero beta is fantastic (love Wikipedia and the annotator). I'm doing another write-up/update in the near future.

    Mike
  • Hi Dan and Michael,

    First of all, thanks for your interest in this initiative.
    I think it would be rather trivial to write a Zotero utility that pings PTSW every time a Zotero user with this utility installed saves a reference.
    Yes it would.
    there might be some issues, and we would also need to think about how pings to our server would work.
    Here, are you talking about your server pinging PTSW, or the reverse?


    Either way could work (pinging from Zotero users or from the Zotero server), and each would be equally simple, I think. However, I have no idea what you are planning to do with the Zotero servers, so more information about that would be appreciated for a better assessment of the situation.


    Now, I think that bdarcus was right, and the first step would be to create/extend an ontology that Zotero could fully integrate for the RDF export feature.

    I faced the same problem when I decided to convert musicbrainz.org (and other sources of music data such as Jamendo, Magnatune, etc.) to RDF. So I started to develop the Music Ontology. Since then, about 40 people have joined the community, and Yves Raimond has even started to orient his doctoral work toward that ontology and to write conference papers about it.

    The logical starting point would be to create/update that bibliographic reference ontology, and to create a community that would work on its development, use cases, examples, integration, use, etc.

    Once the data structure is agreed on and ready for a first try, we could think about enabling the feature in Zotero (it could be developed in the meantime without too much trouble).


    In fact, my interest in all of this (Zotero, document descriptions, etc.) comes from developing that section of the Zitgist Semantic Web Search Engine. Some basic datasets will be integrated into it for its first public release: Music, People, Geographical location, Projects and Document.

    However, one of the problems I faced with Documents is the variety of ways people describe documents: foaf:Document, DC, sioc:Post, etc.

    I think that this effort with Zotero would have a much broader effect on the entire Semantic Web, and possibly on the scientific, entertainment, etc. worlds.


    If people are willing to put some time into that project, I could certainly start to develop the infrastructure to support its community development (like what I am doing with the Music Ontology effort).


    thanks,


    Take care,


    Fred