Importing my Google Reader starred items (Atom format)

brianmingus · January 8, 2008

I've got roughly 6000 starred items from Google Reader that I would like to import into Zotero. I will describe what I have tried so far. Here's an example of their Atom format.


  <link rel="alternate" href="http://www.boingboing.net/" type="text/html">
  <entry gr:is-read-state-locked="true" gr:crawl-timestamp-msec="1183513071338">
   <id gr:original-id="">
    tag:google.com,2005:reader/item/736719fd96caee41
   </id>
   <category term="user/14187970455121264404/state/com.google/broadcast" scheme="http://www.google.com/reader/" label="broadcast">
   </category>
   <category term="user/14187970455121264404/state/com.google/read" scheme="http://www.google.com/reader/" label="read">
   </category>
   <category term="user/14187970455121264404/state/com.google/starred" scheme="http://www.google.com/reader/" label="starred">
    <title type="html">
     Caffeine may not give a jolt to health - The vaunted benefits of coffee and tea may be in spite of the stimulant rather than because of it.
    </title>
    <published>
     2007-07-02T09:52:03Z
    </published>
    <updated>
     2007-07-02T09:52:03Z
    </updated>
   </category>
  </entry>
 </link>
 <link rel="alternate" href="http://science.reddit.com/goto?rss=true&amp;id=22udf" type="text/html">
  <summary xml:base="http://reddit.com/" type="html">
   &lt;a href="http://www.latimes.com/features/health/la-he-caffeine25jun25,1,130150.story?coll=la-headlines-health&amp;amp;ctrack=1&amp;amp;cset=true"&gt;[link]&lt;/a&gt;&lt;a href="http://science.reddit.com/info/22udf/comments"&gt;[more]&lt;/a&gt;
  </summary>
  <author gr:unknown-author="true">
   <name>
    (author unknown)
   </name>
  </author>
  <source gr:stream-id="feed/http://science.reddit.com/.rss">
   <id>
    tag:google.com,2005:reader/feed/http://science.reddit.com/.rss
   </id>
   <title type="html">
    science: what&amp;#39;s new online
   </title>
  </source>
 </link>

I first tried converting to RDF with triplr.com, but the file is simply too large at 12MB. Next I downloaded Raptor and ran its rapper command to convert the Atom to RDF, but rapper doesn't support Atom 1.0. So I used sabcmd to apply this Atom-to-RSS XSLT stylesheet and then ran rapper to convert the RSS to RDF. That turned the entry above into the following:


<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&amp;id=22udf">
    <rdf:type rdf:resource="http://purl.org/rss/1.0/item"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&amp;id=22udf">
    <ns0:title xmlns:ns0="http://purl.org/rss/1.0/">Caffeine may not give a jolt to health - The vaunted benefits of coffee and tea may be in spite of the stimulant rather than because of it.</ns0:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&amp;id=22udf">
    <ns0:link xmlns:ns0="http://purl.org/rss/1.0/">http://science.reddit.com/goto?rss=true&amp;id=22udf</ns0:link>
  </rdf:Description>
  <rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&amp;id=22udf">
    <ns0:description xmlns:ns0="http://purl.org/rss/1.0/">&lt;a href="http://www.latimes.com/features/health/la-he-caffeine25jun25,1,130150.story?coll=la-headlines-health&amp;amp;ctrack=1&amp;amp;cset=true"&gt;[link]&lt;/a&gt;&lt;a href="http://science.reddit.com/info/22udf/comments"&gt;[more]&lt;/a&gt;</ns0:description>
  </rdf:Description>
  <rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&amp;id=22udf">
    <ns0:pubDate xmlns:ns0="http://purl.org/rss/1.0/modules/rss091#">2007-07-02T09:52:03Z</ns0:pubDate>
  </rdf:Description>
  <rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&amp;id=22udf">
    <ns0:date xmlns:ns0="http://purl.org/dc/elements/1.1/">2007-07-02T09:52:03Z</ns0:date>
  </rdf:Description>
  <rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&amp;id=22udf">
    <ns0:encoded xmlns:ns0="http://purl.org/rss/1.0/modules/content/">&lt;a href="http://www.latimes.com/features/health/la-he-caffeine25jun25,1,130150.story?coll=la-headlines-health&amp;amp;ctrack=1&amp;amp;cset=true"&gt;[link]&lt;/a&gt;&lt;a href="http://science.reddit.com/info/22udf/comments"&gt;[more]&lt;/a&gt;</ns0:encoded>
  </rdf:Description>
  <rdf:Description rdf:nodeID="genid2">
    <rdf:_4 rdf:resource="http://science.reddit.com/goto?rss=true&amp;id=22udf"/>
  </rdf:Description>
</rdf:RDF>

Zotero doesn't give an error on the generated file (which is quite large - 44,000 triples), but also doesn't import anything. Ideas? Should I resort to a manual conversion (e.g., Python) straight away?
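
For the manual route, a first step might look like the sketch below. It is not tested against the real export: it assumes the file is well-formed Atom 1.0 in the standard `http://www.w3.org/2005/Atom` namespace and that each entry carries a `rel="alternate"` link. The function name `iter_entries` and the dictionary keys are made up for illustration. It streams the file so the 12MB size should not matter.

```python
import html
import xml.etree.ElementTree as ET

# Assumption: the export uses the standard Atom 1.0 namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def _clean(text):
    """Collapse whitespace and decode HTML entities left in type="html" text."""
    return html.unescape(" ".join((text or "").split()))


def iter_entries(source):
    """Yield one dict per Atom <entry>, parsing incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != ATOM + "entry":
            continue
        link = elem.find(ATOM + "link[@rel='alternate']")
        yield {
            "title": _clean(elem.findtext(ATOM + "title")),
            "url": link.get("href") if link is not None else "",
            "published": (elem.findtext(ATOM + "published") or "").strip(),
            # Title of the feed the item came from, if present.
            "feed": _clean(elem.findtext(ATOM + "source/" + ATOM + "title")),
        }
        elem.clear()  # free the finished entry before moving on
```

`iter_entries` accepts a filename or an open binary file, so the extracted fields could then be written out in whatever format Zotero will accept.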

dstillman · January 9, 2008

Zotero can't import arbitrary RDF files. The data needs to be in the format Zotero expects, which we call Zotero RDF.

If you're comfortable with JavaScript, the best thing to do would probably be to write an import translator for Atom, since you could then import all the other available metadata in the Atom file. You can look at the MODS or RIS translators for examples of how to write an import translator (ignoring the export parts). You can post a note on the dev list if you have questions.

We'd be happy to include such a translator in Zotero itself. If we did so, a Google Reader translator could simply pull the Atom file and pass it to the other translator.
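
If JavaScript isn't an option, one possible workaround is to convert the entries to RIS outside Zotero and import that file, since an RIS import translator already exists. The Python sketch below only illustrates the idea. The function name `to_ris`, the input keys, and the choice of RIS tags (`ELEC`, `TI`, `UR`, `DA`) are assumptions about a reasonable mapping, not something confirmed against Zotero's RIS translator, and this route carries less of the Atom metadata than a dedicated translator could.

```python
def to_ris(items):
    """Render dicts with 'title', 'url' and 'published' keys as RIS text.

    Assumed mapping: each item becomes an electronic-source record (TY - ELEC).
    """
    lines = []
    for item in items:
        lines.append("TY  - ELEC")
        lines.append("TI  - " + item["title"])
        lines.append("UR  - " + item["url"])
        # Turn an ISO timestamp such as 2007-07-02T09:52:03Z into 2007/07/02.
        date = item.get("published", "")[:10].replace("-", "/")
        if date:
            lines.append("DA  - " + date)
        lines.append("ER  - ")
        lines.append("")  # blank line between records
    return "\n".join(lines)
```

Writing the returned string to a `.ris` file and importing a handful of records first would show whether the fields land where expected before converting all 6000 items.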