Importing my Google Reader starred items (Atom format)
I've got roughly 6000 starred items from Google Reader that I would like to import into Zotero. I will describe what I have tried so far. Here's an example of their Atom format:
<link rel="alternate" href="http://www.boingboing.net/" type="text/html"/>
<entry gr:is-read-state-locked="true" gr:crawl-timestamp-msec="1183513071338">
<id gr:original-id="">tag:google.com,2005:reader/item/736719fd96caee41</id>
<category term="user/14187970455121264404/state/com.google/broadcast" scheme="http://www.google.com/reader/" label="broadcast"/>
<category term="user/14187970455121264404/state/com.google/read" scheme="http://www.google.com/reader/" label="read"/>
<category term="user/14187970455121264404/state/com.google/starred" scheme="http://www.google.com/reader/" label="starred"/>
<title type="html">Caffeine may not give a jolt to health - The vaunted benefits of coffee and tea may be in spite of the stimulant rather than because of it.</title>
<published>2007-07-02T09:52:03Z</published>
<updated>2007-07-02T09:52:03Z</updated>
<link rel="alternate" href="http://science.reddit.com/goto?rss=true&id=22udf" type="text/html"/>
<summary xml:base="http://reddit.com/" type="html"><a href="http://www.latimes.com/features/health/la-he-caffeine25jun25,1,130150.story?coll=la-headlines-health&amp;ctrack=1&amp;cset=true">[link]</a><a href="http://science.reddit.com/info/22udf/comments">[more]</a></summary>
<author gr:unknown-author="true">
<name>(author unknown)</name>
</author>
<source gr:stream-id="feed/http://science.reddit.com/.rss">
<id>tag:google.com,2005:reader/feed/http://science.reddit.com/.rss</id>
<title type="html">science: what&#39;s new online</title>
</source>
</entry>
I first tried triplr.com for a conversion to RDF, but at 12MB the file is simply too large for it. I then downloaded Raptor and ran the rapper command to convert the Atom to RDF, but it doesn't support Atom 1.0. So I used sabcmd to apply this Atom-to-RSS XSLT stylesheet, and then used rapper to convert the RSS to RDF. It converted the entry above into the following:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&id=22udf">
<rdf:type rdf:resource="http://purl.org/rss/1.0/item"/>
</rdf:Description>
<rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&id=22udf">
<ns0:title xmlns:ns0="http://purl.org/rss/1.0/">Caffeine may not give a jolt to health - The vaunted benefits of coffee and tea may be in spite of the stimulant rather than because of it.</ns0:title>
</rdf:Description>
<rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&id=22udf">
<ns0:link xmlns:ns0="http://purl.org/rss/1.0/">http://science.reddit.com/goto?rss=true&id=22udf</ns0:link>
</rdf:Description>
<rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&id=22udf">
<ns0:description xmlns:ns0="http://purl.org/rss/1.0/"><a href="http://www.latimes.com/features/health/la-he-caffeine25jun25,1,130150.story?coll=la-headlines-health&amp;ctrack=1&amp;cset=true">[link]</a><a href="http://science.reddit.com/info/22udf/comments">[more]</a></ns0:description>
</rdf:Description>
<rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&id=22udf">
<ns0:pubDate xmlns:ns0="http://purl.org/rss/1.0/modules/rss091#">2007-07-02T09:52:03Z</ns0:pubDate>
</rdf:Description>
<rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&id=22udf">
<ns0:date xmlns:ns0="http://purl.org/dc/elements/1.1/">2007-07-02T09:52:03Z</ns0:date>
</rdf:Description>
<rdf:Description rdf:about="http://science.reddit.com/goto?rss=true&id=22udf">
<ns0:encoded xmlns:ns0="http://purl.org/rss/1.0/modules/content/"><a href="http://www.latimes.com/features/health/la-he-caffeine25jun25,1,130150.story?coll=la-headlines-health&amp;ctrack=1&amp;cset=true">[link]</a><a href="http://science.reddit.com/info/22udf/comments">[more]</a></ns0:encoded>
</rdf:Description>
<rdf:Description rdf:nodeID="genid2">
<rdf:_4 rdf:resource="http://science.reddit.com/goto?rss=true&id=22udf"/>
</rdf:Description>
</rdf:RDF>
Zotero doesn't give an error on the generated file (which is quite large - 44,000 triples), but also doesn't import anything. Ideas? Should I resort to a manual conversion (e.g., Python) straight away?
If you're comfortable with JavaScript, the best thing to do would probably be to write an import translator for Atom, since you could then import all the other available metadata in the Atom file. You can look at the MODS or RIS translators for examples of how to write an import translator (ignoring the export parts). You can post a note on the dev list if you have questions.
We'd be happy to include such a translator in Zotero itself. If we did so, a Google Reader translator could simply pull the Atom file and pass it to the other translator.
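For the manual route raised in the question, here is a minimal Python sketch, not a Zotero translator. It assumes the export is well-formed Atom 1.0 in the standard http://www.w3.org/2005/Atom namespace, and it writes RIS, which Zotero can import. The function name atom_to_ris and the field mapping (entry title to TI, alternate link to UR, published date to DA, source feed title to T2, item type ELEC) are illustrative choices rather than anything Zotero prescribes.

```python
import sys
import xml.etree.ElementTree as ET

# Atom 1.0 namespace in ElementTree's "{uri}tag" notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def atom_to_ris(atom_xml):
    """Return a RIS string with one record per Atom <entry>.

    The tag mapping here is an assumption made for illustration.
    """
    root = ET.fromstring(atom_xml)
    records = []
    for entry in root.iter(ATOM + "entry"):
        lines = ["TY  - ELEC"]

        # Direct-child <title> only, so <source><title> is not picked up here.
        title = entry.findtext(ATOM + "title", default="").strip()
        if title:
            lines.append("TI  - " + title)

        # Use the first rel="alternate" link as the item URL.
        for link in entry.findall(ATOM + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                lines.append("UR  - " + link.get("href"))
                break

        # 2007-07-02T09:52:03Z -> 2007/07/02
        published = entry.findtext(ATOM + "published", default="").strip()
        if published:
            lines.append("DA  - " + published[:10].replace("-", "/"))

        # Title of the feed the item came from, if present.
        feed = entry.findtext(ATOM + "source/" + ATOM + "title", default="").strip()
        if feed:
            lines.append("T2  - " + feed)

        lines.append("ER  - ")
        records.append("\n".join(lines))
    return "\n\n".join(records) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python atom2ris.py starred-items.xml > starred.ris
    with open(sys.argv[1], "rb") as f:
        sys.stdout.write(atom_to_ris(f.read()))
```

Titles with type="html" are copied as-is, so any markup or entities inside them would still need cleaning. The summary, author, and category elements are ignored in this sketch. A proper Atom import translator, as suggested above, could carry that metadata over as well.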