Converting HTML to importable format

marsh · June 1, 2010

Hi,

I've searched the site and found lots of things that come close to what I want to do, but so far nothing is exactly on target.

Often one encounters edited collections on the web and would like to import not only the reference information of the book itself but also separate reference information for the individual book sections. For example, I'm planning a course for next fall and would like to be able to deal with individual chapters in the anthology.

If the book publisher or database provides this information in a format Zotero already imports, the solution is trivial. Most often, however, I find the table of contents listed on an ordinary web page that Zotero can't parse into separate book section entries. Because of the nature of tables of contents, the HTML of these web pages is usually very well structured, as in this example.

It therefore would seem relatively easy to open the source of the page in a text editor and convert it to a format that Zotero can easily import. But I don't know which of the several formats Zotero can import comes closest to HTML (I'm guessing something with XML), so picking out a target format is hit-or-miss. Also, one then has to find documentation for the target format, and I haven't seen this on the Zotero site.

Can anyone direct me to where I should be looking?

Thanks.

marsh · June 10, 2010

Never mind.

I realized that Zotero handles BibTeX and Refer, so I didn't have to figure out any cryptic RDF or XML standards. Silly me. What was I thinking? (Obviously not very much.)
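
For anyone who lands here with the same problem, here is a rough sketch of the kind of conversion I mean, written in Python with only the standard library. It is not a general solution: it assumes the table of contents marks each chapter as an <li> containing <span class="title"> and <span class="author">, which is markup I made up for illustration, so the tag and class names will need adjusting to whatever the real page uses. The sample titles, names, and the chapter1, chapter2 citation keys are placeholders too. It writes one BibTeX @incollection entry per chapter, which as far as I can tell is the entry type that comes in as a book section, and it does not escape BibTeX special characters such as &.

```python
from html.parser import HTMLParser

# Made-up sample markup; real pages will use different tags and classes.
SAMPLE_HTML = """
<ul class="toc">
  <li><span class="title">Reading the Archive</span>
      <span class="author">A. Author</span></li>
  <li><span class="title">Writing the Margins</span>
      <span class="author">B. Writer</span></li>
</ul>
"""


class TocParser(HTMLParser):
    """Collect title/author pairs from <li> items in a table of contents."""

    def __init__(self):
        super().__init__()
        self.chapters = []     # finished chapters, in page order
        self._current = None   # dict for the <li> currently being read
        self._field = None     # "title" or "author" while inside such a span

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._current = {"title": "", "author": ""}
        elif tag == "span" and self._current is not None:
            cls = dict(attrs).get("class")
            if cls in ("title", "author"):
                self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            # Keep only list items that actually yielded a title.
            if self._current["title"].strip():
                self.chapters.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Text outside the title/author spans is ignored.
        if self._current is not None and self._field:
            self._current[self._field] += data


def to_bibtex(chapters, booktitle, editor, publisher, year):
    """Return one @incollection entry per chapter as a single string."""
    entries = []
    for n, chapter in enumerate(chapters, start=1):
        fields = {
            "author": chapter["author"].strip(),
            "title": chapter["title"].strip(),
            "booktitle": booktitle,
            "editor": editor,
            "publisher": publisher,
            "year": year,
        }
        body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
        entries.append(f"@incollection{{chapter{n},\n{body}\n}}")
    return "\n\n".join(entries)


if __name__ == "__main__":
    parser = TocParser()
    parser.feed(SAMPLE_HTML)
    print(to_bibtex(parser.chapters, booktitle="An Example Anthology",
                    editor="E. Editor", publisher="Example Press", year="2009"))
```

Save the output to a .bib file and import that file into Zotero. The book-level details (title, editor, publisher, year) are passed in by hand because a table of contents page usually doesn't repeat them for every chapter.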