Compatibility with DevonThink or webarchive format
At the moment, using both Zotero and Devonthink is awkward. They serve very different purposes, but it would be good to have them mesh better. Devonthink can import the Zotero files, but they remain merely a collection of files making up a website. Devonthink can import the BibTeX export, but that loses the website itself. There *is* a standard that would be useful, since both the Mac generally and Devonthink in particular understand it: webarchive. That's a format that encapsulates a website so that it is treated as a single file, yet remains searchable (and otherwise usable) in Devonthink.
I'm not asking that Zotero move to it as its format for websites, only that you consider adding it as an export option.
I really don't think Zotero should support a closed format of this sort, even if it could (which, given the file I looked at, seems doubtful).
It would be nice, BTW, if Zotero (or Firefox) compressed the saved archives, at least optionally. They take up a fair bit of space, much more than my database.
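As a rough illustration of optional compression, a user-side script could pack each snapshot folder into a single DEFLATE-compressed ZIP with Python's standard `zipfile` module. This is not a Zotero or Firefox feature, and `pack_snapshot` and `folder_size` are hypothetical names. Text-heavy pages usually compress well, while images that are already compressed (PNG, JPEG) shrink very little. Zotero would not know to open the zipped copy, so this only suits archiving or export.

```python
import zipfile
from pathlib import Path


def pack_snapshot(folder: str, archive_path: str) -> int:
    """Pack every file under `folder` into one DEFLATE-compressed ZIP file.

    Paths inside the archive are stored relative to `folder`, so the page's
    relative links to its images and stylesheets still work after extraction.
    Returns the number of files written.
    """
    root = Path(folder)
    target = Path(archive_path).resolve()
    # List the files before creating the archive, and skip the archive itself,
    # so a ZIP written inside `folder` never ends up containing itself.
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.resolve() != target)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=path.relative_to(root).as_posix())
    return len(files)


def folder_size(folder: str) -> int:
    """Total size in bytes of all files under `folder` (for comparison)."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())
```

Comparing `folder_size(...)` with the size of the resulting ZIP gives a quick sense of how much space a given snapshot would save.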
1. There are a lot of Devonthink users. ANY readable format would be nice. But Zotero saves its web archives in its own format that, for one thing, uses arbitrary folder names. It would be nice if Zotero's web archives were either a) saved in a format that presented the relevant information (page title, home page, etc.) transparently or b) exportable to a widely used format.
2. Note that Zotero offers export options in other "private" formats.
3. Devonthink is, of course, not the only text-handling database program suitable for academics, writers, researchers, etc., and all of them could use an easy way to integrate with Zotero.
4. I'm sure that someone good at scripting could write a script to convert Zotero's RDF format to another one (like webarchive).
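A possible starting point for such a script is reading the snapshot attachments out of a Zotero RDF export with Python's standard `xml.etree.ElementTree`. The element layout assumed here (a `z:Attachment` carrying a `dc:title`, the page URL under `dc:identifier/dcterms:URI/rdf:value`, and the saved file in an `rdf:resource` element) is based on typical Zotero RDF exports and should be checked against a real export before relying on it; `read_attachments` and `Snapshot` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Namespace URIs assumed to match those in a Zotero RDF export.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "z": "http://www.zotero.org/namespaces/export#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
RDF_RESOURCE_ATTR = "{%s}resource" % NS["rdf"]


@dataclass
class Snapshot:
    title: str
    url: Optional[str]   # original page URL, if recorded
    path: Optional[str]  # saved file relative to the export, e.g. "files/8/index.html"


def read_attachments(rdf_text: str) -> List[Snapshot]:
    """Collect title, URL and saved-file path for every z:Attachment."""
    root = ET.fromstring(rdf_text)
    found = []
    for att in root.iter("{%s}Attachment" % NS["z"]):
        title = att.findtext("dc:title", default="", namespaces=NS)
        url = att.findtext("dc:identifier/dcterms:URI/rdf:value", namespaces=NS)
        res = att.find("rdf:resource", NS)
        path = res.get(RDF_RESOURCE_ATTR) if res is not None else None
        found.append(Snapshot(title, url, path))
    return found
```

Each `Snapshot` then gives a converter everything it needs: which folder to read, what to call the result, and which URL to record.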
By these criteria (not an open format, and not accessible outside proprietary, platform-specific APIs), in what sense does Zotero export to other private formats?
But that aside, I agree that being able to have more meaningful folder names would be useful. Indeed, this is what Firefox itself does when you save a web page.
Other programs deal with webarchives (Devonthink for one). People have already written very small scripts that, with one click, turn the site one is viewing (in Firefox, e.g.) into a webarchive and push it into Devonthink.
It may be, of course, that I'll have more luck with the Devonthink crowd and that someone there will write a script.
The import of web pages is a very convenient feature, but it should be improved, because at the moment all the components of a web page (images, etc.) are downloaded into a single folder. That is, Zotero currently just uses the "save all files" method. This certainly lets you retrieve the stored page and view it as if it had just been fetched from the Internet, but after a while your disk fills up with useless files.
I can't judge whether adopting the webarchive format proposed by Auerbach is advisable, but I can see that Firefox has a compressed format for saving a web page as a single file. On a Mac this is done not with the Save Page... command in the File menu but by pressing the ALT (or CTRL) key on a hyperlink: a new window opens in which you can save the page the link points to in a format called HyperText, which produces a single file.
If Firefox can do it, Zotero probably could too, and this would be a great improvement on the current method (at least for our disk space!)
Another problem I'd like to draw your attention to, which, if solved, could improve compatibility between Zotero and DevonThink, is that Zotero names every folder containing a saved web page with a number, which is obviously the ID that connects the page to the data stored in the database. In my opinion this method could also be improved. Let me explain with an example.
At the moment, if you import the contents of the Zotero storage folder into DevonThink, you get a list of numbered folders whose names mean nothing (only their contents are meaningful, and you have to open each one to see them). If Zotero adopted another method for assigning a unique ID to downloaded web pages, this problem could be solved. For example, if Zotero used a compressed system for saving each web page as a single file, as I proposed above, it could build the unique ID from the page title plus something else, e.g. a number or a date (BibTeX uses a similar method to produce unique record keys). That would make the Zotero storage folder much easier to understand when browsing the saved files in the file system.
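The title-plus-something key described above might look roughly like this in Python. It is a hypothetical scheme (`slugify` and `snapshot_key` are made-up names), not what Zotero does: take the first few words of the page title, append the save date, and add a numeric suffix only when two snapshots would otherwise collide.

```python
import re
import unicodedata
from datetime import date
from typing import Set


def slugify(title: str, max_words: int = 4) -> str:
    """Lower-case ASCII words from the title, joined with underscores."""
    # Decompose accented letters and drop the accents so names stay ASCII.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    words = re.findall(r"[A-Za-z0-9]+", ascii_title)
    return "_".join(w.lower() for w in words[:max_words]) or "untitled"


def snapshot_key(title: str, saved: date, taken: Set[str]) -> str:
    """Title words + ISO date, plus a numeric suffix only if the key is already used."""
    base = f"{slugify(title)}_{saved.isoformat()}"
    key, n = base, 2
    while key in taken:
        key = f"{base}_{n}"
        n += 1
    taken.add(key)
    return key
```

With this thread's title and an example date of 2007-05-01, the first call yields `compatibility_with_devonthink_or_2007-05-01`; a second snapshot of the same page on the same day gets `_2` appended. Keeping the set of used keys is what guarantees uniqueness, since titles alone often repeat.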
Any comments?
pierfranco
Firefox and Zotero also work on Windows, Linux, and a number of other operating systems. Right, but only Mac applications. As above: I suspect those scripts actually use Apple APIs to read and/or write the webarchive files.
I think the more promising short-term solution is for someone to write a little script that renames the folders Zotero creates and perhaps loads them from there directly into DT.
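A minimal sketch of that "little script", assuming the storage folder holds one subfolder per snapshot with the page saved as an `.html`/`.htm` file. Since Zotero records attachment paths in its database, renaming folders in place could break its links, so this version copies each folder to a separate export location under a name built from the folder's number and the page's `<title>`; the export folder could then be imported into DT. `export_readable` and `page_title` are hypothetical names.

```python
import re
import shutil
from html.parser import HTMLParser
from pathlib import Path
from typing import Dict, Optional


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(folder: Path) -> Optional[str]:
    """Title of the first HTML file in `folder` that has one."""
    for page in sorted(folder.glob("*.htm*")):
        parser = _TitleParser()
        parser.feed(page.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        if parser.title and parser.title.strip():
            return parser.title.strip()
    return None


def export_readable(storage: str, out: str) -> Dict[str, str]:
    """Copy each storage subfolder to out/'<number> - <title>'; return number -> new name."""
    mapping = {}
    for folder in sorted(Path(storage).iterdir()):
        if not folder.is_dir():
            continue
        title = page_title(folder) or "untitled"
        # Replace characters that are illegal in file names on common systems.
        safe = re.sub(r'[\\/:*?"<>|]+', "_", title)[:80].strip()
        name = f"{folder.name} - {safe}"
        shutil.copytree(folder, Path(out) / name)
        mapping[folder.name] = name
    return mapping
```

Keeping the original number at the start of each name preserves uniqueness and lets you trace any exported folder back to its Zotero item.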
But you can't use that for a directory name. Maybe use a more human-readable label and include an index file that associates each directory with the original URI? E.g.:
-
uri: http://ex.net/1
directory: some_nice_name
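The index-file idea can be sketched in a few lines. `write_index` and `parse_index` are hypothetical helpers; the layout simply mirrors the `-` / `uri:` / `directory:` example above and is not a format Zotero reads or writes.

```python
from typing import Dict


def write_index(entries: Dict[str, str]) -> str:
    """Render directory -> URI pairs in the '-' / 'uri:' / 'directory:' layout."""
    lines = []
    for directory, uri in entries.items():
        lines += ["-", f"uri: {uri}", f"directory: {directory}"]
    return "\n".join(lines) + "\n"


def parse_index(text: str) -> Dict[str, str]:
    """Read the layout back into a directory -> URI mapping."""
    result: Dict[str, str] = {}
    uri = None
    for raw in text.splitlines():
        line = raw.strip()
        # Slice off the prefix rather than splitting on ':' so URLs keep their colons.
        if line.startswith("uri:"):
            uri = line[len("uri:"):].strip()
        elif line.startswith("directory:") and uri is not None:
            result[line[len("directory:"):].strip()] = uri
            uri = None
    return result
```

A single shared index like this has to be rewritten whenever any snapshot is added or removed, which is one reason a per-directory file may be easier to maintain.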
Just an idea; not sure how good it is ...
Mozilla has no native archive format. There's an extension that allows saving of sites into a compressed/single-file format, but it's not under active development anymore and was never released for OS X. As I noted above, ZIP writing is planned for Mozilla, and we might offer the ability to use that when it's available, though it wouldn't be ideal for a number of reasons (harder to search, would only load and display in a browser with Zotero installed, etc.). Ideally Mozilla will implement a native solution that has some chance of interoperability with other [open-source] browsers. I wouldn't be surprised if this happens fairly soon after ZIP writing is added. There's a tracking bug on Bugzilla for all the various archive requests over the years.
Zotero itself doesn't need the index file that Bruce suggests, since it stores the path to the main attachment file in the database. However, providing some way for external consumers to map directories back to their ids/metadata might be helpful, though to do anything meaningful they'd probably want to access the Zotero database anyway (using either SQLite itself or some future local socket-based API that Zotero provided), in which case they could just look it up based on the filename. But perhaps Zotero could create a .zotero-info file in the attachment directory that provided the id/uri. Maintaining a single index file would be trickier and slower to update.
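To make the "look it up based on the filename" route concrete, here is a minimal read-only lookup with Python's built-in `sqlite3`. The table and column names (`items.key`, `itemAttachments.itemID`, `parentItemID`, `path`) are an assumed, simplified schema, not a documented Zotero interface. The schema has changed between Zotero versions, so inspect your own database first, and run this against a copy, since Zotero may hold a lock on the live file. This thread describes storage folders named by number; if that matches your setup, compare against `itemID` instead of `key`. `lookup_attachment` is a hypothetical name.

```python
import sqlite3
from pathlib import Path
from typing import Optional, Tuple

# ASSUMED schema (simplified; verify against your Zotero version):
#   items(itemID INTEGER PRIMARY KEY, key TEXT)
#   itemAttachments(itemID INTEGER, parentItemID INTEGER, path TEXT)
QUERY = """
    SELECT a.itemID, a.parentItemID, a.path
    FROM itemAttachments AS a
    JOIN items AS i ON i.itemID = a.itemID
    WHERE i.key = ?
"""


def lookup_attachment(db_path: str, folder_name: str) -> Optional[Tuple[int, Optional[int], str]]:
    """Map a storage folder name back to (itemID, parentItemID, path), or None."""
    # Open read-only via a URI so the script cannot modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(QUERY, (folder_name,)).fetchone()
    finally:
        conn.close()
```

The `parentItemID` is what a DevonThink-side script would follow to reach the bibliographic record the snapshot belongs to.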
Re: DevonThink, I'd agree that, for now, getting someone in the DT community to write a script to parse Zotero RDF and convert the exported folders of HTML files to .webarchive using the Apple API might be the best approach. Note that, other than the folder names, we don't save snapshots in our "own format." They're just HTML and related files.
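For the conversion step, the .webarchive files Safari writes are commonly observed to be binary property lists containing a `WebMainResource` dictionary and an optional `WebSubresources` array. Apple does not document this layout, so the key names below are an assumption based on inspecting such files, and the output has not been tested in DevonThink. With that caveat, Python's standard `plistlib` can produce a file of that shape from a snapshot folder. `build_webarchive` is a hypothetical name, and the sketch assumes the saved HTML refers to its other files by paths relative to its own folder.

```python
import mimetypes
import plistlib
from pathlib import Path
from urllib.parse import quote, urljoin


def _resource(path: Path, url: str) -> dict:
    """One resource entry: raw bytes, guessed MIME type, and URL."""
    mime, _ = mimetypes.guess_type(path.name)
    return {
        "WebResourceData": path.read_bytes(),
        "WebResourceMIMEType": mime or "application/octet-stream",
        "WebResourceURL": url,
    }


def build_webarchive(main_html: Path, page_url: str, encoding: str = "UTF-8") -> bytes:
    """Return binary-plist bytes in the (assumed) Safari webarchive layout."""
    folder = main_html.parent
    main = _resource(main_html, page_url)
    main["WebResourceMIMEType"] = "text/html"
    main["WebResourceTextEncodingName"] = encoding
    main["WebResourceFrameName"] = ""
    subresources = [
        # Resolve each local file against the page URL, the same way a
        # browser resolves the relative reference in the saved HTML.
        _resource(p, urljoin(page_url, quote(p.relative_to(folder).as_posix())))
        for p in sorted(folder.rglob("*"))
        if p.is_file() and p != main_html
    ]
    archive = {"WebMainResource": main}
    if subresources:
        archive["WebSubresources"] = subresources
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
```

Usage would be along the lines of `Path("page.webarchive").write_bytes(build_webarchive(Path("files/8/index.html"), url))`, with the folder and URL taken from the RDF export.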
My reason for this is that I use Groove 2007 to synchronize multiple computers. It has a limit on the raw number of files it can synchronize. Most of the time this is not a problem, but when you take a snapshot of a page that has 10 or 15 associated images, stylesheets, etc. the folder quickly grows to exceed this limit.
1. Compatibility with Firefox 3.0: this, of course, is inevitable.
2. Saving a web page as a single file, ideally MHTML: there is an open-source plugin for Firefox that saves pages in MHT format. If Mozilla doesn't integrate this into Firefox 3.0, Zotero should.
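For a sense of what a single-file MHTML snapshot involves: MHTML is a MIME `multipart/related` message (RFC 2557) in which each part carries a `Content-Location` header naming the URL it stands in for. The sketch below assembles one from a snapshot folder with Python's standard `email` package. It is not based on UnMHT's code, `build_mhtml` is a hypothetical name, it assumes the saved HTML is UTF-8 and refers to its other files by relative paths, and whether a given browser opens the result will vary.

```python
import mimetypes
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from pathlib import Path
from urllib.parse import quote, urljoin


def build_mhtml(main_html: Path, page_url: str) -> bytes:
    """Build a multipart/related (MHTML-style) message from a snapshot folder."""
    folder = main_html.parent
    msg = MIMEMultipart("related", type="text/html")
    # The root part (the HTML page itself) comes first.
    page = MIMEText(main_html.read_text(encoding="utf-8"), "html", "utf-8")
    page["Content-Location"] = page_url
    msg.attach(page)
    for p in sorted(folder.rglob("*")):
        if not p.is_file() or p == main_html:
            continue
        mime, _ = mimetypes.guess_type(p.name)
        maintype, subtype = (mime or "application/octet-stream").split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(p.read_bytes())
        encoders.encode_base64(part)  # binary-safe transfer encoding
        # Same relative-path resolution a browser would apply to the HTML's links.
        part["Content-Location"] = urljoin(page_url, quote(p.relative_to(folder).as_posix()))
        msg.attach(part)
    return msg.as_bytes()
```

Base64 inflates binary parts by roughly a third, so MHTML trades some disk space for the convenience of one file; a compressed archive avoids that overhead.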
Check out UnMHT for Firefox
http://www.unmht.org/unmht/en_index.html
While I can see the value of being able to easily move around a document, I don't think a "single file" is necessary for that, nor that MHTML is a particularly good solution.
I've not looked at Mozilla's alternative closely, but it seems they took a better tack: a compressed archive.
Moreover, I don't think you should be complaining to Zotero; this is really a Mozilla issue. If it's such a critical feature, with such an obvious solution, it seems to me you'll see it in Firefox.
Thanks!
Posted info on this at this forum posting:
http://forums.zotero.org/discussion/14268/devonthink-and-zotero/
Unfortunately, I haven't tackled attachments such as PDFs that are associated with entries; it would be great if someone modified the script to handle them.