Zotero data dir: unnecessarily big / filenames too long / no MHTML

BUGHUNTR · November 4, 2011

Hi, I have been testing Zotero for a while now, and I have at least two problems with the data directory:

The data folder grows much too fast. Looking into it, I see several WTFs:

Why are you NOT compressing the data? My Zotero data folder is around 500 MB, and if I just zip it, it shrinks to only 160 MB! Why don't you implement more efficient data storage?

Same category: why are you NOT saving website snapshots as MHTML??? Zotero wastes lots and lots of disk space by cluttering its data directory with thousands of small files from website snapshots. I could not find any option like "save website snapshots as MHTML". This is totally annoying, and it also leads to another error (see next).

Zotero seemingly generates files with very long names, so long that the system can no longer handle them. I just tried to restore a backup of my Zotero data folder, and the restore failed because the folder contains files whose names are longer than 255 characters.
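To see which files would break a restore like this, a short script can walk the data directory and list every name over the limit. This is a minimal Python sketch, assuming the limit being hit is the common 255-byte cap on a single file or directory name (as on ext4); `find_long_names` is a hypothetical helper, not something Zotero ships.

```python
import os
import sys


def find_long_names(root: str, limit: int = 255) -> list[str]:
    """Return paths under `root` whose last name component exceeds `limit` bytes.

    Length is measured in filesystem-encoded bytes (os.fsencode), because many
    filesystems cap a single name at 255 bytes rather than 255 characters.
    """
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if len(os.fsencode(name)) > limit:
                offenders.append(os.path.join(dirpath, name))
    return sorted(offenders)


if __name__ == "__main__":
    # Usage (path is an example): python find_long_names.py ~/Zotero/storage
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_long_names(target):
        print(path)
```

Running it against the storage folder before making a backup shows which snapshot files would need to be renamed or excluded.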

I get the impression that the developers are trying really hard to make the data directory as big as possible as fast as possible. You want us to click the big "update storage" button really fast, don't you? ;)

Well, please make it more efficient and robust. Researchers need serious tools. THANKS!

Thanks for your attention,
Bughuntr

Rintze · November 4, 2011

I think MHTML support is unlikely to be implemented as long as there isn't native Firefox support for the format: https://bugzilla.mozilla.org/show_bug.cgi?id=18764

ajlyon · November 4, 2011

Note as well that Zotero.org storage does use compression: people usually find that they need a good deal less online storage than the size of their storage directory might suggest.

Many of the extraneous files can be avoided by using an ad-blocker with Firefox; otherwise, Zotero quite reasonably attempts to get a complete snapshot of the page, so it stores all the supporting files.

As for MHTML, I think Rintze is right: native support in Firefox needs to come first.

dstillman · November 4, 2011

why are you NOT compressing the data?

Because some people like to access attachment files outside of Zotero, e.g. from OS searches. This is the same reason we don't deduplicate files.

zotero seemingly generates files which can have very long names

What ajlyon said applies here too, but we are planning to patch WebPageDump (the snapshot code we use) to skip files with very long filenames. Such files are already shortened or skipped during syncing when necessary. I've added an issue for that.
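For illustration, here is a minimal Python sketch of what "shorten or skip" could mean for a single filename. It is not WebPageDump's or Zotero's actual code; `shorten_filename` and the 255-byte limit are assumptions.

```python
import os
from typing import Optional

MAX_NAME_BYTES = 255  # assumed per-name limit (e.g. ext4)


def shorten_filename(name: str, limit: int = MAX_NAME_BYTES) -> Optional[str]:
    """Return `name` trimmed to at most `limit` UTF-8 bytes, keeping its extension.

    Returns None when even the extension alone does not fit, meaning the
    caller should skip the file instead of saving it.
    """
    if len(name.encode("utf-8")) <= limit:
        return name  # already short enough, keep as-is
    stem, ext = os.path.splitext(name)
    budget = limit - len(ext.encode("utf-8"))
    if budget < 1:
        return None  # no room left for any stem: skip
    # Cut the stem at the byte budget; errors="ignore" drops a trailing
    # partial multi-byte character instead of leaving invalid UTF-8.
    trimmed = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    if not trimmed:
        return None
    return trimmed + ext
```

A real patch would also have to avoid collisions when two long names trim to the same prefix, and update the snapshot's HTML so it still points at the renamed files. Skipping a file instead simply leaves that resource missing from the snapshot.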

adamsmith · November 4, 2011

that developers are really trying hard to make the data directory as big as possible in fastest time - you want us to click the big "update storage" button really fast, don't you? ;)

AFAIK files are compressed server-side, so no, that's not the motivation.