PDF Storage & Management Concerns

I already have around 20GB of PDFs, full-text indexed with dtSearch and some free desktop tools, and some of these are cited in Reference Manager.

One concern is that starting a new collection within Zotero's database will lead to an unmanageable situation with separate storage areas and indexes. Another is the practical, reliable storage capacity of the Zotero environment--does, say, a terabyte of PDFs have any impact on the performance and reliability of Zotero/Firefox?

I would like to know if the developers have considered and designed for PDF collections on this scale, and if it is possible to store PDFs downloaded via Zotero in an alternate location such as a pre-existing directory.
  • I'd be interested in this too. If you search the forums you'll find several threads about general Zotero database size: from memory, the upshot was that a 20 to 30GB database, with tens of thousands of items, should be OK.

    Presumably, since the PDFs sit in an external 'storage' directory, the only size limit would be that imposed by your available HD capacity. But the full-text index is stored in the Zotero db -- I'd be interested to know whether this imposes any further practical limits.
  • The idea "should be OK" is not necessarily the reassurance one would like for a mission-critical application. I'm already at that limit, and desktop databases of terabytes should be anticipated in the near future.

    As for the storage directory, should one expect a terabyte-sized subdirectory of Firefox to be fault-tolerant? What if the PDF collection is already situated on a RAID array? What are the implications for external full-text indexing, editing, and viewing tools? Can an existing directory or collection be integrated with the Zotero database?

    Another minor issue: does/will Zotero allow the PDFs to be renamed as they are downloaded? We already have a naming standard in place.

    I'd like to hear directly from the developers as to their plans for VERY large document collections.

  • "The idea 'should be OK' is not necessarily the reassurance one would like for a mission-critical application"
    Oh dear, why do web forums make people so bad-tempered? I was just pointing out that there are some existing related threads, which you probably should glance over.

    "As for the storage directory, should one expect a terabyte-sized subdirectory of Firefox to be fault-tolerant?"
    It's a subdirectory on a HD (or network location, etc), not really anything to do with Firefox. It will be as fault-tolerant as is your storage infrastructure.

    As for the rest, my guess is you're looking for something a bit beyond Zotero's scope. But I'd be interested to hear what the Zotero guys have to say.
  • edited August 14, 2007
    The limiting issue here would be the speed of the JavaScript infrastructure currently used for populating and sorting the items list from the SQLite database. This is addressed on other threads, but the upshot is that, while your mileage may vary, Zotero isn't currently really equipped to handle libraries over 10,000 or so items. We hope to improve it greatly in this regard in future releases, though it will probably require 1) Firefox 3.0, which hopefully will add UDF support to mozStorage so that we can do collations in the database layer, 2) porting some core code into C++ and replicating some other logic in SQL, and/or 3) a pref to disable sorting (or at least semi-intelligent sorting) of the items pane.

    The size of the storage subdirectory itself should have no effect on Zotero or Firefox.

    You can't set the storage directory independently of the Zotero data directory, but you can use any arbitrary directory as the data directory.
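
To illustrate what "collations in the database layer" could look like, here is a minimal sketch using Python's standard `sqlite3` module rather than mozStorage. It is not Zotero code: the `items` table, the `title_sort` collation name, and the article-stripping rule are all assumptions made for the example. The point is only that once a custom collation is registered, SQLite can return rows already sorted, so the application does not have to sort them itself.

```python
import sqlite3


def natural_title_key(title: str) -> str:
    """Hypothetical sort key: case-insensitive, ignoring a leading article."""
    lowered = title.casefold().strip()
    for article in ("the ", "a ", "an "):
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


def title_collation(left: str, right: str) -> int:
    """Comparison callback for SQLite: negative, zero, or positive."""
    a, b = natural_title_key(left), natural_title_key(right)
    return (a > b) - (a < b)


def sorted_titles(titles):
    """Insert titles into a throwaway in-memory table and let SQLite sort them."""
    conn = sqlite3.connect(":memory:")
    # Register the custom collation under an assumed name.
    conn.create_collation("title_sort", title_collation)
    conn.execute("CREATE TABLE items (title TEXT)")
    conn.executemany(
        "INSERT INTO items (title) VALUES (?)", [(t,) for t in titles]
    )
    rows = conn.execute(
        "SELECT title FROM items ORDER BY title COLLATE title_sort"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    for title in sorted_titles(["The Zebra", "apple", "An Orange", "Banana"]):
        print(title)
```

Whether this helps with large libraries depends on how expensive the comparison callback is; the sketch shows the mechanism, not a measured speedup.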