Errors found in database - now what?
I clicked the "check database integrity" button in the preferences, and got the message "Failed. Errors were found in the Zotero database!"
So, what do I do now?
(Running Mac OSX 10.4.10, Firefox latest (2.0.0.9))
Thanks!
http://www.zotero.org/utils/dbfix/
Do you get a "file done" message or anything?
Heidi
If something like that doesn't happen any time soon, I'd think your best bet would be to send an email to Zotero support as per the instructions on the repair tool page.
ERROR
The requested URL could not be retrieved
While trying to retrieve the URL: http://www.zotero.org/utils/dbfix/?
The following error was encountered:
* Write Error
The system returned:
(110) Connection timed out
An error condition occurred while writing to the network. Please retry your request.
I just exported my database, deleted everything, and imported it again. The problem was gone!
But after I modified some PDF files stored in the database using Acrobat Reader, the problem appeared again.
So I seriously suspect that the error is due to modifying PDF files outside Zotero, if that is your case.
A simpler solution is to clear the index and translation table and reindex it, I guess.
WinXP SP2. FF 2.0.0.4, Z 1.0.1
I just had the same thing happen to me. What was I doing? Mostly deleting some web page snapshots and replacing them with "Create new item from current page." Also dragging items from the library list into a sub-collection. Often I would get the "do not enter" cursor when placed atop subcollections that I know the item was not present in (a hint as to the problem?). FF/Zotero crashed, Zotero said it recovered using an autosave from 1.5 hours ago and now when I click the "check database integrity" button, it tells me there are errors.
My database is >500MB (which apparently makes the utility referred to a non-starter), with a bunch of PDFs and a bunch of notes I've added. I decided to store the PDFs in the Zotero DB as the functionality permits. Now... how to recover?
Aside: Can one reconfigure the auto-save interval or uncover an explanation of what, exactly, is auto-saved?
Thanks!
The "auto-save" just uses the backup of the DB saved the last time you shut down Zotero.
Also note that the PDFs aren't really stored within the database. They're stored externally, but Zotero indexes the full-text content and stores that in the database. You could try rebuilding the full-text index in the prefs (after making a copy of your database first with Firefox closed).
If that doesn't work, you'll need the SQLite3 command line client to do a dump and restore (which is all the DB repair tool does), again with Firefox closed:
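As a rough sketch (not the repair tool's actual code), the dump-and-restore steps quoted further down this thread could be wrapped in a small shell function. The name repair_sqlite_db is made up for illustration. It assumes the sqlite3 command line client is on the PATH, that Firefox is closed, and that you are working on a copy of the data directory. The auto_vacuum pragma is sent in the same session as the dump so that it is in effect before any table is created.

```shell
#!/bin/sh
# Sketch only: rebuild an SQLite file by dumping it to SQL and reloading it.
# repair_sqlite_db is a hypothetical helper name, not part of Zotero.
repair_sqlite_db() {
    db="$1"
    [ -f "$db" ] || { echo "no such file: $db" >&2; return 1; }

    # 1. Dump whatever SQLite can still read into a plain SQL script.
    sqlite3 "$db" .dump > "$db.dump.sql" || return 1

    # 2. Keep the original under a new name instead of deleting it.
    mv "$db" "$db.old" || return 1

    # 3. Replay the dump into a fresh file. The pragma goes first, in the
    #    same session, so it applies before the first table exists.
    { echo "PRAGMA auto_vacuum=1;"; cat "$db.dump.sql"; } | sqlite3 "$db"

    # 4. Report on the rebuilt file; SQLite prints "ok" if it finds no problems.
    sqlite3 "$db" "PRAGMA integrity_check;"
}
```

Called as `repair_sqlite_db zotero.sqlite`, it leaves the untouched original in zotero.sqlite.old, so nothing is lost if the rebuild turns out worse than what you started with.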
I tend to disagree, however, with your characterizing it as a Firefox crash, as FF hasn't crashed on me before (although because of, I suppose, its process/thread structure it has successfully tied up my machine - waiting on a large download on my dial-up conx - until I have killed it off). Unfortunately I forget the actual sequence of error messages/disappearing windows at the time that one or the other or both FF and/or Z crashed (and I don't know if Z has its own thread or is run off a FF thread/process. If it's the latter then of course Z can't crash but could only cause its host to jump out the window).
Is your description of pdfs not being really stored in the DB one of semantics? The actual pdf file is stored by Z within the Z storage file structure which, to me, means that it is, effectively, stored in the Z DB. This doesn't appear any different to any of thousands of gif images associated with web pages that are also "stored" in the db. The significance here either way is that I need to choose whether to store artifacts in Z directly or in some other simple file system - such as folders organized by subject, author, etc. - and then simply referenced (linked) in Z. If Z's db is going to get corrupted even irregularly then I must regard it as unreliable and therefore not a wise choice to use. I really like the tool that you folks have created and how it works but computers already give me enough grief and I don't need yet another piece of non-robust software to cause me more.
I also recognize that it is still in early going (version 1) but, please, nail down the robustness - i.e. zero chance of db corruption - then work on the features. I'm looking at this tool to store my references from here on, not just for a project lasting a month or a year. Whether that's wise in the best case is another matter.
I'll have a look at the tools you mentioned.
How about creating a shell, of sorts, which is the top-level of Z from which the main Z is then invoked. Then, to backup or set an auto-save interval, the Zshell could:
1. display a pop-up saying "now shutting down Z to create a backup"
2. shut down Z
3. take the backup
4. restart Z
As I said, it isn't pretty. I don't know anything about browser coding so can't offer any sort of strategic design insight.
Whatever brings reliability is good for me.
You are correct, however, that Zotero runs on the main FF thread, so no, Zotero itself cannot crash—only Firefox can crash. A Firefox crash indicates a Firefox bug, though such bugs might be exposed by code in Zotero, and we can sometimes work around them if given sufficient details (such as the CrashReporter log output on OS X and steps to reproduce).
My description of PDFs not being stored in the DB is not one of semantics. The Zotero database is the zotero.sqlite file. The contents of the storage folder, where PDFs and other files are stored, are just files in the filesystem. Zotero doesn't store files in the database. As I said above, it does store the full-text search word index in the database, however, which is why the size of the SQLite file will increase after adding a file to Zotero.
We haven't seen any widespread data reliability problems in Zotero. Generally, database corruption we've seen has been due either to Firefox crashes or instability of the underlying Firefox 2.0 storage system during very large transactions (which has generally only manifested itself during database upgrades, and we're not doing any more of those in Firefox 2).
Obviously, neither of these problems should happen, but there's not really anything we can do about them from our end. From what we've seen, however, the Firefox 3 storage mechanism should be much more robust, and the next major version of Zotero will require Firefox 3. In the meantime, we recommend you back up your Zotero data directory daily (as you would do with any critical data).
Is the recommended backup strategy documented anywhere? I couldn't locate anything. If not, I could write something up to pass along to you folks if that would be helpful.
results from recovery attempt using:
sqlite3 zotero.sqlite .dump > dump.sql
mv zotero.sqlite zotero.sqlite.old
sqlite3 zotero.sqlite PRAGMA=auto_vacuum=1
sqlite3 zotero.sqlite < dump.sql
Step 3 appears not to work in XP. I assume it works fine in *nix based on the "mv" command in step 2.
On step 4, I get about 50 errors all stating:
"SQL error near line <line no>: column word is not unique"
Any ideas?
As for the Step 4 errors, those indicate problems in your full-text word index. Did you try just rebuilding that from the Zotero prefs?
Many thanks for the procedure to repair the database. Just a note for Windows users: line 2 should read:
ren zotero.sqlite zotero.sqlite.old
so that the full code is:
sqlite3 zotero.sqlite .dump > dump.sql
ren zotero.sqlite zotero.sqlite.old
sqlite3 zotero.sqlite "PRAGMA auto_vacuum=1"
sqlite3 zotero.sqlite < dump.sql
All the code can be placed in a batch file. Remember to place the batchfile, sqlite3.exe (download from http://www.sqlite.org/sqlite-3_5_3.zip) and the zotero database (zotero.sqlite) in the same folder.
Can I offer some observations on this whole db integrity issue in hopes of enhancing robustness and reliability? Now that my blood pressure has dropped since the db seems OK again, they are framed in hopefully helpful language.
First, I'm not suggesting prescriptions but rather what result is useful to me and, I think, to others.
The wording of the FAQ on backups is what I would refer to as jocularly catastrophic - i.e. "hard drive melts, computer is stolen, etc." - suggesting that a user maybe really doesn't need to be concerned; who would want to steal this old desktop and, anyway, I've got RAID 1 against disk crash. It doesn't mention other equally catastrophic but somewhat less Fox-new-ish events such as an otherwise innocuous crash of Firefox that might irrecoverably corrupt the database, which might have a day's or 10 years' worth of data entered.
Could I suggest that the FAQ answer include, therefore, the possibility of non-fixable database corruption from a non-catastrophic computer or s/w failure? For example, by Firefox crashing, if you want to phrase it that way. Also, it should likely include a note to check the database integrity at the time one backs up the directory. If the db has become corrupted without the user knowing about it - as happened to me when I was working this issue today - who knows what will happen days, weeks, years down the road when they realize the db is corrupted.
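To make the "check integrity at backup time" idea concrete, here is a minimal sketch in shell. Everything in it is an assumption for illustration: backup_zotero_dir is a made-up name, the two paths are whatever the user passes in, and SQLite's own integrity_check is not necessarily the same test as the button in the Zotero preferences. It should only be run with Firefox closed.

```shell
#!/bin/sh
# Sketch only: check the SQLite file, then archive the whole data directory.
# backup_zotero_dir is a hypothetical helper; the caller supplies both paths.
backup_zotero_dir() {
    data_dir="$1"      # directory containing zotero.sqlite and storage/
    backup_dir="$2"    # where dated archives are written
    db="$data_dir/zotero.sqlite"

    [ -f "$db" ] || { echo "no database at $db" >&2; return 1; }

    # integrity_check prints the single line "ok" for a healthy file.
    result=$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1)
    if [ "$result" != "ok" ]; then
        echo "integrity check failed, not backing up: $result" >&2
        return 2
    fi

    mkdir -p "$backup_dir" || return 1
    archive="$backup_dir/zotero-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C makes the paths in the archive relative to the data directory's parent.
    tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1
    echo "$archive"
}
```

Refusing to archive a database that fails the check is a deliberate choice in this sketch: the user finds out on the day the problem appears, and a damaged copy never sits among the backups looking like a good one.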
Even the above is a second choice. So, what is the first choice?
Let me include a little of my high blood pressure rant which I penned when I thought my db was irretrievably corrupted (though it still appeared to work):
As it is, I am not pleased for the following reasons (this is not a personal attack on the developers - who I think have the makings of a truly fine piece of s/w - I don't even know them):
1. I was thrilled to come across Zotero and its functionality, immediately passed the word around to friends and had high expectations,
2. the db was corrupted last night after an entire day of entering files, web sites, and other data,
3. I went back to an earlier copy of the Zotero directory/database and it, too, has db errors which I didn't know existed,
Now I'm trying to figure out what to do. I've got many webpages and many notes against different items to recover (I'm working on a masters thesis). Because I chose the option to copy the pdfs into the Z directory, I also have to go through each of the old Z folders and find those pdfs, move them somewhere else, and never again choose to save the file in the Z directory, but simply link to them.
[end of high pressure]
As mentioned, using the techniques Dan and MC provided/tweaked, the db seems to have returned to an even keel.
Nonetheless, the strategy for recovering from db corruptions and, equally important, avoiding them, seems to need work. The "you should always back up your data daily" line might be true - to an extent - but how many of the PC user community actually do that? And, if FF is inclined to crash from time to time, why isn't Z safe-guarded against that eventuality? Even as a plug-in, it just makes sense to have a stand-alone strategy for maintaining integrity. For example (not as prescription but simply to make a point), is there anything preventing a FF add-in from writing a file (not necessarily THE file) to disk whenever it likes? I can't imagine there would be much pleasure amongst users who find they have to go back to the last Z directory save every time FF crashes or is killed (frequent or infrequent), not to mention spend time each time FF crashes to determine if they, in fact, need to go back to a previous version. This seems to me a bit of a show-stopper which, once people realize it exists, can only prevent further acceptance and use.
Let me again say how useful Zotero is to me and how much I appreciate the work that has been done on it. I just know how quickly that appreciation can turn around though if a lack of reliability causes me grief (i.e. wasted and lost time and effort).
1) We do take data integrity very seriously, and Zotero does many things to protect user data: it makes a backup of the database on every shutdown of Firefox; it makes a separate backup of the database before every database upgrade; it checks as best as it can for database corruption on startup and during queries and reverts to a backup database if it finds any; it forces a restart if a database transaction is incorrectly left open. Even with no action on the part of the user, there should always be a backup no more than a few days (or a few hours) old and no chance of losing "10 years' worth of data".
2) We're largely at the mercy of Firefox bugs, though we're not aware of any widespread problems with general use of Zotero. The underlying storage mechanism in Firefox 2.0 does seem to have some instability, most notably, as I said above, during very large database transactions, and we're avoiding database upgrades until Firefox 3.0 because of that. Like you, some people have found inconsistencies in their full-text word indexes—possibly due to the same transaction bug in Firefox 2.0—but those don't tend to interfere with normal usage of Zotero. A Firefox crash corrupting a database is very bad indeed, and I'm very sorry to hear that happened to you, but we don't have any reason to think that that is at all common—I have Firefox and Zotero open all day for development and can't remember a crash in recent memory, nor have I personally seen a Firefox crash cause Zotero database corruption. Still, from the testing we've done, we believe that the storage layer in Firefox 3.0 should be much more robust (since it's used in Places, whereas we may be the biggest mozStorage consumer in FF2).
3) In the meantime, other than all the steps we're already taking to protect data (outlined above), we're fairly limited in what we can do other than recommending daily backups. (From a quick check, Leopard's hourly Time Machine backups seem fine too.) Firefox 2.0 doesn't give us a way to close the database once it's open or reliably test it for errors, so we have to rely on a failure on open or a query failure, at which point we mark the database as corrupted. Making periodic backups isn't really an option, since even if we could make sure the database was in a consistent state (which I'm not sure of), such a backup would involve either first deleting the only known-good backup file or creating yet another potentially large backup file in the user's data directory (which may be on a USB key or network drive) and would cause Firefox to stall while the backup was taking place.
So, again, we do take data integrity very seriously, and while we haven't seen any widespread problems, we welcome any bug reports, crash reports, or suggestions that people may have while we work to get Zotero ready for Firefox 3.0.
(By the way, did you mean that your SQLite database itself (zotero.sqlite) is >500MB, or your entire Zotero data directory? I think you may mean the latter. How big is the zotero.sqlite file itself? Because you just need to get a ZIP of that within 70MB in order to use the DB repair tool on our site...)
Last thing first. Yes, it was my entire directory (my def'n of the db) that was >500mb. The sqlite file (your def'n of the db) is only about 15mb so that was simply a difference in understanding which is now corrected. I can see I'll need a lot more references in my db before I get to 70mb.
For the rest...
I am a huge fan of learning through "work flows". Sadly for me, the tech writer community in general is apparently not. Consequently, s/w documentation (in my experience) describes (hopefully) features but not in a way that describes how they were intended to be used (i.e. "what were the designers thinking"). This is a blanket statement which doesn't apply in every case.
In the current discussion context re db integrity and recovery, although I have seen the zotero.sqlite.bak file in my Zotero data folder I could not find a description of when it is created and how it is or should be used. I'm too lazy to observe changing file creation times as I do things like exit FF.
I found through Google a couple of mentions of the zotero.sqlite.bak file on Zotero.org from which I could glean a little info:
http://dev.zotero.org/overview_of_zotero_and_the_technologies_it_uses
http://forums.zotero.org/discussion/1410/recovery-of-a-document/
http://www.zotero.org/utils/dbfix/
There is nothing in the above or the documentation that I could find that gives a clear description of the db files and how to work with them. Then again, I am a guy and consequently half-blind.
For example, documentation might say:
1. the db file is zotero.sqlite. It is created automatically and placed in the directory specified in the "Storage Location" field under the "Advanced" tab in Zotero Preferences which can be found [blah blah] (default directory is [blah blah]). (We might even venture to call the storage location ZOTERO_DATA_HOME). In here are stored [blah blah]. It is created when [blah blah] and updated when [blah blah].
2. Actual computer file artifacts such as MS Word, pdf documents and other application files, captured webpages, [blah blah] are not located within the zotero.sqlite file but are, instead, placed in the directory ZOTERO_DATA_HOME\storage (in some structure which makes sense to Zotero but don't try and figure it out!)
3. It is possible that through various system failures - from a problem with Firefox to a hard drive crash to a stolen computer - the zotero.sqlite file (and possibly the ZOTERO_DATA_HOME\storage directory) can be lost. To aid in recovery from various forms of system failure, Zotero automatically creates a snapshot backup of the zotero.sqlite file in ZOTERO_DATA_HOME called zotero.sqlite.bak. This is created when [blah blah]. The section "Zotero Database Recovery" provides a clear description of:
2.0 Day-to-day procedures to follow to ensure a successful recovery in the event of subsequent failure,
2.1 how to tell if your Zotero database has a problem, and
2.2 work flows that provide step-by-step instruction on how to use the zotero.sqlite.bak file and other tools to recover your Zotero database when a problem is encountered and recovery limitations.
Does this make sense?
A further suggestion: how about an option for more than one backup file (number user-selectable to some max). The problem of course, even with a single .bak, is that the db file and the storage directory will likely be out of sync. Maybe down the road some sort of audit function might be needed to either identify items missing from \storage (i.e. need to restore from .bak after an item was deleted since .bak taken) or get rid of artifacts no longer referenced (i.e. need to restore from .bak after an item was added since .bak taken). Which is why I suggested that the entire \zotero directory is the database.
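Just to illustrate the "more than one backup file" part, here is a sketch of how a user-selectable number of copies could be kept from outside Zotero. rotate_backups and its arguments are hypothetical, it only handles the zotero.sqlite file (so it does nothing about the storage directory getting out of sync), and it assumes Firefox is closed while it runs.

```shell
#!/bin/sh
# Sketch only: keep up to $max timestamped copies of a database file.
# rotate_backups is a hypothetical helper, not an existing Zotero feature.
rotate_backups() {
    db="$1"             # e.g. the path to zotero.sqlite
    max="${2:-3}"       # number of copies to keep (defaults to 3)

    [ -f "$db" ] || { echo "no such file: $db" >&2; return 1; }

    # Timestamped names sort oldest-first when the shell expands the glob.
    cp -p "$db" "$db.$(date +%Y%m%d-%H%M%S).bak" || return 1

    # Load the copies into the positional parameters, then delete from the
    # front (the oldest) until only $max remain.
    set -- "$db".*.bak
    while [ "$#" -gt "$max" ]; do
        rm -f -- "$1"
        shift
    done
}
```

For example, `rotate_backups zotero.sqlite 5` would add one new copy and trim the set to the five most recent. The single zotero.sqlite.bak that Zotero writes itself does not match the pattern and is left alone.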
I've kicked this horse enough. Hopefully something proves useful.