Zotero RDF for data housekeeping
Using Zotero RDF as a backup is strongly discouraged, e.g.:
Documentation:
"Warning - Import/Export: Zotero allows you to export your Zotero library as a Zotero RDF file. However, exporting and importing your library via RDF won't result in an exact copy of your library, and it isn't recommended as a backup strategy."
and frequently on the Forum:
adamsmith Jan 25th 2012:
"Exporting and re-importing your library to clean it up is not recommended. There may be marginal data-loss, but more importantly it breaks item IDs, which will break connections to Word documents and will cause Havoc if/when you use the sync feature. I would strongly recommend you stop doing that."
I wonder if it is documented what exactly the "marginal data-loss" involves beyond item ID's? Is any of the metadata and/or notes&files in danger of silent modification?
A plain text dump of all data is extremely useful for many purposes, the most important (for me) being easy data housekeeping. Editing raw data in a text editor with Regular Expressions is much easier and more sophisticated than risking JavaScript API scripts (especially if one is not a JS person, like myself). As I can see from searching the Forum archives, the find&replace feature was asked for for years (even called "batch editing") and recent information is that it is planned for the next version (4.0?).
adamsmith Apr 29th 2012:
"we don't have search&replace yet, no - good chance it's going to be in the next Zotero version - it's one of the top two priorities. (It _can_ already be done via the javascript API if you really know what you're doing)"
On the other hand, while I understand the problem of breaking IDs, I do not see why any other data should be lost. Unlike the other export/import formats, where mapping between fields and allowed content obviously must cause data loss, Zotero RDF is Zotero's own format and should not leave anything out. Or am I wrong?
I would think that doing data housekeeping in this way should be possible provided that: (1) I make sure all my MSWord/OpenOffice documents have Zotero codes removed; (2) syncing is switched off; (3) the database is emptied and synced before importing.
Would that work and be safe?
Other issues to think about: At some size of library this may just crash: Zotero loads the entire RDF into memory before exporting it. A couple of thousand items works, but ten thousand might not, depending on your computer.
RDF import is very reliable, but it _does_ break occasionally (one wrong character, a file link that your OS doesn't like, etc.), and because of the structure of XML, that's harder to deal with: you can't just split the file into chunks like with BibTeX or RIS.
And just so we're clear on this: none of your old Word documents created with Zotero would work anymore - that seems like a very steep price to pay.
There may be other issues I'm not thinking of. I really wouldn't do it and if you do you're pretty much on your own.
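On the point that an import can break on one wrong character: a hand-edited file can at least be checked for XML well-formedness before it goes anywhere near Zotero. The following is only a minimal sketch using Python's standard library; the function name is made up for illustration, and it says nothing about whether Zotero itself will accept the file, only whether the XML parses.

```python
import xml.etree.ElementTree as ET


def first_xml_error(data):
    """Return None if data is well-formed XML, else (line, column, message).

    `data` may be str or bytes; the name of this helper is hypothetical.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # position of the first parse error
        return line, column, str(err)
    return None
```

Reading the edited export as bytes and passing it to first_xml_error would point at the line of the first syntax problem, which is a much smaller search space than the whole file.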
Memory problems with exporting seem to be an issue. Searching the Forum I see people have had to face them with large libraries. I do not expect tens of thousands of items in the particular Group project I am starting with Zotero, but thinking ahead it may mean there will be no way to move data relatively losslessly from Zotero to other software (and while I like Zotero very much and wish it all the best, I would not like my data to be "arrested" within it). I had to move my bibliographic data from one program to another three times in my life and it was always a horrific experience, as none of the foreign formats included everything; they truncated or otherwise changed data (for obvious reasons of different data models). So the only solution was always to use the native export format and write scripts to reformat the plain text file into something acceptable to the new program. And besides, it is always nice to have a human-readable version (the Report is fine, too, for that purpose).
Splitting the RDF should not be that difficult, as I see all items, memos and attachments are top-level elements. The problem would be ensuring that all mutually linked elements end up within the same chunk.
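The splitting idea could be sketched roughly as follows. This is an illustration under assumptions I have not verified against Zotero's exporter: that every top-level element carries an rdf:about identifier, and that links between elements appear as rdf:resource attributes pointing at those identifiers. The function names are hypothetical. Linked elements are first merged into groups (union-find), then whole groups are packed into chunks.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"
RESOURCE = f"{{{RDF_NS}}}resource"


def group_linked_elements(root):
    """Group top-level children so that linked elements share a group."""
    children = list(root)
    index_by_id = {el.get(ABOUT): i for i, el in enumerate(children) if el.get(ABOUT)}
    parent = list(range(len(children)))  # union-find forest over child indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, el in enumerate(children):
        for node in el.iter():
            # Links to targets outside this file are simply ignored.
            target = index_by_id.get(node.get(RESOURCE))
            if target is not None:
                parent[find(i)] = find(target)

    groups = {}
    for i, el in enumerate(children):
        groups.setdefault(find(i), []).append(el)
    return list(groups.values())


def pack_chunks(groups, max_items):
    """Greedily pack whole groups into chunks of at most max_items elements.

    A single group larger than max_items becomes one oversize chunk.
    """
    chunks, current = [], []
    for group in groups:
        if current and len(current) + len(group) > max_items:
            chunks.append(current)
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks
```

Writing each chunk back out as a separate file would additionally require wrapping it in a root element that repeats the original namespace declarations, which this sketch leaves out.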
OK. Thank you for indicating where the dangers are. I have been warned :-)
If this is a group library shared with other people, you really shouldn't do this—along with the other problems, you'll be requiring all other members of the group to resync all items. Handling of deleted items via sync is also somewhat non-optimal at the moment, so the group will permanently carry around a long delete history (for each time you do this) that could cause syncing problems.
I assume you've seen this?
Maybe I will never need such a procedure (now I hope so), but working on Group Libraries with many people, as I expect to, some data maintenance will surely be needed. Certainly not often, perhaps once every few months or so. But I will keep in mind that this is risky.
Yes, I saw the JavaScript API find/replace script -- it may be helpful in simple situations of replacing text in a given field. What I would like, however, is more complex functionality with Regular Expressions. It is also safer to do in a text file, as one may proceed semi-manually, actually seeing the string to be replaced and deciding what to do.
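To make the semi-manual workflow concrete, here is a rough sketch of what I mean, in Python rather than a text editor. It works on the text of an export held in a string; both function names are made up for illustration, and it only handles patterns that match within a single line. The first function shows what each affected line would become, and the second applies the replacement only to the line numbers one has approved.

```python
import re


def preview_replacements(text, pattern, replacement):
    """Yield (line_number, before, after) for each line the pattern would change."""
    regex = re.compile(pattern)
    for number, line in enumerate(text.splitlines(), start=1):
        changed = regex.sub(replacement, line)
        if changed != line:
            yield number, line, changed


def apply_selected(text, pattern, replacement, approved_lines):
    """Apply the replacement only on the 1-based line numbers in approved_lines."""
    regex = re.compile(pattern)
    out = []
    for number, line in enumerate(text.splitlines(keepends=True), start=1):
        out.append(regex.sub(replacement, line) if number in approved_lines else line)
    return "".join(out)
```

One would print the preview, decide line by line, and pass the approved line numbers to apply_selected, so nothing is changed without having been seen first.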
About the syncing problems with Groups. Would this work: (1) move all items from the Group to the Personal Library into an empty Old Collection; (2) export to RDF; (3) import the processed RDF into an empty New Collection (to be sure); (4) create a New Zotero Group; (5) move all items from the Personal to the New Group Library; (6) delete the Old Group; (7) tell all members to join; (8) delete the Old Collection from the Personal Library. Quite convoluted, but should it work? Say, twice a year? :-)
As Dan says, it's not a good idea to do mass deletes and RDF imports in a library or group. Apart from the efficiency problems that Dan mentions, it will break all documents that your users have built using the content of the library before the change (as adamsmith points out). Keeping local RDF exports of the content as a supplementary backup might make sense, but a wholesale delete and replace of the running library content would be asking for trouble.