Hardlinks to enable separate storage folders for groups?

I would like to have group files in a separate storage folder for easier synchronization.

Is it a bad idea to have a storage folder for each group, and then create hardlinks from that folder into the main storage folder?

ln for Windows is available here:
http://schinagl.priv.at/nt/ln/ln.html

ln.exe --recursive MyGroupStorage storage
  • When you say synchronization, I take it you mean sync using some mechanism other than Zotero's built-in sync, something like Dropbox? Splitting things off into separate folders wouldn't make Zotero's own sync any easier.

    Is it a bad idea? From a support POV for Zotero, certainly, yes. Is it a good idea for you? That I can't say, but it would require some development work to get done. Zotero doesn't facilitate it by default, so you'd either have to write a plugin to do this, or do offline processing while Zotero is not running (because you have to access the DB to do this, and you can't access the DB while Zotero is running).

    In any case, whatever you do, make sure to only do this with the attachments. Do *not* do this with any of the Zotero DB files. Dropbox-sync of your Zotero DB is exceedingly likely to corrupt it.
  • Exactly: I would still use Data Syncing for Zotero ("Sync Automatically" and "Sync full-text content"), but I would disable File Syncing.

    Then I would keep the files in My Library in the regular "storage" folder, but have all group libraries in their own storage folders, such as MyGroup1.

    These folders would be synchronized by Dropbox.

    As Zotero requires all attachments in one folder, I would regularly run a script that links all of the files in MyGroup1 into the storage folder. On Windows, the ln.exe program mentioned above already deals with most of the issues: it links into the folder and skips files that are already present. The only thing it does not do is remove hardlinks from the storage folder when the corresponding files have been removed from the group storage folders. That is easy to code, though, and would only matter when I remove or rename files, which I do very rarely. The worst case is the occasional orphan file, which is not a big issue.

    So as far as Zotero is concerned, it should never notice what I did, as all files appear in the regular storage folder as hardlinks, which are the same as regular files.
    I would neither write a plugin, nor access the database.
    What this would do for me is that I can now have the files for different groups completely separate.
    I would not touch the Zotero database, just the storage folders.

    This post gave me the idea:
    https://forums.zotero.org/discussion/38106/space-effficiency-using-hardlinks-to-pdfs-with-zotero-store-files/p1
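The linking step described in this post could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `link_group_into_storage` is hypothetical, the group folder is assumed to mirror the layout of Zotero's storage folder, and `os.link` stands in for what ln.exe does. Hard links only work when both folders are on the same volume.

```python
import os
from pathlib import Path


def link_group_into_storage(group_dir, storage_dir):
    """Hard-link every file under group_dir into storage_dir.

    Files that already exist in storage_dir are skipped, mirroring the
    skip-if-present behaviour described in the post. Returns the list of
    newly created links. (Hypothetical helper, not part of Zotero.)
    """
    group_dir, storage_dir = Path(group_dir), Path(storage_dir)
    created = []
    for root, _dirs, files in os.walk(group_dir):
        # Recreate the same subfolder structure inside storage_dir.
        rel = Path(root).relative_to(group_dir)
        target_dir = storage_dir / rel
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in files:
            target = target_dir / name
            if target.exists():
                continue  # already linked, or another file has that name
            os.link(Path(root) / name, target)
            created.append(target)
    return created
```

Running it a second time creates nothing new, so it could be scheduled to run regularly while Zotero is closed. It does not remove links whose source file has disappeared from the group folder.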
  • This should mostly work for sharing the file contents, but I don't see how you'd do this without hitting the DB to tell which attachment belongs to which group.

    Note that if you're linking them, deleting them in Zotero will remove the link in the Zotero storage dir but will leave the file in the Dropbox-synced folder. Zotero would think it had deleted the file, but from the point of view of the OS it was not deleted: the inode link count (or whatever the Windows equivalent is) just dropped from 2 to 1. The OS only considers the storage space up for grabs when the link count drops to 0, and that is the actual delete. Technically the same holds for "normal" files that are not hard-linked; a delete just drops the count from 1 to 0 and the file is considered deleted. Your script would also have to detect that the file was deleted from Zotero if you don't want to accumulate cruft over time.

    Renames in Zotero would likewise not propagate to your Dropbox-synced folder, since each link carries its own name. This will likely break the sync, because the metadata in Zotero *would* sync the changed name through Zotero sync. Consider this scenario:

    * Computer A creates attachment A.pdf and syncs metadata for A.pdf; your script hard-links A.pdf to Dropbox/A.pdf and Dropbox syncs it as A.pdf

    * Computer B syncs in metadata for A.pdf and the file contents under A.pdf in Dropbox

    * Computer A renames A.pdf to B.pdf and syncs metadata for what is now B.pdf. I don't know how your script would handle this.

    * Computer B syncs in metadata for what is now B.pdf, but what's in Dropbox at this stage I don't know

    You could perhaps work around this by naming the files in your Dropbox folder item-id.ext, because the item-id is stable under renames, but this is just the first edge case that popped into my mind. I can't say what others may exist.
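The link-count behaviour described in this reply can be checked from a script. The following is a minimal sketch, assuming that `os.stat` reports a usable `st_nlink` on the platform and file system in question; the helper name `find_unlinked` is hypothetical. It lists files in a group folder that no longer have a second name anywhere, which is what would remain after Zotero deleted its link.

```python
import os
from pathlib import Path


def find_unlinked(group_dir):
    """Return files under group_dir whose hard-link count is 1.

    A count of 1 means no other name (for example, the link in Zotero's
    storage folder) points at the file any more.
    """
    orphans = []
    for root, _dirs, files in os.walk(group_dir):
        for name in files:
            path = Path(root) / name
            if os.stat(path).st_nlink == 1:
                orphans.append(path)
    return sorted(orphans)
```

A file that was newly added to the group folder and has not been linked yet also has a count of 1, so a check like this can only distinguish the two cases if the script keeps its own record of what it has linked before.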
  • Thank you very much for the thoughtful response. It looks as if it is not as easy as I thought, but it might still work.

    What I really want to achieve is this:
    Computer A (laptop) holds my current work.
    Computer B (desktop) holds my larger personal library.
    B should eventually contain all of A, but I do not want the items of B to be available on A (for space reasons). B contains many years of research, letters, reviews etc.

    If Zotero had an import/export feature that ignored already imported files, I would just export on A and import on B.
    This worked very nicely in JabRef, which has an import option "Deselect all duplicates", but with Zotero this is currently not possible.

    If Zotero had separate storage for each group library, I would just use WebDAV for the group, but that isn't possible either. So what I was trying to do with the hardlinks is to emulate separate storage for separate groups.

    Originally I wanted to work with hardlinks and write a program to keep track of items originally from A, but you gave me a better idea:

    If I create NTFS junctions for each folder from A, synchronization will work in both directions. The name of the folder already is the item-id, so it will be stable under renames. If I add a note on B, it will be propagated to A. It would still orphan folders on A if I delete an item on B, but that is rare enough for me not to be an issue.
    I think there is a limitation that junctions always have to be absolute, but symbolic links can be relative as well. Either way, it would be no issue at all to re-create the links if something goes wrong.
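The per-folder linking idea from this post might look roughly like the sketch below. It rests on several assumptions: it uses relative directory symlinks via `os.symlink` instead of NTFS junctions (which `os.symlink` does not create), the helper name `link_item_folders` is hypothetical, and on Windows creating symlinks may require Developer Mode or elevated privileges. Whether Zotero treats such links exactly like ordinary folders is not verified here.

```python
import os
from pathlib import Path


def link_item_folders(group_dir, storage_dir):
    """Create one relative directory symlink in storage_dir per item folder.

    Existing entries in storage_dir are left alone. Returns the names of
    the links created. (Hypothetical helper, not part of Zotero.)
    """
    group_dir, storage_dir = Path(group_dir), Path(storage_dir)
    created = []
    for item in sorted(group_dir.iterdir()):
        link = storage_dir / item.name
        if not item.is_dir() or os.path.lexists(link):
            continue
        # Relative target, so the links keep working if the common
        # parent folder is moved as a whole.
        target = os.path.relpath(item, start=storage_dir)
        os.symlink(target, link, target_is_directory=True)
        created.append(item.name)
    return created
```

Because the link points at the folder itself, anything written through the link lands in the group folder, and re-creating the links after a mishap is just a matter of running the function again.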
  • It's not possible in Zotero because importers don't have access to your library so they can't deduplicate at import.

    In the case of linking folders you'd still need to access the database to tell which attachments live in what groups.

    But if you don't mind using Zotero storage to do all the attachment sync, what you want seems already possible in Zotero, and if I'm interpreting your comment above right, it's what I actually already do, no fiddling with links required. I have a main account on which I pay for storage. In this main account I've created a few groups; one of those groups is shared with a secondary account I have (which is the account I'm usually logged in with). I start Zotero with a profile that syncs the secondary account, and hey presto, only that group shows up locally, attachments and all. I hardly ever log into the main account unless I want to copy things between groups. My "own library" is empty in that secondary account; I only use the group.
  • Storage would work, but there are some concerns from our IT department about using one more cloud storage provider. It will be faster for me to just code it myself than to deal with them. I will give it a try and if it works I will post again.
    Thank you very much for your help!

  • They're not a cloud storage provider -- they offer sync of strictly Zotero data, and that this involves cloud storage is an implementation detail. You're also likely already going to use them for sync, you're just not paying for > 300MB storage, so the "we don't want another cloud provider involved" argument fails from the get-go. I'm in IT at a fairly large corporation and I sort of understand their pushback against a proliferation of competing services, but this seems like a knee-jerk reaction to a service they haven't looked at in detail.

    That said, "it will be faster for me to just code it myself than to deal with them" rings true unfortunately.

    If you're just worried about clutter, not storage space, you can still just sync all attachments using Dropbox or whatever (*not* the database) and then use groups to segment what you see on each computer. The computer where you see less would still have the attachments, Zotero just wouldn't know about them there. This seems to me less fragile than mucking about with links.

  • It is more of a political issue: our IT execs decided to outsource what we did in-house before, and now we are left with an inferior solution that leaves my IT colleagues absolutely frustrated. So not a good time to ask for anything. Good opportunity to give it a try myself.
  • Got that, all I'm saying is you may not need to bother with all the linking stuff to achieve the desired effect.
  • Oh wait, you said "for space reasons". Then my solution does not apply.