Create Zotero library using disk directory structure

Good day

It is 2020, and I still organize my scientific library as a set of .pdf files located in directories and sub-directories named after the corresponding topic. I would like to start using Zotero, but cannot find an easy way to re-create the directory structure inside Zotero.

Currently, I have several hundred directories and subdirectories on my disk, and re-creating such a hierarchical structure in Zotero manually, folder by folder, would be a horrible exercise. Is there an automatic way of doing that, or at least of making this process easier? Perhaps a plugin or a translator can do that?
  • I'm not aware of a plugin that can do that, but if someone wanted to work on it we'd consider taking a patch to allow adding a folder (instead of just files as you can do now) and to recreate the folder structure as collections. Main complication is that Zotero will add any file type as an attachment, not just PDFs, so if you had other files below in any of the directories those would be added too unless there was a prompt that let you somehow specify what types of files to add.
  • @depswa what platform are you on?
  • @dstillman I do have other files than PDFs, but not too many. It is not a problem to clean up manually later, if only the folder structure can be created automatically.

    @emilianoeheyns I am on Windows 10, but if solutions exist for Linux, I am willing to copy my files to Linux. I guess transferring a Zotero library from Linux to Windows should not be a problem later.
  • I'm thinking it should be possible to create an RDF file that would on import recreate the folder structure.
  • It should be, but that doesn't sound like much fun.
  • Any ugly solution is better than no solution at all. Do you know of any tool that can scan a folder structure on the disk and create an RDF file?
  • Meh, I'm willing to give it a shot. BibTeX would be easier, but it'd mean dummy entries to hold the attachments. I think RDF would allow top-level attachments.

    Another option would be a simple import translator which imports the output of "dir /s /b".
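A rough sketch of the `dir /s /b` idea, assuming the listing contains one absolute Windows path per line. The function name `listing_to_tree`, the nested-dict layout, and the sample paths are hypothetical and only illustrate how folder paths could map onto nested collections; this is not an actual Zotero import translator.

```python
from pathlib import PureWindowsPath


def listing_to_tree(listing, root, extensions=(".pdf",)):
    """Turn `dir /s /b` output into a nested dict of collections.

    Each node is {"files": [...], "children": {name: node}}.
    Lines whose suffix is not in `extensions` are skipped, which also
    drops the bare directory lines that `dir /s /b` prints.
    """
    root = PureWindowsPath(root)
    tree = {"files": [], "children": {}}
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        path = PureWindowsPath(line)
        if path.suffix.lower() not in extensions:
            continue
        node = tree
        # Walk (and create) one collection per intermediate folder.
        for part in path.relative_to(root).parts[:-1]:
            node = node["children"].setdefault(
                part, {"files": [], "children": {}}
            )
        node["files"].append(path.name)
    return tree
```

Using `PureWindowsPath` keeps the parsing independent of the platform the sketch runs on, so a listing produced on Windows could be processed on Linux as well.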
  • Even if you didn't want to work on a patch, I would recommend just doing it via the JavaScript runner rather than trying to generate RDF. There's really no need to go via an external script and RDF for this, and such a solution would be much harder for most people to run.
  • On the other hand, a plugin that does this shouldn't be much work. Probably preferable. I'll give it a shot.
  • Thank you very much. I will hope for a solution.
  • Posted almost simultaneously :) I'll just do a quick plugin first, if that works, I can look at converting it to a PR.

    Suggestions on where to put it in the menus? File menu?
  • It would just extend the functionality of "Link to File…" and "Store Copy of File…" (which are already named incorrectly, since you can select multiple files).

    And ideally I suppose we would also allow you to drag a filesystem folder to a collection, but we could deal with that separately.
  • Though we might need to be a little more explicit about what was going to happen here, since particularly if we're adding all files indiscriminately then selecting, say, your home folder by mistake could be rather disastrous.
  • That menu is only enabled when the current item is editable, so it wouldn't work as-is for importing a folder of attachments.

    Maybe drag and drop is the best interface for this? That would seem to me to be a deliberate action.
  • No, those options are in the New Item menu — they're not dependent on the item selection.

    But thinking about what I said above, a simple solution would be just to prompt for confirmation if (and only if) a folder is selected instead of files. Could even have a checkbox (selected by default?) to import the subfolder structure as collections. (We could upgrade this to a proper wizard window later if we need more options.)
  • @dstillman I see them now. But in that selection dialog, selecting a folder and clicking "open" just navigates to that folder in the picker. It looks like file-pickers and dir pickers are different dialogs. I also can't multi-select files + dirs there.

    @depswa, I'm putting together something that can be run in the JavaScript runner or as a plugin, and that wouldn't require anything outside Zotero, so if you can wait until the weekend that is probably the smoothest experience.

    But I already had the RDF generator mostly done, so: https://gist.github.com/8edd2f88118c9fd4869e97ebbdcbf54f

    *Please* only try this on an empty library. It Works For Me (tm) but will blithely import the lot, and if there's something there you don't want, it will be hard to separate out. It needs Python: place the script in the folder you want to import, then run

    python dir2rdf.py pdf docx

    and import attachments.rdf that it writes in the same directory. Don't move it from where it is, or it won't find the attachments.

    It will create a collection for every folder in that directory, regardless of whether it has files to import (something I'm likely to address in the JavaScript version on the weekend). The pdf docx at the end are the extensions of the attachments it should import. Any extension not listed is not imported.
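The scanning step described above can be sketched roughly as follows. This is not the code of dir2rdf.py (which is only linked, not shown, in this thread); the function name `scan` and its return format are assumptions made for illustration. Unlike the behaviour described for the script, this sketch leaves out folders that contain no matching files.

```python
import os


def scan(root, extensions):
    """Map each folder (relative to root) holding importable files to those files.

    `extensions` is a list such as ["pdf", "docx"]; matching ignores case.
    """
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        matches = sorted(
            name for name in filenames
            if os.path.splitext(name)[1].lower() in wanted
        )
        if matches:  # folders with nothing to import are left out
            found[os.path.relpath(dirpath, root)] = matches
    return found
```

Each key is a relative folder path, so its components could serve as the chain of nested collection names. Files directly in the root folder end up under the key `"."`.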
  • > But in that selection dialog, selecting a folder and clicking "open" just navigates to that folder in the picker. It looks like file-pickers and dir pickers are different dialogs.

    That's just a configuration flag on the filepicker.
  • This is called from addAttachmentFromDialog in zoteroPane.js, right?

    I take it the mode is the flag in question? That would still preclude this happening from the same picker if I'm reading this right -- the available modes are

    /** @const {Integer} FilePicker#modeOpen - Load a file */
    /** @const {Integer} FilePicker#modeSave - Save a file */
    /** @const {Integer} FilePicker#modeGetFolder - Select a folder/directory */
    /** @const {Integer} FilePicker#modeOpenMultiple - Load multiple files */


    but these flags do not appear to be combinable (they're 0-3, not the typical powers of 2), so that still looks like the options are open 1, save 1, pick folder, pick multiple files.
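To illustrate why sequential mode values cannot be combined while powers of two can, here is a small sketch. The class names and the `Flag` values are hypothetical, and the exact numeric assignment of the four modes is assumed from the "0-3" remark above rather than taken from the Zotero source.

```python
from enum import IntEnum, IntFlag


class Mode(IntEnum):
    # Sequential values, as described for the picker modes (0-3).
    OPEN = 0
    SAVE = 1
    GET_FOLDER = 2
    OPEN_MULTIPLE = 3


class Flag(IntFlag):
    # Hypothetical bit-flag alternative: one bit per option.
    OPEN = 1
    SAVE = 2
    GET_FOLDER = 4
    OPEN_MULTIPLE = 8


# OR-ing sequential values collides with an existing mode:
# 2 | 1 is 3, which is indistinguishable from OPEN_MULTIPLE.
collided = Mode.GET_FOLDER | Mode.SAVE

# OR-ing bit flags yields a value that still encodes both options.
combined = Flag.GET_FOLDER | Flag.OPEN_MULTIPLE
```

With sequential values the picker can only be in exactly one of the four modes, which is why a single dialog cannot accept both files and folders.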
  • @depswa
    I had the same problem and wrote a library to create items in Zotero through Python locally. The library can do links, so you can keep your files in their original location.
    Currently the Zotero collections are not supported, but maybe you can use it as a starting point.

    https://gist.github.com/danbe/6547077
  • Try Mendeley? Mendeley can import files from folders and watch file folders. Then in Zotero import the Mendeley database?
  • @emilianoeheyns: Ah, yeah, you're right — I was thinking they were bit flags. So, then, we'd probably want a single option ("Add Files from Folder…"?) that showed a wizard that gave options for linking/storing and maybe some other things (PDFs vs. all files, including subfolders, replicating the subfolder hierarchy).

    But an initial implementation using drag and drop that 1) showed a confirmation prompt, 2) followed the modifier key for linking/storing, and 3) defaulted to PDFs 4) within all folders 5) with the folder hierarchy replicated would be a pretty good place to start.
  • @emilianoeheyns Thank you for the python script. Unfortunately, it gave me an error - likely because I have files with Unicode characters:


    > python dir2rdf.py pdf
    ['.pdf']
    Traceback (most recent call last):
      File "dir2rdf.py", line 67, in <module>
        f.write(dom.toprettyxml())
      File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 269503-269515: character maps to <undefined>


    If you are going to write a plugin, it would be really great if it could work with Unicode filenames too.

    @danb Thank you for the hint. It's a useful library and I like the option of creating links. If I don't find another solution, I might use your library - I would just need to add code for scanning the directory tree, similar to the dir2rdf.py from @emilianoeheyns above.

    @erer007a I actually considered this route. Mendeley can import a folder with subfolders, but does not re-construct the hierarchical structure in the Library - all items are added to the `Unsorted` folder. At least I could not find a way of duplicating the folder structure from the disk.
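The traceback above shows the write going through the cp1252 codec, which is what Python picks on this Windows setup when a text file is opened without an explicit encoding. One common way around this kind of error is to name the encoding explicitly. The sketch below assumes the output is written from an `xml.dom.minidom` document, as the `dom.toprettyxml()` call suggests; the helper `write_xml` and the sample filename are hypothetical, and this is not necessarily how the script itself was changed.

```python
from xml.dom import minidom


def write_xml(dom, path):
    """Write a minidom document as UTF-8, independent of the platform default."""
    # Passing encoding="utf-8" avoids falling back to a legacy code page
    # (such as cp1252) that cannot represent every Unicode filename.
    with open(path, "w", encoding="utf-8") as f:
        f.write(dom.toprettyxml())


# Example document containing a non-Latin (hypothetical) filename.
doc = minidom.Document()
element = doc.createElement("attachment")
element.setAttribute("path", "Статья.pdf")
doc.appendChild(element)
```

Calling `write_xml(doc, "attachments.rdf")` would then succeed even where the platform default encoding cannot represent the characters in the filename.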
  • @dstillman when I try to "Store Copy of a File..." with a shell script (.sh), it tries to open the script.
  • "Open" meaning what?
  • @depswa I've done my best on the UI (which means it's pretty janky), but give the xpi at https://github.com/retorquere/zotero-folder-import/issues/1 a go; after installation, select the green plus icon above the item list, and select "Add files from folder...".

    Pick a folder, wait a bit, and you will then be presented with a list of the file extensions it has found. You can select multiple extensions (some of the more common ones will be at the top, the rest are alphabetical), select link or import, and let it rip.

    I cannot pre-select common extensions because the listbox control won't allow you to interact with them any more for some strange reason. It also does not check for duplicates, so I'd advise you to create a clean profile and test for a bit before committing to the current Works For Me (tm) state of things.
  • I'm not seeing that here.
  • Then there's probably something fubar about my setup. Not at all unlikely.
  • @emilianoeheyns Thank you very much! The new version of the Python script works perfectly and imported my whole library without any issues.

    The xpi also worked. However, it showed a pop-up error message right before importing any items:

    XML Parsing Error: undefined entity
    Location: chrome://zotero-folder-import/content/import.xul
    Line Number 4, Column 1:

    <dialog xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
    ^


    But after I closed the pop-up window, import began and was completed without other errors. All items were imported as links.

    At the moment, the Python version looks more versatile, as I am able to select which file extensions to import, and, importantly, while importing the RDF, I can choose whether to copy files to Zotero storage or to create links - this is a very useful option.

    Anyway, my problem is solved. Thank you very much for your help.