Create Zotero library using disk directory structure
Good day
It is 2020, and I still organize my scientific library as a set of .pdf files located in directories and sub-directories named after the corresponding topic. I would like to start using Zotero, but cannot find an easy way to re-create the directory structure inside Zotero.
Currently, I have several hundred directories/subdirectories on my disk, and re-creating such a hierarchical structure in Zotero manually, folder by folder, would be a horrible exercise. Is there an automatic way of doing that, or at least of making this process easier? Perhaps a plugin or a translator can do that?
@emilianoeheyns I am on Windows 10, but if solutions exist for Linux, I am willing to copy my files to Linux. I guess transferring the Zotero library from Linux to Windows should not be a problem later.
Another option would be a simple import translator which imports the output of "dir /s /b".
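To make the "dir /s /b" idea concrete, here is a minimal Python sketch of the parsing step only. It assumes the listing contains one absolute Windows path per line; the function name `build_collection_tree` and the nested-dict layout are hypothetical, not part of Zotero or of any existing translator. Unlike the RDF script discussed later in the thread, it only creates a collection for a folder that actually contains a matching file.

```python
from pathlib import PureWindowsPath


def build_collection_tree(listing, root, extensions=(".pdf",)):
    """Turn `dir /s /b` output into nested dicts of collections and files.

    Hypothetical helper: the dict layout is an illustration, not a Zotero format.
    """
    root_path = PureWindowsPath(root)
    tree = {"collections": {}, "files": []}
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        path = PureWindowsPath(line)
        # Lines without a wanted extension are skipped. This also drops
        # directory lines, unless a directory name happens to end in ".pdf".
        if path.suffix.lower() not in extensions:
            continue
        try:
            relative = path.relative_to(root_path)
        except ValueError:
            continue  # path lies outside the chosen root
        node = tree
        # Every parent folder becomes a (sub)collection.
        for folder in relative.parts[:-1]:
            node = node["collections"].setdefault(
                folder, {"collections": {}, "files": []}
            )
        node["files"].append(str(path))
    return tree
```

The resulting tree could then be walked to create collections and attachments by whatever import mechanism is available. That part is deliberately left out here.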
Suggestions on where to put it in the menus? File menu?
And ideally I suppose we would also allow you to drag a filesystem folder to a collection, but we could deal with that separately.
Maybe drag and drop is the best interface for this? That would seem to me to be a deliberate action.
But thinking about what I said above, a simple solution would be just to prompt for confirmation if (and only if) a folder is selected instead of files. Could even have a checkbox (selected by default?) to import the subfolder structure as collections. (We could upgrade this to a proper wizard window later if we need more options.)
@depswa, I'm putting together something that can be run in the javascript runner or as a plugin, and that wouldn't require anything outside Zotero, so if you can wait until the weekend that is probably the smoothest experience.
But I already had the RDF generator mostly done, so: https://gist.github.com/8edd2f88118c9fd4869e97ebbdcbf54f
*Please* only try this on an empty library. It Works For Me (tm) but will blithely import the lot, and if there's something there you don't want, it will be hard to separate out. Needs Python; place the script in the folder you want to import, then run
python dir2rdf.py pdf docx
and import the attachments.rdf that it writes in the same directory. Don't move it from where it is, or it won't find the attachments. It will create a collection for every folder in that directory, regardless of whether it has files to import (something I'm likely to address in the javascript version in the weekend). The `pdf docx` at the end lists the extensions of the attachments it should import. Any extension not listed is not imported.
I take it the mode is the flag in question? That would still preclude this happening from the same picker if I'm reading this right -- the available modes are
/** @const {Integer} FilePicker#modeOpen - Load a file */
/** @const {Integer} FilePicker#modeSave - Save a file */
/** @const {Integer} FilePicker#modeGetFolder - Select a folder/directory */
/** @const {Integer} FilePicker#modeOpenMultiple - Load multiple files */
but these flags do not appear to be combinable (they're 0-3, not the typical powers of 2), so that still looks like the options are open 1, save 1, pick folder, pick multiple files.
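A small Python illustration of why sequential mode values cannot be OR-ed together. It assumes the four modes carry the values 0-3 in the order listed above; `PickerMode` and `PickerFlags` are hypothetical names used only for this sketch, not Zotero classes.

```python
from enum import IntEnum, IntFlag


class PickerMode(IntEnum):
    # Assumption: values follow the listed order, 0 through 3.
    OPEN = 0
    SAVE = 1
    GET_FOLDER = 2
    OPEN_MULTIPLE = 3


class PickerFlags(IntFlag):
    # Hypothetical alternative: powers of two can be combined safely.
    FILES = 1
    FOLDERS = 2


# OR-ing sequential values silently produces a different, existing mode.
collides = PickerMode(PickerMode.SAVE | PickerMode.GET_FOLDER)

# OR-ing with 0 changes nothing, so "open + folder" is just "folder".
unchanged = PickerMode(PickerMode.OPEN | PickerMode.GET_FOLDER)

# With real bit flags, both options remain distinguishable.
both = PickerFlags.FILES | PickerFlags.FOLDERS
```

Here `collides` ends up as `OPEN_MULTIPLE` and `unchanged` as `GET_FOLDER`, so a picker that accepts both files and folders would need a new mode rather than a combination of the existing ones.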
I had the same problem and wrote a library to create items in Zotero through Python locally. The library can do links, so you can keep your files in their original location.
Currently the Zotero collections are not supported, but maybe you can use it as a starting point.
https://gist.github.com/danbe/6547077
But an initial implementation using drag and drop that 1) showed a confirmation prompt, 2) followed the modifier key for linking/storing, and 3) defaulted to PDFs 4) within all folders 5) with the folder hierarchy replicated would be a pretty good place to start.
> python dir2rdf.py pdf
['.pdf']
Traceback (most recent call last):
  File "dir2rdf.py", line 67, in <module>
    f.write(dom.toprettyxml())
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 269503-269515: character maps to <undefined>
If you are going to write a plugin, it would be really great if it can work with unicode filenames too.
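The traceback suggests the output file was opened with the Windows default encoding (cp1252), which cannot represent every character in the file names. A minimal sketch of the usual fix, assuming the script builds an `xml.dom.minidom` document and writes it with `toprettyxml()` as the traceback shows; the helper names and the element names below are hypothetical and not taken from dir2rdf.py.

```python
from xml.dom import minidom


def build_sample_dom(file_path):
    """Build a tiny document holding one attachment path (hypothetical layout)."""
    doc = minidom.Document()
    root = doc.createElement("attachments")
    doc.appendChild(root)
    item = doc.createElement("attachment")
    item.setAttribute("path", file_path)
    root.appendChild(item)
    return doc


def write_xml_utf8(dom, target):
    """Write the document as UTF-8 regardless of the platform's default encoding."""
    # Without encoding="utf-8", open() falls back to the locale encoding,
    # which is cp1252 in the traceback above.
    with open(target, "w", encoding="utf-8") as f:
        f.write(dom.toprettyxml())
```

XML without an explicit encoding declaration is read as UTF-8 by default, so the written file stays consistent with its own header.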
@danb Thank you for the hint. It's a useful library and I like the option of creating links. If I don't find another solution, I might use your library - I just need to add code for scanning the directory tree, similar to the dir2rdf.py from @emilianoeheyns above.
@erer007a I actually considered this way. Mendeley can import a folder with subfolders, but does not reconstruct the hierarchical structure in the library - all items are added to the `Unsorted` folder. At least I could not find a way of duplicating the folder structure from the disk.
You can pick a folder, will have to wait a bit, and will then be presented with a list of file extensions it's found. You can select multiple (some of the more common ones will be at the top, the rest is alphabetical), select link or import, and let it rip.
I cannot pre-select common extensions because the listbox control won't allow you to interact with them any more for some strange reason. It also does not check for duplicates, so I'd advise you to create a clean profile and test for a bit before committing to the current Works For Me (tm) state of things.
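The extension list described above (common ones first, the rest alphabetical) can be approximated outside Zotero to preview what a folder contains before importing. This is a rough Python sketch, not the plugin's code; the `COMMON` tuple is an assumption about which extensions count as common.

```python
import os
from collections import Counter

# Assumption: which extensions are treated as "common" and listed first.
COMMON = (".pdf", ".docx", ".epub")


def list_extensions(root):
    """Return the extensions found under root: common ones first, then the rest sorted."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext:  # ignore files without an extension
                counts[ext] += 1
    common = [ext for ext in COMMON if ext in counts]
    rest = sorted(ext for ext in counts if ext not in COMMON)
    return common + rest
```

Extensions are lower-cased first, so `.PDF` and `.pdf` count as the same entry.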
The xpi also worked. However, it showed a pop-up error message right before importing any items:
XML Parsing Error: undefined entity
Location: chrome://zotero-folder-import/content/import.xul
Line Number 4, Column 1:
<dialog xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
^
But after I closed the pop-up window, import began and was completed without other errors. All items were imported as links.
At the moment, the python version looks more universal, as I am able to select file extensions to import, and, importantly, while importing RDF, I can choose whether to copy files to Zotero storage or to create links - this is a very useful option.
Anyway, my problem is solved. Thank you very much for your help.