automating mass-import from PDFs

adamsmith · September 10, 2012

No, Zotero doesn't have a "watch folder" function along the lines of what Mendeley has.
If you're going to use the Dropbox/symlink method, as you suggest you might in the other thread, you _do_ keep your PDFs in Zotero storage; the symlink is then used to mirror the storage folder to Dropbox.

bernard_ivo · September 10, 2012

A watch folder function might be a good idea for the future; I find it very useful in some other applications. Nevertheless, what would you suggest for building the library once all the PDFs are in the proper storage folder and metadata needs to be retrieved for all of them?

adamsmith · September 10, 2012

Sorry, I don't think I understand the question. You can't just place the files into Zotero/storage. You have to get them into Zotero using drag & drop or "Store Copy of File". Once you have done that, you can use Retrieve Metadata.

jessewcollins · October 5, 2012

Is there an easy way to have Zotero attempt to retrieve metadata for every PDF in a large directory tree? I have about a gigabyte of PDFs stored in a Dropbox-synced folder, itself broken down into ~100 subfolders.

This is my favorite way to keep my PDF library: it can be read anywhere, including offline, and although the structure is complex, it more or less makes sense to me, and I can usually find old papers this way even when I can't remember author names or other useful details.

I just don't want to have to drag&drop each file. Is there a way to attempt this from the command line? I'm running Mac OS X.

adamsmith · October 5, 2012

I don't think there is an easy command-line solution.
The easiest would be to create a virtual folder with all your PDFs on one level and then drag those to Zotero in a couple of batches.

If you don't know how to do this on a Mac it should be easy enough to google.
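For what it's worth, the "all PDFs on one level" idea can also be scripted. The following is only a minimal Python sketch, not a Zotero feature: the function name `flatten_pdfs` and both folder arguments are made up for illustration, and it assumes that dragging the resulting symlinks into Zotero behaves like dragging the real files, which I have not verified. If it doesn't, swapping `os.symlink` for `shutil.copy2` would produce real copies instead.

```python
import os
from pathlib import Path


def flatten_pdfs(source_root, flat_dir):
    """Create one symlink per PDF found under source_root inside flat_dir.

    Hypothetical helper: it only prepares a flat folder for drag & drop
    and does not talk to Zotero at all. Returns the created link paths.
    """
    source_root = Path(source_root).resolve()
    flat_dir = Path(flat_dir)
    flat_dir.mkdir(parents=True, exist_ok=True)

    created = []
    # sorted() materializes the listing first, so links created below are
    # never picked up even if flat_dir lives inside source_root.
    for pdf in sorted(source_root.rglob("*")):
        if pdf.is_symlink() or not pdf.is_file():
            continue
        if pdf.suffix.lower() != ".pdf":
            continue
        # Two subfolders may hold files with the same name; append a
        # counter so that neither one is overwritten.
        link = flat_dir / pdf.name
        counter = 1
        while link.exists() or link.is_symlink():
            link = flat_dir / f"{pdf.stem}_{counter}{pdf.suffix}"
            counter += 1
        os.symlink(pdf, link)
        created.append(link)
    return created
```

Calling something like `flatten_pdfs("/path/to/your/pdfs", "/path/to/empty/folder")` leaves the original tree untouched; the flat folder can then be dragged into Zotero in batches and deleted afterwards. It is meant for an empty target folder, since running it twice creates numbered duplicates.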

ronan.mt.fleming · January 6, 2014

Hi All,

Automated import of a batch of PDFs, followed by association with high-quality metadata, would be a great feature for attracting new Zotero converts, as it builds upon existing (perhaps not ideal) efforts to organize a set of PDFs.

After dragging and dropping a few PDFs, the "Retrieve Metadata for PDF" menu function seems to work reasonably well at getting the correct data to ultimately generate a citation. However, it would need to work very well to encourage any seasoned researcher with thousands of manually curated .bib entries to make the switch.

I wonder has anyone implemented a way to batch import a complete directory structure, then associate with each PDF one or more tags according to the names of the folders (or subfolders) which contained the original PDF? It is a common legacy issue that researchers have stored their data in such a hierarchical tree. Especially for interdisciplinary research, it is impossible to maintain a single hierarchy, as a given paper could easily be placed within two hierarchies. I understand that tags are a way to overcome this issue, but do I really have to manually add all these tags when they are implicit in the existing file structure where the PDFs are stored?

Regards,

Ronan

aurimas · January 6, 2014

However, it would need to work very well to encourage any seasoned researcher with thousands of manually curated .bib entries to make the switch.

You should just import the .bib files, rather than importing the PDFs and then retrieving metadata. Associating the PDFs with each .bib entry may have to be done manually at this point. Are the .bib entries in any way tied to the PDFs on disk? (I.e., is the path of the PDF stored in the entry?)
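A quick way to answer that question for a large .bib file is to scan it for entries that carry a file path. The Python sketch below is only a rough illustration under loud assumptions: it assumes paths are kept in a field literally named `file` (a convention some reference managers use, which may not match your files), that each field starts on its own line, and that a regular expression is good enough in place of a real BibTeX parser. The function name `entries_with_file_field` is made up.

```python
import re

# Start of an entry such as "@article{smith2010," (type, then citation key).
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Assumption: the PDF path sits in a field called "file", one field per line.
FILE_FIELD = re.compile(r"^\s*file\s*=", re.IGNORECASE | re.MULTILINE)


def entries_with_file_field(bibtex_text):
    """Return {citation_key: True/False} depending on a 'file' field."""
    starts = list(ENTRY_START.finditer(bibtex_text))
    result = {}
    for i, match in enumerate(starts):
        # An entry's body runs until the next entry starts (or end of text).
        if i + 1 < len(starts):
            end = starts[i + 1].start()
        else:
            end = len(bibtex_text)
        body = bibtex_text[match.end():end]
        result[match.group(2)] = bool(FILE_FIELD.search(body))
    return result
```

If most keys come back `True`, the paths are there to work with; if they come back `False`, the PDFs would have to be matched to entries some other way.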

I wonder has anyone implemented a way to batch import a complete directory structure, then associate with each PDF one or more tags according to the names of the folders (or subfolders) which contained the original PDF?

Not yet, but it sounds fairly simple to do and could be added as part of the Zutilo add-on.
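To illustrate why the folder-to-tag part is the simple bit, here is a minimal Python sketch that only computes the tags implied by an existing folder tree. It is not part of Zotero or Zutilo, the function name `tags_from_folders` is hypothetical, and actually attaching the tags to Zotero items would still need a separate step that is not shown here.

```python
from pathlib import Path


def tags_from_folders(source_root):
    """Map each PDF under source_root to the names of its parent folders.

    Hypothetical helper: keys are paths relative to source_root (with
    forward slashes), values are the folder names from the root down.
    """
    source_root = Path(source_root)
    tags = {}
    for pdf in sorted(source_root.rglob("*")):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            relative = pdf.relative_to(source_root)
            # Every folder between the root and the file becomes one tag;
            # a PDF sitting directly in the root gets no tags.
            tags[relative.as_posix()] = list(relative.parts[:-1])
    return tags
```

A file stored as `biology/genomics/paper.pdf` would map to the tags `biology` and `genomics`. A paper that belongs in two hierarchies could then simply carry the tags of both, which is the point made above about tags versus a single tree.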

godblessfq · January 9, 2014

Can someone show me how to automatically retrieve the metadata (if there is any) for every PDF file that doesn't have a parent entry, and for PDF files as they are dropped on Zotero? Getting the metadata for PDF files already in the library is tedious: I have to select them, then right-click and select "Retrieve Metadata for PDF".
I want the metadata because some of my PDF files have names that are unrelated to their content.
Thank you very much!

adamsmith · January 9, 2014

It's not possible, but you can retrieve metadata for multiple files at once, so it's not like you have to do this for every file individually.
A major reason for not doing this automatically is the Google lock-out described in this thread.
Edit: for further discussion of this, please do start a new thread as Dan asked you to.