Importing metadata from PDF
I have a large number of PDF files that I have downloaded over the last several years. Is there a way that I can get the bibliographic information out of these files, into Zotero, and then link or store the PDF with that information?
I recently found the free tool CB2BIB (available for Windows, MacOS and UNIX), which might help you extract metadata from PDF files.
-> http://www.molspaces.com/cb2bib/
Kind regards
Martin
Thank you for the reference to CB2BIB. It is unclear to me from the documentation how to use this program. Does it use data from the clipboard?
Do you know if there is a way to process a large number of PDF files at once using this program?
I have not used this program "seriously" yet; I have only played around with it a little.
Did you read the "usage information" here?
http://www.molspaces.com/d_cb2bib-overview.php#usage
Isn't the following what you are looking for?
(Quote)
"Multiple retrieving from PDF files
Multiple PDF or convertible to text files can be sequentially processed by dragging a set of files into cb2Bib's PDFImport window. By starting the processing button, files are sequentially converted to text and send to cb2Bib clipboard panel for reference extraction. If the automatic recognition fails, the process pauses and allows for cb2Bib manual extraction. Alternatively, if automatic recognition succeeds references are optionally saved and next file is processed. See Configuring PDFImport section for setting your to text converter."
Sorry for not being able to be more specific at this time,
Kind regards
Martin
It's true that PDFs don't have a common format for storing bibliographic information, but PDFs that came from particular databases often do come with standard cover pages. A good solution would be a family of "translators", like those used for webpages, that can extract this information. Another good solution would be to fully integrate a tool like cb2Bib. It would be a very large boost to Zotero's usefulness if I could drag one or several PDFs into the Zotero panel and have their metadata automatically extracted.
And you know, of course, that you don't need to track a PDF down to its original source if it is a PDF of a paper-journal article. If you have a good discipline-specific database for your area(s) that Zotero supports, or if Google Scholar covers your area sufficiently well, you can just get the metadata from there, import it quickly into Zotero, and simply drag the PDF onto your new Zotero entry. I did a bunch of this last week, and the easy cases were very easy. It's just a matter of typing a few unique words from the author or title fields into the database, importing the metadata for a batch of articles, and then dragging the PDFs from the filesystem onto their newly created entries. Zotero does the rest. It helps if you have a file manager (like XYplorer for Windows) that has a PDF preview function, as well as plenty of screen space so you can see your PDF cover page, your database, your list of PDFs to import and your Zotero list all at the same time.
The hard cases were hard and time-consuming, but you could hardly avoid that even with a good set of translators, it seems to me. The things that would have had any chance of being easy to construct a translator for were (in my case) from easily traceable sources, and therefore pretty quick to look up. Odd conference papers or one-off journal articles would make for manual work anyway. In the end I found it easiest just to let Zotero import all the easy ones, and type the rest in by hand.
You're right. If you really have to hunt, it's not worth the time. If you have to look in more than one place for the metadata, it's faster just to type it in. And perhaps you'd find it less arduous than it sounds. Zotero is reasonably quick for the fingers (SHIFT-CTRL-N, first letter of item type (J), TAB, title, TAB, author surname, TAB, first name, TAB, etc.). Add to that the fact that automatic data sources (at least for my field) almost always have some typographic oddity that needs manual correcting. I started to think that manually typing everything wouldn't be all that bad. Of course, that all depends on whether the collection has real value for you. You won't be too motivated to manually enter data for things you don't see yourself really needing metadata for.
Put all of your PDF files in one folder, or in several classified folders, and then use this free software:
http://www.mendeley.com/
In Mendeley, go to the File menu and choose Add Folder (here you point it at the folder(s) containing your PDFs). Mendeley imports the bibliographic information from your PDF files.
The next step is to export your library as RIS and import it into Zotero.
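For anyone unfamiliar with RIS: it is a plain-text format with one tagged field per line, which is why it moves easily between reference managers. Below is a minimal Python sketch of what a single journal-article record looks like. The function name to_ris and the sample values are made up for illustration, and only a handful of tags (TY, AU, TI, PY, ER) are shown; a real Mendeley export will contain more fields.

```python
def to_ris(item):
    """Format one journal-article record as RIS text (illustrative subset of tags)."""
    lines = ["TY  - JOUR"]                 # record type: journal article
    for author in item["authors"]:
        lines.append("AU  - " + author)    # one AU line per author
    lines.append("TI  - " + item["title"])
    lines.append("PY  - " + str(item["year"]))
    lines.append("ER  - ")                 # end-of-record marker
    return "\n".join(lines) + "\n"


# Made-up sample record, only to show the shape of the output.
sample = {"authors": ["Doe, Jane", "Roe, Richard"],
          "title": "An example article",
          "year": 2008}
print(to_ris(sample))
```

Each tag is two characters, followed by two spaces, a hyphen and a space, and a record ends at the ER line.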
In Zotero, select the PDFs, right-click and choose "Retrieve Metadata". Last I heard, Mendeley's feature worked a little better, and they have some clever ideas about using additional data (including user-provided data), but Zotero does use Google Scholar results as well as DOIs on the first page to get metadata, and that works in a large majority of cases.
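To give a rough idea of the DOI part of that lookup, here is a minimal Python sketch that searches already-extracted first-page text for a DOI. This is not Zotero's actual code: the function name find_doi and the regular expression are assumptions made for illustration, and the pattern only covers the common "10.<registrant>/<suffix>" shape, so it may miss unusual DOIs.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
# that runs until whitespace, a quote or an angle bracket.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(first_page_text):
    """Return the first DOI-like string in the text, or None if there is none."""
    match = DOI_PATTERN.search(first_page_text)
    if match is None:
        return None
    # Heuristic: trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


# Made-up first-page text, only to show the call.
print(find_doi("Journal of Examples 12 (2008) 34-56. doi:10.1000/xyz123."))
```

Getting the first-page text out of the PDF is a separate step (a PDF-to-text converter), and a DOI found this way still has to be looked up somewhere to obtain the actual metadata.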
It is great, so we do not need Mendeley any more.
http://www.zotero.org/support/retrieve_pdf_metadata
I have about 500 PDFs on my computer and a comparable amount of bib data (one entry for each PDF) stored in Zotero. They are not associated yet, but I would like to associate them for a new organizational schema.
Is it possible to automate the process of right-clicking for "add attachment" and then "attach stored copy of file" for all of them? [For example: bulk upload the PDFs as items to Zotero and then ask it to retrieve the metadata from my Zotero bibs?]
I'm having a huge problem.
I don't have any idea why, but I can't import or collect references from PDF documents!
What do I have to do?
I want the metadata because I have some PDF files with names that are unrelated to their content.
Thank you very much!