Import reference for PDFs via DOI
Newer PDF files often have the DOI or an HTML link to the article's web page on the front page.
A nice feature would be a translator for PDF files that works like this:
detectWeb:
- runs the pdftotext function on the front page of the PDF
- searches (regex) the resulting text for a DOI or a link to the full-text web page
- loads the full-text page in the background and runs Zotero's translator detection on it
- if the full-text page has a translator, returns a Zotero import icon
doWeb:
- gets the reference from the full-text page and adds the PDF as an attachment
The tools to do this seem to be there in the Zotero code, but unfortunately my coding abilities are not good enough to make a hack demonstrating this.
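To make the search step concrete, here is a minimal sketch in Python (not Zotero translator code) of how the text that pdftotext produces for the front page could be scanned for a DOI or, failing that, a web link. The function names `find_identifier` and `landing_page` are made up for illustration, and the DOI pattern is a common heuristic rather than a complete definition of DOI syntax.

```python
import re

# Heuristic DOI pattern: "10." + registrant code + "/" + suffix.
# Assumption: the suffix ends at whitespace, a quote, or an angle bracket.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
# Fallback: any http(s) link printed on the front page.
URL_RE = re.compile(r'https?://[^\s"<>]+', re.IGNORECASE)


def find_identifier(front_page_text):
    """Return ("doi", value), ("url", value), or None if nothing is found."""
    match = DOI_RE.search(front_page_text)
    if match:
        # Strip punctuation that usually belongs to the sentence, not the DOI.
        return ("doi", match.group(0).rstrip(".,;)"))
    match = URL_RE.search(front_page_text)
    if match:
        return ("url", match.group(0).rstrip(".,;)"))
    return None


def landing_page(identifier):
    """Turn the identifier into the address of the page to load in the background."""
    kind, value = identifier
    return "https://doi.org/" + value if kind == "doi" else value
```

A DOI is preferred over a plain link because it resolves to the publisher's page for exactly this article, whereas a link printed on the front page may only point to the journal home page.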
HubLog: Full OpenURL metadata from CrossRef
HubLog: CrossRef Citation plugin
CrossTech: Added XML format parameter to CrossRef's OpenURL resolver
Matthias
I just installed Zotero and like it very much! But I wondered whether there is a tool like the Citavi pdf picker? A tool to "extract" the information from PDFs (author, title) and transfer it into Zotero and reference lists?
Kind regards,
jonny
You can just drag and drop PDFs into Zotero, then select however many you want, right-click, and select Retrieve Metadata. This scans the PDF for a DOI or ISBN and retrieves metadata based on that. If nothing is found, it tries to work out the metadata from the PDF contents and Google Scholar search results (the accuracy is fairly good, but not 100%).
Note that author and title information stored in the PDF's embedded metadata or in the file name is not read.
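The lookup order described above (DOI first, then ISBN, then a content-based search) can be sketched roughly as follows. This is an illustration in Python only, not Zotero's actual implementation; `choose_lookup` and `normalize_isbn` are hypothetical names, and the patterns are simple heuristics that assume the ISBN is printed after the label "ISBN" with digits separated by hyphens.

```python
import re

# Heuristic patterns (assumptions, not Zotero's own).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
ISBN_RE = re.compile(r'\bISBN(?:-1[03])?:?\s*([0-9][0-9Xx-]{8,16}[0-9Xx])')


def normalize_isbn(raw):
    """Drop hyphens and keep the value only if it has 10 or 13 characters."""
    digits = re.sub(r'[^0-9Xx]', '', raw).upper()
    return digits if len(digits) in (10, 13) else None


def choose_lookup(text):
    """Decide which lookup to try for the extracted PDF text.

    Returns ("doi", value), ("isbn", value), or ("fallback", None) when no
    identifier is found and a content-based search would be needed.
    """
    match = DOI_RE.search(text)
    if match:
        return ("doi", match.group(0).rstrip(".,;)"))
    match = ISBN_RE.search(text)
    if match:
        isbn = normalize_isbn(match.group(1))
        if isbn:
            return ("isbn", isbn)
    return ("fallback", None)
```

The sketch looks only at the text content, in line with the note that embedded PDF metadata and file names are not used.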