extract references from PDF and create new library items from them
Just like in Mendeley, I would like to be able to do this in Zotero.
Input: A PDF file containing a "References" section that lists the cited papers, giving Author 1, Author 2, Journal, Volume, and Year for each one.
Operation: Zotero extracts the list of cited papers from the PDF, looks up the metadata of each cited paper using Google Scholar or PubMed, and adds these as new library items in the user's "My Library" folder.
Output: A new subcollection of library items containing all papers cited in the PDF that was used as input.
SuperDuperPlus: Zotero and its entire development team would be worth their weight in pure GOLD if Zotero could also automatically retrieve the PDFs for all (or most of) those library items and attach them to each item (for instance when one has institutional access to many journals through a University Library).
Argument: This is a very common situation: one starts with a review paper to get familiar with a new research topic and then proceeds to read the papers cited in that review. Done manually, this is a lot of work, which is surely worth automating.
Instead of just being a great archiving and management tool, Zotero would become a sort of search tool for the body of literature, in which one can follow a trail of "related papers", because the related papers cite each other.
There is a KB article on importing formatted bibliographies, which is essentially what you're asking for:
http://www.zotero.org/support/kb/importing_formatted_bibliographies
http://www.mendeley.com/bibliography-maker-database-generator/
http://www.zotero.org/support/retrieve_pdf_metadata
Mendeley is pretty smooth at looking up metadata when you just drop a PDF in there, but Zotero can do the same, even though it requires one more click.
I still love Zotero though. Keep up the great work.
Often I know Google Scholar will not have any data on a PDF (for instance because I made it myself), but I do want it in my library for future reference.
http://feedback.mendeley.com/forums/4941-mendeley-feedback/suggestions/834313-version-0-9-7-does-not-extract-references-from-the
"Hello - This feature was removed in 0.9.7 because it was consuming a fair amount of resources (client and server side) without providing enough value. We plan to re-introduce it in an improved form in future."
Switching to Zotero for now :)
There's https://anystyle.io/ which works pretty well to get bibliographic data from pasted bibliographies.
I've tried exporting for Endnote desktop, which does create an RIS file, but neither the file nor a copy from TextEdit / Import from Clipboard imports to Zotero (Mac Standalone) - in both cases I get "the selected file is not in a supported format".
Example from TextEdit:
ID - catau51249007930001751
AU - Krasner, David, 1952-
A2 - Saltz, David Z., 1962-
A2 - University of Michigan. Press
Y1 - 2006
KW - Performing arts -- Philosophy
KW - Performing arts -- Social aspects
PB - Ann Arbor : University of Michigan Press
CY - Ann Arbor
TY - BOOK
T1 - Staging philosophy intersections of theater, performance, and philosophy
ER -
Thanks for your assistance.
TY - BOOK
ID - catau51249007930001751
AU - Krasner, David, 1952-
A2 - Saltz, David Z., 1962-
A2 - University of Michigan. Press
Y1 - 2006
KW - Performing arts -- Philosophy
KW - Performing arts -- Social aspects
PB - Ann Arbor : University of Michigan Press
CY - Ann Arbor
T1 - Staging philosophy intersections of theater, performance, and philosophy
ER -
will import reasonably well (not as well as the items import directly from the catalog, where we do all kinds of clean-up, so you may want to consider if that's not the faster route after all).
You can easily do this manually in TextEdit, or see whether there's a good search & replace you can run. Unfortunately, the latter is not going to be easy unless all of the records are books (in which case you'd first remove all TY - lines and then replace ID - by TY - BOOK followed by a newline).
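For a batch in which every record is a book, that search and replace could be scripted roughly as below. This is only a sketch in Python: the function name `retag_as_book` is made up for illustration, and it reads the suggestion as keeping the ID line directly after the new TY line, as in the corrected record shown above. The patterns accept one or two spaces before the hyphen, since exports differ in spacing.

```python
import re


def retag_as_book(ris_text: str) -> str:
    """Make TY - BOOK the first line of every record in an all-books export.

    Assumes each record begins with exactly one ID line (hypothetical helper,
    not part of Zotero).
    """
    # Step 1: remove every existing TY line, wherever it sits in the record.
    without_ty = re.sub(r"(?m)^TY[ \t]*-.*\n?", "", ris_text)
    # Step 2: put a fresh TY line in front of each ID line, keeping the ID.
    return re.sub(r"(?m)^(ID[ \t]*-)", r"TY  - BOOK\n\1", without_ty)


if __name__ == "__main__":
    sample = (
        "ID  - catau51249007930001751\n"
        "AU  - Krasner, David, 1952-\n"
        "TY  - BOOK\n"
        "T1  - Staging philosophy\n"
        "ER  - \n"
    )
    print(retag_as_book(sample))
```

The output would need to be saved as a new .ris file before importing; this has not been tested against this particular catalog's export.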
Thanks for your suggestions though - what I can do is export all the books in one batch and all the articles in another, then do a search+replace (though not sure how to opt to replace the whole ID *line* with "TY - BOOK", in TextEdit / Word / etc - any hints? Otherwise I can just do it manually). Though, just FYI, in the process of investigating this I discovered a strange thing - if I replace the ID line with TY - JOUR, then it recognises the DOI as a separate article, i.e. it imports the article I want (sans DOI) and then a separate article with *just* Item Type: Journal Article and the DOI.
eg:
DO - 10.2307/464730
ID - TN_jstor_archive10.2307/464730
AU - Bahti, Timothy
Y1 - 1981
JF - Diacritics
VL - 11
IS - 2
SP - 68
EP - 82
SN - 03007162
TY - JOUR
T1 - The Indifferent Reader: The Performance of Hegel's Introduction to the Phenomenology
ER -
With replacement:
DO - 10.2307/464730
TY - JOUR
AU - Bahti, Timothy
Y1 - 1981
JF - Diacritics
VL - 11
IS - 2
SP - 68
EP - 82
SN - 03007162
T1 - The Indifferent Reader: The Performance of Hegel's Introduction to the Phenomenology
ER -
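In the record with the replacement, the DO line still sits above TY. RIS expects TY to be the first tag of each record, so one guess (not confirmed anywhere in this thread) is that the stray DO line is what ends up as the separate DOI-only item. A more general fix than swapping out the ID line is to move the existing TY line to the top of each record, which keeps the ID and works for mixed books and articles. The sketch below is in Python; `move_type_first` is a hypothetical helper name, and it assumes every record ends with an ER line.

```python
import re

TY_LINE = re.compile(r"^TY[ \t]*-")
ER_LINE = re.compile(r"^ER[ \t]*-")


def move_type_first(ris_text: str) -> str:
    """Reorder each record so that its TY line comes first."""
    fixed, record = [], []
    for line in ris_text.splitlines():
        if not record and not line.strip():
            fixed.append(line)  # keep blank separators between records in place
            continue
        record.append(line)
        if ER_LINE.match(line):
            # Stable sort: the TY line moves to the top, all other lines
            # keep their original relative order.
            fixed.extend(sorted(record, key=lambda l: not TY_LINE.match(l)))
            record = []
    fixed.extend(record)  # lines after the last ER are passed through unchanged
    return "\n".join(fixed) + "\n"


if __name__ == "__main__":
    sample = (
        "DO  - 10.2307/464730\n"
        "ID  - TN_jstor_archive10.2307/464730\n"
        "AU  - Bahti, Timothy\n"
        "TY  - JOUR\n"
        "T1  - The Indifferent Reader\n"
        "ER  - \n"
    )
    print(move_type_first(sample))
```

With this approach there is no need to split the export into a books batch and an articles batch, because each record keeps whatever type it already declares.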
- ISO-8859-1
- UTF-8 [the default which I used for the above]
- US-ASCII
- windows 1251
There's just no reason we would risk breaking valid imports by supporting this. Contact the library and tell them to fix their output.
Thanks all for the help!
More info here: https://www.scholarcy.com/bookmarklets
Demonstration video here: https://youtu.be/b8zPk364SZM
Please give it a try and let me know what you think.