Search Google Scholar for PDFs

As a MacOS user, I've used both Bookends and Sente, and both have a fantastic feature. I have a collection of over 500 PDFs with no metadata. When I imported those PDFs into Bookends and Sente, no metadata was found, so neither program could automatically retrieve the bibliographic information. In that case, however, both programs let me search Google Scholar for the title or author of the article and showed a list of results. When I clicked the desired result, Bookends or Sente would retrieve the bibliographic information from Google Scholar. I haven't found a similar feature in Zotero so far. Is there a comparable feature in Zotero, or are there plans to release one?
  • Zotero will try to use a text extract and Google Scholar to retrieve metadata (and, in the trunk, will try to use DOIs).

    I'd imagine that you'd have to either manually enter or copy/paste the author/title into Bookends/Sente (because, as you said, it didn't retrieve any metadata)? Is this any different from entering the same information into your Firefox search box and using Zotero's Google Scholar translator? The only difference I can think of is that the PDF wouldn't be auto-associated with the new reference you add.
  • edited March 26, 2009
    Bookends and Sente are able to retrieve PDF metadata and auto-complete the bibliographic references, just like Zotero does. However, some PDF files do not contain any metadata.

    When I drag a PDF file to Bookends, it searches for any metadata the file may contain. If the PDF contains metadata, Bookends automatically retrieves all the bibliographic information. However, if the PDF itself does not contain any metadata, Bookends opens a dialog box where I can search for the name of the article on Google Scholar. Bookends then auto-completes the article's bibliographic information using what it retrieves from Google Scholar, even though the PDF contains no metadata. Does Zotero do the same?

    Look at the Sente feature in this video: http://www.thirdstreetsoftware.com/site/videos/ImportPDFs.html
  • Sure, Zotero does the same thing. The model is slightly different, as Zotero is a plugin to a browser (rather than a stand-alone program with a limited/embedded web browser, à la Sente/Bookends).

    Rather than a dialog box opening up, you'd have to rely on the Firefox search/location bars that are always present.

    A Google Scholar search would look like:
    http://scholar.google.com/scholar?q=Protein Measurement with the Folin Phenol Reagent

    Click on the Zotero folder icon in the location bar and select the reference whose information you want to retrieve. You'll have to manually associate the PDF with that item by dragging it onto the item.

    If I'm not understanding your query, let me know...
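The search URL in the post above contains literal spaces, which a browser's location bar encodes for you. As a minimal sketch of building the same query in a script, assuming only the `q` parameter shown in that URL (the function name `scholar_search_url` is made up for illustration and is not part of Zotero):

```python
from urllib.parse import urlencode


def scholar_search_url(title):
    """Return a Google Scholar search URL for an article title."""
    # urlencode percent-encodes reserved characters and turns spaces into '+',
    # so the title can be pasted in exactly as it appears on the paper.
    return "http://scholar.google.com/scholar?" + urlencode({"q": title})


if __name__ == "__main__":
    print(scholar_search_url("Protein Measurement with the Folin Phenol Reagent"))
```

Opening the printed URL in Firefox should lead to the same results page as typing the title into the search box, at which point the Zotero folder icon in the location bar can be used as described.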
  • I suppose standalone attachments could have a context menu option when you're viewing a translatable page to convert the attachment to a child attachment of a new item based on that page, though that's a little awkward (and hard to phrase).
  • I'd love an awkward context menu or dialog such as the one you describe, Dan, and have it on my short list of things to put on the feature request forum. Both Mekentosj Papers and Mendeley currently do this (though they don't have the same parent/child structure as Z) for pulling metadata from the web and layering it on top of a PDF that they can't import automatically. Mendeley allows you to do it by DOI, PMID, or arXiv ID. The "match" function in Papers goes one better--it opens the PDF and allows you to select text and identify selected text as, for instance, author, year, journal, whatever, and then it constructs a PubMed search from those criteria and pulls down the metadata. It's hard to explain but quite elegant in practice. It's ultimately a lot easier than dropping a PDF into Zotero and doing a manual search for the metadata online, capturing the metadata and then dropping the PDF into the item, especially if you have a bunch of old PDFs to deal with.
  • I haven't tried the features in the other apps you mention, but I don't think a dedicated dialog would be the right solution for Zotero. I suspect you can type any of the identifiers you mention—plus others that the other apps don't support—into Firefox's Google search box and get the same result as the top hit.

    So perhaps a "Create Parent Item from Current Page" context menu option? One question is whether it would appear on non-translatable pages as well. We've tried to keep the distinction between address bar saving (which uses a translator to save full metadata) and Create New Item from Current Page (which uses basic metadata available on all webpages) clear, but that might matter less for this.
  • edited March 26, 2009
    I've just tested a sample workflow for the journals Stroke and JBC. With Stroke, googling the DOI takes you straight to the PDF version of the paper, with no metadata for Zotero to scrape. With JBC, googling the DOI takes you to the journal's full-text HTML page, with metadata for Zotero to scrape, but of course, hitting the icon in the address bar pulls down another copy of the PDF. So in both of these cases, it's just as good to do a new search and re-import the whole item. But if you have a stack of old PDFs that you want to add metadata to, the Mendeley or Papers functions are great. It's just fewer clicks and less fuss.

    Perhaps something in the "retrieve metadata" function that would allow specifying DOI or PMID for failed metadata retrievals? This seems like the best place for such a dialog. I should add that there should also be an option to confirm or reject metadata retrieval, as it is only correct roughly 2/3 of the time in my experience.

    These are mainly issues for people's personal archives; most new searches resolve themselves. Maybe someone needs to write a "get all your old crap into Zotero" plugin.
  • I've implemented a feature on the trunk to automatically extract DOIs from PDFs and search based on these DOIs, which should improve both accuracy and performance. We might also consider adding a manual DOI lookup feature. If you're seeing inaccurate metadata retrieval, it would be great if you could provide links or correct citations for the PDFs that are inaccurately recognized so that we can work to fix the issues.
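The DOI extraction mentioned in the post above can be illustrated with a rough sketch. This is not Zotero's implementation: it assumes the PDF text has already been extracted to a plain string, and the pattern is a common heuristic that can miss unusual DOIs or pick up stray characters. The names `DOI_PATTERN` and `find_dois`, and the identifiers in the sample text, are made up for illustration.

```python
import re

# Heuristic only: "10.", a 4-9 digit registrant code, "/", then a suffix
# running up to the next whitespace, quote, or angle bracket.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text):
    """Return DOI-like strings found in text, in order, without duplicates."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Sentence punctuation often trails a DOI in running text; strip it.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


if __name__ == "__main__":
    # Made-up sample text and identifiers, not taken from a real paper.
    sample = "doi: 10.1234/example.5678. See also (10.1234/example.5678)."
    print(find_dois(sample))
```

A DOI found this way could then be typed into a search box or resolver by hand, which is the manual lookup idea raised earlier in the thread.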
  • Reviews seem to be a problem.
    This one (i.e. the PDF behind it):
    http://www.jstor.org/stable/2503839
    which should have been this:

    Geddes, Barbara. 1995. “Review: The Politics of Economic Liberalization.” Latin American Research Review 30(2): 195-214.

    became this:
    Pereira, L. C. B., J. M. Maravall, and A. Przeworski. 1993. Economic reforms in new democracies: A social-democratic approach. Cambridge University Press.

    (Obviously, getting it from JSTOR in the first place gets it right, but my sense is that a frequent use of PDF retrieval is importing old PDFs, which is what I did with this one.)
  • edited March 26, 2009
    If I download Zotero and reinstall, will the DOI function be active? I'll test it out with a stack of PDFs and keep track of any that don't work. The DOI extraction might fix a lot of the problems. Thanks for being so responsive! I am a big Zotero fan and it's almost to the point where I can convince my labmates to all jump in together...as soon as the shared library/groups features are enabled.
  • No, not yet. That's in the trunk, i.e. the version that's currently being developed.
  • No. I just can't figure out how to do that in Zotero...
    I'll give you an example. I have, on my computer, a PDF file of an article called "Reforming Brazilian Insolvency Law". There's no metadata in the PDF. If I drag this file to Bookends, Bookends recognizes that the file contains no metadata. It then automatically opens a dialog box where I can type the name of the article. Bookends searches Google Scholar and automatically finds all the data needed to complete the citation. This is very useful. However, I cannot do this in Zotero. Zotero just doesn't find the PDF metadata (the metadata doesn't exist), and what then? Will I have to complete the bibliographic information manually with information I retrieve from the Internet? This is very inconvenient...
  • skaertus: I'm not quite sure what the "that" is that you say you can't figure out how to do, but you certainly don't have to enter information manually. Go to Google Scholar, type "Reforming Brazilian Insolvency Law" in the search box, click Zotero's yellow folder icon in the Firefox address bar, select "[CITATION] Reforming Brazilian Insolvency Law", and click OK. A bibliographic item for that paper will be created in your Zotero library. Drag the PDF onto that.

    See the rest of this thread for other ideas on improvements to the process.

    Also, if you haven't read it yet, I'd recommend the Quick Start Guide.
  • skaertus, for your general knowledge, Zotero 2 is the big new surprise to be released.
  • Thanks! Now I see it. It's a different approach, but I figured out how to retrieve the citations!