Auto-import of references from an article


Here is my dream: I am reading a paper online, somewhere, and I see a reference in that paper I would like in my library. I then just drag that tagged element onto a collection in Zotero, and Zotero adds it.

That's not happening, of course, but Zotero seems to be the closest thing to it on the PC. Here's my question: is there any site (such as ScienceDirect) where Zotero knows how to recognize the cited references in a paper? That is, where a yellow "folder" icon appears in the address bar while you're reading a paper, so that you can automatically import some of that paper's references into Zotero?

Wouldn't it be cool if there were some sort of standard, and perhaps an ActiveX control or some plugin, which would allow dragging and dropping these references?

  • edited December 19, 2008
    The difficulty here is that, in most cases, there is no structured metadata for those references. Often the articles you are looking at have only plain text for those citations, and if they are in PDFs there could be even less than that. See this entry for info about plain text.

    Now, if people start embedding metadata inside those citations, say using COinS, then Zotero would automatically generate a folder icon and list the items on that page in the order they appear.
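
To make the COinS idea concrete, here is a minimal sketch of what such embedded metadata looks like and how a tool could read it. COinS puts a URL-encoded OpenURL ContextObject in the title attribute of a span with class "Z3988". The page fragment, the article details, and the names CoinsParser and extract_coins are made up for illustration; this is not Zotero's own code.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect the key/value pairs from every COinS span on a page."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # COinS marks a citation as <span class="Z3988" title="...">.
        if tag == "span" and "Z3988" in classes:
            # The title attribute is a URL-encoded OpenURL ContextObject.
            self.items.append(parse_qs(attr.get("title") or ""))


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = (
    '<li>Doe J. An example article. Example Journal. 2008.'
    '<span class="Z3988" title="ctx_ver=Z39.88-2004'
    '&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal'
    '&amp;rft.atitle=An+example+article'
    '&amp;rft.jtitle=Example+Journal'
    '&amp;rft.aulast=Doe&amp;rft.date=2008"></span></li>'
)


def extract_coins(html):
    """Return one dict of field lists per COinS span, in page order."""
    parser = CoinsParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for item in extract_coins(SAMPLE_HTML):
        print(item["rft.atitle"], item["rft.jtitle"], item["rft.date"])
```

Because every field is labelled (rft.atitle, rft.jtitle, and so on), nothing has to be guessed from the visible citation text.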
  • Right - that's what I was thinking of when I said "standard". I'm not sure how that's handled these days in HTML - perhaps COinS, as you mention. That would be phenomenal.

    Re: the plain text issue, that is true, but it's also true for things such as Google Scholar, right? Yet Zotero has site parsers which pick apart the HTML/plain text in Google Scholar, Web of Science, and a bunch of other sites based on knowledge of how each site displays the references. I don't know why such a solution, while not as nice as a standards-based metadata solution, couldn't work for the bibliographies in a particular database such as ScienceDirect as well. The only difference would be parsing a list of references at the end of an article rather than a list of references resulting from a search. (I am assuming that ScienceDirect, for example, displays references for all the articles it contains in a standard way. Please let me know if that assumption is wrong.)

  • "Re: the plain text issue, that is true, but it's also true for things such as Google Scholar, right?"
    Not really. Most translators extract data in preexisting formats such as RIS and pass that to the core import translators. Some translators do screen scraping of the HTML, but HTML is still a much more structured format (with table cells, divs, spans...) than most plain text, even plain text in a consistent format.

    Someone could always try writing a translator for the ScienceDirect references page, but it'd make more sense for them to add COinS to those references.
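
As a rough illustration of why a preexisting format such as RIS is easier to import than a formatted citation, here is a minimal sketch of reading one RIS record. Each RIS line is a two-character tag, two spaces, a hyphen, and a value, and "ER" closes a record. The sample record and the function name parse_ris are made up for this example; it is not Zotero's import translator.

```python
import re

# An RIS line looks like "AU  - Doe, Jane": tag, two spaces, hyphen, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")

# Hypothetical record, invented for this example.
SAMPLE_RIS = "\n".join([
    "TY  - JOUR",
    "AU  - Doe, Jane",
    "AU  - Roe, Richard",
    "TI  - An example article",
    "JO  - Example Journal",
    "PY  - 2008",
    "ER  - ",
])


def parse_ris(text):
    """Return a list of records; each maps a tag to its list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or malformed lines
        tag = match.group(1)
        value = (match.group(2) or "").strip()
        if tag == "ER":
            # "ER" marks the end of the current reference.
            records.append(current)
            current = {}
        else:
            # Tags such as AU may repeat, so keep every value.
            current.setdefault(tag, []).append(value)
    return records


if __name__ == "__main__":
    for record in parse_ris(SAMPLE_RIS):
        print(record["AU"], record["TI"], record["PY"])
```

The authors, title, and year come out as separate fields without any guessing, which is exactly what a plain-text bibliography entry does not give you.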
  • As a workaround, you could try the "Universal reference formatter", though I'm not sure how well it works.
  • Makes sense - HTML is more structured and easier to parse than plain text. It sounds like we need to start a petition to ScienceDirect or something.

    If you click on "references" for a paper in Web of Science, it comes up with a list which is more structured than the bibliographies in ScienceDirect. That might be a better opportunity to parse bibliographies.

    COinS, I'm sure, would be much better. But that's not exactly under our control, unfortunately.

    So, am I to take it then that no sites have bibliographies of papers parsable by Zotero?

  • Not entirely true. You can do it with Wikipedia. :)

    The reason is that it uses a format called Wikipedia Citation Templates, rather than straight-up plain text or HTML, which makes it easy to ingest the information. Each piece of information is stored in a separate field, so no semantic parsing is necessary. If you look in the article source toward the end you can see what I mean.

    That's not particularly useful for serious scholarly study, of course.
  • Slight correction: Wikipedia Citation Templates (which Zotero can generate) are used in the wiki markup—on the HTML side, it's COinS (as you can see by hovering over the folder icon in the address bar).
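
For anyone curious what "each piece of information is stored in a separate field" means on the wiki markup side, here is a minimal sketch that splits one citation template into its fields. The sample citation and the function name parse_citation_template are made up for this example, and the parser is deliberately naive: it assumes no nested templates and no "|" inside field values.

```python
# Hypothetical citation in wiki markup, invented for this example.
SAMPLE_MARKUP = (
    "{{cite journal |last=Doe |first=Jane |title=An example article "
    "|journal=Example Journal |year=2008}}"
)


def parse_citation_template(markup):
    """Split a single {{name |key=value ...}} call into (name, fields)."""
    inner = markup.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    # Drop the surrounding braces, then split on the field separator.
    parts = [part.strip() for part in inner[2:-2].split("|")]
    name, fields = parts[0], {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # ignore positional parameters that have no "key="
            fields[key.strip()] = value.strip()
    return name, fields


if __name__ == "__main__":
    name, fields = parse_citation_template(SAMPLE_MARKUP)
    print(name, fields)
```

On the rendered HTML page the same fields are what end up in the COinS span, which is what Zotero actually reads.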
  • "That's not particularly useful for serious scholarly study, of course."
    Although the article itself may not be usable in a serious scholarly study, the list of references it provides can sometimes be valuable if you are attempting to collate a list of 'legitimate' references.

    For example, if you search for 'rare species' you get a short bibliography. An article listed there may be of interest, and a translator would be valuable for adding it to your collection.
  • There is another web-based reference manager called WizFolio that does this. It can copy references, even from a PDF file, to the clipboard and will automatically add them to the database. I wonder if that could be incorporated into Zotero.
  • huh. interesting. How well does this work in practice?
  • edited December 19, 2008
    That's essentially what cb2Bib, mentioned on the Importing Formatted Bibliographies page that Trevor linked to above, does. Direct support for something like that may be incorporated into Zotero eventually. (Zotero 1.5 can auto-recognize certain PDFs, but that's a separate feature.)
  • WizFolio's import-from-clipboard function is different from cb2Bib, though, in that it performs a lookup of the references in PubMed rather than trying to parse the formatted bibliography itself. At least that's what I gather from this page.
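
To show the difference between lookup and parsing, here is a minimal sketch of the lookup approach: take the text you believe is the title and ask PubMed for a matching record, then import the structured record it returns. How WizFolio actually does this is not known to me; the sketch only builds a query URL for NCBI's public E-utilities esearch endpoint and makes no network request. The function name pubmed_search_url and the sample title are made up for this example.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(title, max_results=1):
    """Build an esearch URL that looks for `title` in PubMed title fields."""
    params = {
        "db": "pubmed",
        "term": f"{title}[Title]",  # restrict the match to the title field
        "retmode": "json",
        "retmax": max_results,
    }
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical title, invented for this example.
    print(pubmed_search_url("An example article"))
```

The obvious limitation is that this only works for references PubMed actually indexes, so it would not help much outside the biomedical literature.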
  • By any chance, has there been any progress on this? This option would save a lot of time.

  • Nope - the best idea is to use a third-party tool like the ones mentioned here and then import into Zotero.