Metadata Retrieval

gtrsmith · November 4, 2009

I've just been told that Zotero can only search for metadata through Google Scholar. Is there any way you could add a feature that searches multiple databases, such as Web of Science, for metadata, and not just Google Scholar? I think that would help a lot of researchers! Thanks for considering it!

amacom73 · November 5, 2009

This is not exactly correct: Zotero uses only Google Scholar for PDF metadata extraction, but it can also look up metadata by ISBN, DOI, or PMID with the magic wand button. This doesn't help when you're doing a batch extraction from PDFs, but for individual items it's not a bad way to populate the metadata and then drop the PDF inside.
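As a rough illustration of how one lookup box can accept several kinds of identifier, here is a minimal Python sketch that guesses whether a pasted string is a DOI, an ISBN, or a PMID. This is not Zotero's code: the helper names (`classify_identifier`, `isbn10_ok`, `isbn13_ok`) and the simplified patterns are hypothetical. The ISBN checks use the standard ISBN-10 and ISBN-13 check-digit rules, and treating any 1 to 8 digit number as a PMID is an assumption.

```python
import re
from typing import Optional

# Simplified DOI shape (assumption): "10." + registrant digits + "/" + suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def isbn10_ok(digits: str) -> bool:
    # ISBN-10: weights 10..1, sum must be divisible by 11; last char may be X (= 10).
    total = 0
    for i, ch in enumerate(digits):
        value = 10 if (i == 9 and ch in "Xx") else int(ch)
        total += value * (10 - i)
    return total % 11 == 0


def isbn13_ok(digits: str) -> bool:
    # ISBN-13: alternating weights 1 and 3, sum must be divisible by 10.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(digits))
    return total % 10 == 0


def classify_identifier(text: str) -> Optional[str]:
    """Hypothetical helper: return "DOI", "ISBN", "PMID", or None."""
    s = text.strip()
    if DOI_RE.match(s):
        return "DOI"
    compact = s.replace("-", "").replace(" ", "")
    if re.fullmatch(r"\d{9}[\dXx]", compact) and isbn10_ok(compact):
        return "ISBN"
    if re.fullmatch(r"\d{13}", compact) and isbn13_ok(compact):
        return "ISBN"
    # Assumption: a short bare number is treated as a PubMed ID.
    if re.fullmatch(r"\d{1,8}", s):
        return "PMID"
    return None
```

Checking the ISBN check digit before treating a number as an ISBN keeps a mistyped 10- or 13-digit string from being sent to the wrong lookup service.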

I should add that I am strongly in favor of being able to do PDF metadata extraction from PubMed or elsewhere, but I do understand the dev team's impetus to use the single most versatile database for metadata extraction. My Zotero wishlist includes a setting that lets the user choose the default metadata extraction database (which would be PubMed for my purposes).

noksagt · November 5, 2009

One stage of the metadata retrieval sends an excerpt of text & relies on Google Scholar's full-text indexing of many academic articles (which WoS lacks) to find a match. IIRC, if this fails & there is an identifier (e.g., a DOI), this will be used to cull reference information.

There is no easy way for Zotero to determine partial metadata from most PDFs, which would be needed for WoS support.

dstillman · November 5, 2009

IIRC, if this fails & there is an identifier (e.g., a DOI), this will be used to cull reference information.

It looks for a DOI before falling back to text searching with Google Scholar, but yeah.
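The order described here (DOI first, full-text search as the fallback) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Zotero's implementation: `lookup_doi` and `search_fulltext` are hypothetical callables standing in for a DOI resolver and a Google Scholar text query, the DOI regex is simplified, and falling back to text search when the DOI lookup returns nothing is an assumption.

```python
import re
from typing import Callable, Optional

# Simplified DOI pattern (assumption); real DOIs can contain more characters.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in the extracted PDF text, if any."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Strip trailing punctuation picked up from the surrounding sentence
    # (a simplification: a DOI that genuinely ends in ")" would be clipped).
    return match.group(0).rstrip(".,;)")


def retrieve_metadata(
    pdf_text: str,
    lookup_doi: Callable[[str], Optional[dict]],       # hypothetical DOI resolver
    search_fulltext: Callable[[str], Optional[dict]],  # hypothetical Scholar query
    excerpt_len: int = 200,
) -> Optional[dict]:
    # Step 1: prefer an identifier if one appears in the text.
    doi = find_doi(pdf_text)
    if doi is not None:
        record = lookup_doi(doi)
        if record is not None:
            return record
    # Step 2: fall back to searching with a normalized excerpt of the text.
    excerpt = " ".join(pdf_text.split())[:excerpt_len]
    return search_fulltext(excerpt)
```

Passing the two lookups in as callables keeps the ordering logic separate from any particular service, so the fallback database could in principle be swapped out.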

victorgan · February 12, 2010

One issue I've encountered is Google Scholar's poor indexing of author names: it often gets the last name wrong, using one of the first names instead, which seriously impairs searching. For science articles PubMed is reliable and has more fields indexed as well, so it would be great if there were a one-button update from a secondary database using the DOI.

anders_royce · February 14, 2010

I think this may be the reason Google Scholar just locked me out. I was trying to get metadata for a few dozen PDFs. After about half were finished, Google stopped giving me anything. Now when I try to navigate to the Scholar page, I get:
"Google
Sorry...
We're sorry...

... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.
See Google Help for more information."

The Google help article that is linked from that page says automated queries are against Google's terms of service, so are we breaking the rules? Are they going to cut Zotero off completely once they figure out what is going on? Should Zotero find another way to get PDF metadata? I say yes, not because I have any problem with Google Scholar, but because we seem to be violating their terms.

ajlyon · February 14, 2010

The Google help article that is linked from that page says automated queries are against Google's terms of service, so are we breaking the rules

Zotero only sends queries when users ask for specific searches, usually only a few PDFs at a time. It is not clear that such usage is an "automated query".

adamsmith · February 14, 2010

but it might still be a good idea to check back with them on this?
Remember, Google does try not to be evil, and Zotero is in no way a competitor, so I think there is a pretty decent chance that they would be willing to tweak their "lock-down" algorithm to play a little nicer with Zotero users.

ajlyon · February 14, 2010

but it might still be a good idea to check back with them on this?

I hope you're right. This sounds like a job for something akin to Mozilla's evangelism team. Is there someone visible in the Zotero world (Dan? Bruce?) who might send a message to the right people?

adamsmith · February 14, 2010

Zotero does have two official directors, and at least Sean (Takats, username sean) is quite active on the forum too, so he would seem ideal. Otherwise, Dan Stillman. Bruce is actually not directly affiliated with Zotero.

dancohen · February 14, 2010

We have had many contacts with Googlers over the years, although mostly in the Google Books project rather than Google Scholar. Without getting into details, it's extremely hard to get Google to respond to requests for leniency like this. I don't think they are worried about competition; it's just a matter of getting a response from very busy people who then have to task an engineer with figuring out what Zotero queries look like (and whether they are distinguishable from other automated queries). Remember, they can't just white-list a single IP address (i.e., the Zotero server), since these rapid requests come from decentralized Zotero clients.

adamsmith · February 14, 2010

Thanks, Dan, that was quick. I'll gladly correct myself and say both directors are quite active on the forum ;-).
That problem occurred to me, too, after I had written that.

dancohen · February 14, 2010

adam: no, you're quite right that I'm not very active on the forum, but I do keep tabs on what's going on here.

ajlyon · February 14, 2010

Perhaps not surprisingly, I had merged Dan Cohen and Dan Stillman in my mind into a single Dan Zotero. Despite running into both surnames, I certainly never thought of them as distinct people. Maybe Zotero needs more staff photos?

radu124 · April 16, 2010

Suggestion:

I also ran into trouble when retrieving data from Google Scholar. Not only did Zotero stop working, but I was also unable to use Scholar for a while.

Google unblocks the service if you enter a CAPTCHA, but it seems that if you've had too many invalid accesses you don't even get the CAPTCHA anymore (at least until you delete your cookies). Could we at least make Zotero detect when it gets blocked and stop sending more requests (and maybe send the user to the CAPTCHA page)?
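A minimal Python sketch of the suggested behavior: process a batch of lookups, and as soon as a response looks like a block page, stop and report which items were left undone so the user could be pointed at the CAPTCHA. This is hypothetical, not how Zotero works. `fetch` is an assumed callable returning `(status, body)`, the marker phrase is taken from the error page quoted earlier in this thread, and treating HTTP 429 or 503 as a block signal is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Phrase from the block page quoted in this thread; other wordings may exist.
BLOCK_MARKERS = ("automated queries",)
# Assumption: these HTTP statuses are treated as "blocked / rate-limited".
BLOCK_STATUSES = (429, 503)


def looks_blocked(status: int, body: str) -> bool:
    if status in BLOCK_STATUSES:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


@dataclass
class BatchResult:
    done: List[Tuple[str, str]] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    blocked: bool = False


def run_batch(
    queries: Sequence[str],
    fetch: Callable[[str], Tuple[int, str]],  # hypothetical request function
) -> BatchResult:
    result = BatchResult()
    for i, query in enumerate(queries):
        status, body = fetch(query)
        if looks_blocked(status, body):
            # Stop immediately instead of sending more requests; the caller
            # can show the skipped items and direct the user to the CAPTCHA.
            result.blocked = True
            result.skipped = list(queries[i:])
            break
        result.done.append((query, body))
    return result
```

Stopping at the first block response means at most one extra request is sent after the service starts refusing, rather than one per remaining PDF.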

anders_royce · April 16, 2010

According to the last line of Dan Stillman's post in the Zotero Forums thread "Locked out of Google Scholar", that may not be possible.