Google Patents not working

aizvorski · July 16, 2011

Google Patents is not working ("Could not save item"), on any patent. Example:
http://www.google.com/patents/about?id=j5NSAAAAEBAJ

Zotero 2.1.8, Firefox 4.0.1, Windows 7, all translators updated just now. There is nothing about this in the known translator issues page either, so I'm posting here.

ajlyon · July 16, 2011

Please go to http://github.com/ajlyon/translators/raw/master/Google Patents.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).

It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.

aizvorski · July 16, 2011

Thanks - I tried it, it works! I will test more later today, but so far it works on everything incl patents and patent applications. That was really fast :)

ajlyon · July 16, 2011

I went ahead and committed this to the repository, since I'm pretty sure it'll work for everybody: https://github.com/zotero/translators/commit/ec5e41f27ce5c280e3d3255ad0a4ab3f41c4803e

Still, post here if any issues come up as you keep using the translator.

tap · September 7, 2011

I still get the "Could not save item" error with Google Patents (Report ID: 1345975352) on every patent I have tried. Example:
http://www.google.com/patents/about?id=KchEAAAAEBAJ

This happens with both the newest stable version of Zotero and the beta:
Zotero 3.0b2 (de/us), Firefox 6.0.1 (de/us), Windows 7 (de), all translators updated. I also tried the steps above.

ajlyon · September 7, 2011

Please go to http://github.com/ajlyon/translators/raw/master/Google Patents.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).

This addresses some saving issues, although I'm not 100% sure it will fix your issue. If this works for you, please post here so that I can submit this change to be pushed to all users.

If it doesn't work, please provide a new report ID.

bunder · October 16, 2011

The "Google Patents" translator doesn't correctly extract inventors from patents with several inventors, and it doesn't extract the abstract. Example: http://www.google.fr/patents?id=Nh17AAAAEBAJ
I also think that this translator should set the country to "U.S." automatically, as all patents in the Google database are U.S. patents (see http://www.google.fr/googlepatents/about.html)

ajlyon · October 16, 2011

Ok. Please go to http://github.com/ajlyon/translators/raw/master/Google Patents.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).

It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.

bunder · October 16, 2011

Yes, it works like a charm! Thank you very much, Avram.

Darkwraith · February 9, 2012

I am having this problem as well on this website:
http://www.google.com/patents?hl=en&lr=&vid=USPAT5139901&id=mHsiAAAAEBAJ&oi=fnd&dq=NH3+lithium+battery&printsec=abstract#v=onepage&q=NH3%20lithium%20battery&f=false

I tried installing the translator file recommended on Oct. 16th, to no avail.

I am using Chrome 16 and Zotero Standalone 3.0.1, and my report ID is 1839355118

ben58 · February 21, 2012

It works sometimes, but mostly not, which is a moderate annoyance, as I work with patents rather often.

I have now started to play around with translators (hoping to write ones for EU and German patents), and looked first into the Google Patents translator.

It appears that the problem (or one of them) is this: the function detectWeb() checks whether the substring "id=" is present in the URL and, if so, returns the document type "patent".

In most cases the "id=" is now absent: Google has changed something.

However, on the "Overview" type of patent view, the substring "ei=" is present in most of the cases I tested (see below for an exception).

I have replaced "id=" by "ei=" in the translator (file "Google Patents.js", line 23), and now both functions (detectWeb and doWeb) appear to work properly from Zotero Scaffold: the first one returns type "patent", the second one parses the page OK.
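
For illustration, the check described above can be sketched roughly as follows. This is a minimal sketch and not the translator's actual source: the `marker` parameter is a hypothetical addition so that the "id=" and "ei=" variants can be compared side by side, and the sample URLs are taken from this thread.

```javascript
// Sketch of the URL check described above; NOT the real translator code.
// `marker` is a hypothetical parameter added only for comparison.
function detectWeb(doc, url, marker = "id=") {
  // Report a "patent" item only when the marker substring occurs in the URL.
  if (url.indexOf(marker) !== -1) {
    return "patent";
  }
  return false;
}

const oldStyle = "http://www.google.com/patents/about?id=j5NSAAAAEBAJ";
const newStyle =
  "http://www.google.com/patents/US4390992?dq=plasma+Dean+Judd&ei=r6JDT4iDD8fYsgbSmN3qBA";

console.log(detectWeb(null, oldStyle));        // "patent"
console.log(detectWeb(null, newStyle));        // false: the URL has no "id="
console.log(detectWeb(null, newStyle, "ei=")); // "patent"
```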

However, trying to save the data from the Firefox page still results in a "Could Not Save Item" error, and I have no idea why. The translator actually used is "Google Patents.js" (shown when hovering the mouse pointer over the "Save to Zotero" icon).

There is another small issue. The URL may be in the form http://www.google.com/patents/US4390992?dq=plasma+Dean+Judd&ei=r6JDT4iDD8fYsgbSmN3qBA
or in the short form
http://www.google.com/patents/US4390992

The data scraping by doWeb looks OK (see below) in both cases, thus detectWeb has to be adjusted.
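
One possible way to cover both URL forms is to look for a patent number in the path as well as for the old "id=" parameter. The sketch below is only an assumption based on the example URLs in this thread: `looksLikePatentUrl` is a hypothetical helper name, and the "US" prefix is hard-coded because every example here is a U.S. patent.

```javascript
// Hypothetical helper; the patterns are guesses derived from the URLs quoted
// in this thread, not from the real translator.
function looksLikePatentUrl(url) {
  // Old form: ...?id=XXXX or ...&id=XXXX
  const hasIdParam = /[?&]id=/.test(url);
  // New forms: /patents/US4390992 with or without a query string or fragment
  const hasNumberInPath = /\/patents\/US\d+(?:[?#]|$)/.test(url);
  return hasIdParam || hasNumberInPath;
}

const longForm =
  "http://www.google.com/patents/US4390992?dq=plasma+Dean+Judd&ei=r6JDT4iDD8fYsgbSmN3qBA";
const shortForm = "http://www.google.com/patents/US4390992";
const searchPage = "http://www.google.com/search?tbm=pts&q=plasma+miller&btnG=";

console.log(looksLikePatentUrl(longForm));   // true
console.log(looksLikePatentUrl(shortForm));  // true
console.log(looksLikePatentUrl(searchPage)); // false
```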

I could do that, and more testing, if somebody explains to me how to find out why I still get the error message (as it apparently works from Scaffold), and perhaps helps me in other ways.

-------
From Scaffold:

Returned item:
{
"itemType": "patent",
"creators": [
{
"firstName": "William P.",
"lastName": "Arnold",
"creatorType": "inventor"
}
],
"notes": [],
"tags": [],
"seeAlso": [],
"attachments": [],
"abstractNote": "A rapid, semi-automated method for determining dibucaine numbers is disclosed wherein use is made of a unit dosage form of dibucaine or a test pack containing dibucaine in a unit dosage form.",
"country": "United States",
"extra": "U.S. Classification: 435/20\nInternational Classification: : C12Q 146",
"patentNumber": "4340667",
"date": "Jul 20, 1982",
"filingDate": "May 20, 1980",
"assignee": "The University of Virginia",
"title": "Rapid, semi-automated method for determining dibucaine numbers",
"url": "http://www.google.com/patents/US4340667?dq=plasma+oher&ei=raRDT7a3JsXftAa9_JnnBA",
"libraryCatalog": "Google Patents",
"accessDate": "CURRENT_TIMESTAMP"
}

adamsmith · February 21, 2012

It's odd - Zotero's translator doesn't find the xpath of the title
//h1[@class="gb-volume-title"]
though that's clearly correct - which is why it works in Scaffold. I'll have a closer look later.
Since Scaffold doesn't work properly on FF10 (at least under Ubuntu), I'm currently not doing a lot on translators - it's just very cumbersome. I hope we'll have that sorted out soon.

adamsmith · February 21, 2012

The way to check for this, btw, is to look in your error report after getting a translator error.

adamsmith · February 22, 2012

temporarily continued here as this is getting rather technical:
http://groups.google.com/group/zotero-dev/browse_frm/thread/f259f921f95ea7fd

adamsmith · February 23, 2012

OK, I think I've got this, but since I made some substantial changes I'd like some testing before we push this out.
Download the translator from here:
https://github.com/adam3smith/translators/raw/googlepatents/Google%20Patents.js
and place it in the translator folder in your zotero data directory:
http://www.zotero.org/support/zotero_data , replacing the file of the same name.
This should work from all Google Patents views. It should also (re-?)establish PDF download for patents.
Please let me know how it goes.

dragoshenron · February 23, 2012

Works for me :)

ben58 · February 23, 2012

It now works much better ;)

Well, from the search results (patent list) it looks like all data are read in OK.

The same goes for the "overview" patent view.

But from a "patent page" view like the one below, the mileage varies. Sometimes it works, but in many cases only limited data are saved, e.g. no inventor names, no abstract, no assignee.

If I understand it correctly, for multiple-item lists the script fetches the data not directly from the page, but from other URLs.

Would it make sense to apply the same technique for "patent page" views: get the "overview" page, then scrape the data?

BTW I'm not sure why I got pages 7-8 of the patent in the link below; I think I just clicked on the search result.

http://www.google.com/patents?id=jH4DAAAAEBAJ&pg=PA9&dq=weltmann+plasma&hl=en&sa=X&ei=J4RGT4bWIaPS0QXwuPSKDg&ved=0CD0Q6AEwAw#v=onepage&q=weltmann%20plasma&f=false

adamsmith · February 23, 2012

thanks for that link -

> If I understand it correctly, for multiple items list, the script fetches the data not directly from the page, but from other URLs.
> Would it make sense to apply the same technique for "patent page" views: get the "overview" page, then scrape the data?

for multiple item lists, the translator actually just uses the overview link that's displayed below the item.

It is already loading the overview page for many abstract or pageview items (see, e.g., Darkwraith's link above), but to avoid doing that unnecessarily (i.e. when people are already on the overview page) the translator is testing for the presence of "printsec". That's apparently too restrictive. I'll need to figure out what makes your link a pageview link; the "onepage" may work.

edit: ah yes, I see - testing for either "onepage" or "thumbnail" should work.
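
As a rough sketch of that test (not the translator's actual code): `isPageView` is a hypothetical helper name, and keeping the existing "printsec" check alongside the two new markers is an assumption. The sample URLs are shortened versions of links posted in this thread.

```javascript
// Hypothetical helper: decide whether the overview page still has to be loaded.
// Assumes "printsec", "onepage" or "thumbnail" in the URL marks a page view.
function isPageView(url) {
  return /printsec|onepage|thumbnail/.test(url);
}

const pageView =
  "http://www.google.com/patents?id=jH4DAAAAEBAJ&pg=PA9&dq=weltmann+plasma#v=onepage&q=weltmann%20plasma&f=false";
const abstractView =
  "http://www.google.com/patents?vid=USPAT5139901&id=mHsiAAAAEBAJ&printsec=abstract";
const overview = "http://www.google.com/patents/about?id=j5NSAAAAEBAJ";

console.log(isPageView(pageView));     // true, via "onepage"
console.log(isPageView(abstractView)); // true, via "printsec"
console.log(isPageView(overview));     // false, already the overview page
```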

adamsmith · February 23, 2012

please try again - same link as above.

ben58 · February 23, 2012

Now even better, but attempting a multiple import of all items from the following link
http://www.google.com/search?tbm=pts&q=plasma+miller&btnG=

gives a "could not save" error.

And one of the patents found, namely
http://www.google.com/patents?id=y-__AQAAEBAJ&printsec=frontcover&dq=plasma+miller&hl=en&sa=X&ei=ZbNGT9PsEsyK4gTk1dnWDg&ved=0CE8Q6AEwCQ
gives the same error.

On doWeb from Scaffold on this page -
fileName => chrome://zotero/content/xpcom/translation/translate.js
lineNumber => 550
string => Error: No title specified for item

adamsmith · February 23, 2012

thanks, that's very thorough testing - it's those two underscores in the id=y-__AQAAEBAJ
I'll think about whether we can just add those to the regex finding the IDs - seems like it should be OK. If you don't mind try a couple more and see if you can find anything else. Otherwise I'll add the underscore and that should be it.
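
To show why the underscores matter, here is a small sketch. Both patterns are illustrative assumptions, since the translator's real regex is not quoted in full in this thread; the URL is a shortened version of the one reported above.

```javascript
// Illustrative patterns only; NOT copied from the translator.
const withoutUnderscore = /id=([a-zA-Z0-9-]+)/;
const withUnderscore = /id=([a-zA-Z0-9_-]+)/;

const patentUrl =
  "http://www.google.com/patents?id=y-__AQAAEBAJ&printsec=frontcover&dq=plasma+miller";

// Without "_" in the character class, the match stops at the first underscore.
console.log(patentUrl.match(withoutUnderscore)[1]); // "y-"
// With "_" allowed, the whole ID is captured.
console.log(patentUrl.match(withUnderscore)[1]);    // "y-__AQAAEBAJ"
```

A truncated ID like "y-" would point at a page that does not exist, which would be consistent with the "No title specified for item" error above.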

ben58 · February 23, 2012

adamsmith, thank you very much for fixing the translator. I've done some more testing and it worked flawlessly on several dozen patents.

ben58 · February 24, 2012

In the latest version of the translator, line 124 still contains
...id=[a-zA-Z0-9]...
which might result in failure in some cases, if I understand it correctly.

adamsmith · February 24, 2012

Thanks, I'll fix that when I get a chance, but no, that won't lead to failure, that just will put a truncated URL in the URL field.
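
A sketch of the truncation effect, under the assumption that the URL field is rebuilt from the matched "id=..." portion: `itemUrl` is a hypothetical helper, and the base address and patterns are illustrative rather than taken from the translator.

```javascript
// Hypothetical helper: rebuild a short item URL from the "id=" parameter.
function itemUrl(pageUrl, idPattern) {
  const m = pageUrl.match(idPattern);
  // Fall back to the original URL when no id is found.
  return m ? "http://www.google.com/patents?" + m[0] : pageUrl;
}

const page =
  "http://www.google.com/patents?id=y-__AQAAEBAJ&printsec=frontcover&dq=plasma+miller";

// Narrow character class: the saved URL is cut off after "y".
console.log(itemUrl(page, /id=[a-zA-Z0-9]+/));   // "http://www.google.com/patents?id=y"
// Hyphen and underscore allowed: the full ID survives.
console.log(itemUrl(page, /id=[a-zA-Z0-9_-]+/)); // "http://www.google.com/patents?id=y-__AQAAEBAJ"
```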

adamsmith · February 25, 2012

OK, the fix (including the last bit concerning the URL) is now up. If anyone else experiences problems, there is no need to install the translator manually; by now it should have reached all Zotero clients.

wonblee · December 3, 2012

The Google Patents translator is grabbing the patent as a "book" type, rather than as a patent.

I'm resurrecting an old thread rather than starting a new one, because it could be a transient problem this time too. Incidentally, I've recently installed MLZ, but my understanding is that MLZ relies on the same translators as Zotero.

adamsmith · December 3, 2012

confirmed - the translator is currently broken (Zotero imports generic metadata from the page). We'll have a fix out asap.

aurimas · December 3, 2012

@adamsmith, I'm looking into this (practically rewriting the translator).

adamsmith · December 3, 2012

(thanks - it's a big mess, I had a quick look and then decided it'd take a bunch of time).

fbennett · December 3, 2012

@wonblee,

MLZ uses a separate channel for updating translators, that includes some variants of the standard ones, to provide support for legal and multilingual metadata where it is available. Changes to the standard translators are merged in from time to time though (about once a week), so the changes aurimas is working on will feed through to MLZ pretty soon after they are released.

wonblee · December 3, 2012

@fbennett

Thanks for the information. I can wait patiently as long as I know it's coming. :-)