[MLZ] MLZ not picking up citation from Google scholar case law

A temporary problem?
  • Yep, sorry about that - I hadn't pulled through a code update from Zotero main. If you update translators it should be fixed now - thanks for calling attention to the fault.
  • Yes, it's back!
  • I think MLZ's translator is off again with Google Scholar case law.
  • Thanks for reporting the fault. I've merged some recent changes to the translator. Can you post a URL that fails, so I can test?
  • (that one's broken with vanilla Zotero, too, FWIW)
  • MLZ translator distribution is broken. I'll get that going, then look into the failure. Zotero recently accepted a change that allows us to use the same translator code in MLZ and official Zotero, so when the breakage on this item is fixed, it will work for both.

    More soon ...
  • edited May 10, 2015
    I've taken a look. There is good news and bad news.

    First the bad news. Google Scholar still provides no structured metadata on case pages: what you see on the screen is all the translator has to work with. Unfortunately, there isn't much structure in there, and I think this item shows that we've reached the limit of what can be done.

    So if working from Google Scholar, you'll need to enter the details of this and similarly formatted items by hand.

    To extend that gloomy picture a little, there are similar issues with WestLaw, Lexis, and Bloomberg Law (US), and with BAILII (UK). For the most part this is by intention. For the commercial services, it is part of a lock-in strategy. For Google Scholar (which, to be fair, has done a lot to open up access to US court judgments), the aim is presumably to drive traffic to their site. BAILII is a special case: under their arrangements with UK courts, they are at pains to emphasize that their reports are only an unofficial record, and their policy is therefore to do nothing that would support third-party referencing tools like Zotero.

    This is all very annoying. As I wrote of our efforts to screen-scrape legal case citations from bare text (several years ago, in another context):
    It is a testament to human ingenuity that this is possible at all, but the underlying infrastructure is an embarrassing bundle of wet string.

    The good news is that services that do provide structured metadata are beginning to emerge. The two that I am familiar with are CourtListener and FastCase.

    I initially wrote that the linked case is not available on CourtListener, but I just missed it. A search for "POM Wonderful 2015" turns it up here. In logged-in state, the CourtListener translator in MLZ pulls perfect metadata for it, complete with jurisdiction and court codes.

    API: 1; Screen-scraping: 0.

    CourtListener is a free-access project driven by contributions and grant funds. The service has quite broad coverage, and the team aim to cover all of US law. CourtListener offers an excellent API that is accessible with a (free) account on the service. The MLZ translator for CourtListener relies on the API, and it produces clean metadata. The current limitations are: some holes in coverage; and a lack of official citations (to West reporters etc) for cases below the US Supreme Court level. The team are making progress on both of those issues: CL is a service worth checking out, and definitely one to watch.

    FastCase is bundled with bar association membership in many states, and subscriptions to the service are available at much lower cost than the other commercial services. FC offers an API that might be useful for building more reliable translators. Maybe—you would have to check. I built a translator for FastCase under trial access some time back, and ended up doing screen-scraping. My access expired, and plans to subscribe to the service locally have not gone anywhere, so I'm not sure whether the translator I built still works, or whether there might now be a better approach to the site available. But as the existence of the API shows, FastCase is a modern service that "gets" the importance of inter-operability. They too are worth exploring.
  • Well, it's definitely good news that at least there are alternatives to Google Scholar that provide structured metadata for court cases.

    It's a shame that Google Scholar has stopped providing metadata. It's odd that Google has chosen to do so to DRIVE traffic to their site! I've been visiting their site exactly because they are compatible with the MLZ translator, unlike WestLaw or LexisNexis. Google's new policy is actually driving me AWAY from their site and towards the alternatives.
  • It's not a new policy: we have always screen-scraped cases from GS. I may be too uncharitable: it might also be a resource issue at their end.
  • I didn't know it had always been screen-scraping from GS. My experiences have been so good with the GS translator that it never occurred to me it was screen-scraping!

    Is there a chance that the Google Scholar developers are not aware of how legal cases are cited? My sense is that Google would be willing to build structured metadata for legal cases if they realized there was a need among legal scholars. Maybe we should alert them to the issue.
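
For anyone wondering why screen-scraping runs out of road while an API does not, here is a rough Python sketch of the contrast discussed earlier in the thread. It is only an illustration, not the code of any MLZ or Zotero translator: the regex, the function names, the JSON field names, and the second (made-up) citation are all assumptions for the example. The first citation is the real Brown v. Board of Education cite.

```python
import json
import re

# Assumed pattern for one simple layout: "Name, VOLUME REPORTER PAGE (YEAR)".
# Real case pages vary far more than this, which is the whole problem.
CITATION_RE = re.compile(
    r"^(?P<name>.+?),\s+(?P<volume>\d+)\s+(?P<reporter>[A-Za-z0-9. ]+?)"
    r"\s+(?P<page>\d+)\s+\((?P<year>\d{4})\)$"
)

FIELDS = ("name", "volume", "reporter", "page", "year")


def scrape_citation(text):
    """Guess citation fields from bare on-screen text; None if the layout differs."""
    match = CITATION_RE.match(text.strip())
    return match.groupdict() if match else None


def read_structured(payload):
    """Read the same fields from a JSON record (hypothetical field names)."""
    record = json.loads(payload)
    return {key: str(record[key]) for key in FIELDS}


# A citation in the layout the pattern expects is scraped correctly.
reported = "Brown v. Board of Education, 347 U.S. 483 (1954)"
scraped = scrape_citation(reported)

# A made-up citation in a docket-number layout defeats the same pattern.
unreported = "Example Corp. v. Sample Inc., No. 12-3456 (9th Cir. 2015)"
missed = scrape_citation(unreported)

# With structured metadata there is nothing to guess.
payload = json.dumps(
    {"name": "Brown v. Board of Education", "volume": 347,
     "reporter": "U.S.", "page": 483, "year": 1954}
)
structured = read_structured(payload)
```

The scraper works only as long as the text matches a layout someone anticipated; the structured record carries its field boundaries with it, so no pattern needs to be maintained.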