Australasian Legal Information Institute

Grieth,

I've created a basic translator for AustLII (http://www.austlii.edu.au/) that you can build upon.

Step 1: Download and install Zotero's Scaffold Utility here:
http://dev.zotero.org/docs/scaffold

Step 2: Fill out the "Metadata" tab something like this:
http://mckinney.sw.googlepages.com/scaffold-metadata.png

Step 3: Cut and paste code below into the "Detect Code" tab like this:
http://mckinney.sw.googlepages.com/scaffold-detect-code.png

Step 4: Cut and paste code below into the "Code" tab like this:
http://mckinney.sw.googlepages.com/scaffold-code.png

Step 5: Click the "Save to Database" icon on the toolbar (second from left).

Step 6: Test some AustLII cases.

Good luck!

--bill

============== translator code =================

function detectWeb(doc, url) {
    // Namespace resolver for XPath queries; unused here.
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
        if (prefix == 'x') return namespace; else return null;
    } : null;

    var austliiRegexp = /^http:\/\/www\.austlii\.edu\.au\/au\/cases\/.+/;
    if (austliiRegexp.test(url)) {
        // A single judgment page; scrape() saves it as a "case" item.
        return "case";
    } else {
        // Otherwise offer a selection list if the page links to any cases.
        var aTags = doc.getElementsByTagName("a");
        for (var i = 0; i < aTags.length; i++) {
            if (austliiRegexp.test(aTags[i].href)) {
                return "multiple";
            }
        }
    }
}
function scrape(doc) {
    // Namespace resolver for XPath queries; unused here.
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
        if (prefix == 'x') return namespace; else return null;
    } : null;

    // Map court abbreviations to full court names.
    var courtArray = {};
    courtArray['FCAFC'] = 'Federal Court of Australia - Full Court (FCAFC)';
    courtArray['FCA'] = 'Federal Court of Australia (FCA)';
    courtArray['FamCA'] = 'Family Court of Australia (FamCA)';
    courtArray['FMCA'] = 'Federal Magistrates Court of Australia (FMCA)';

    var newItem = new Zotero.Item("case");
    newItem.title = doc.title;
    newItem.url = doc.location.href;

    // Expected title form: Name [year] COURT number (day month year)
    var titleRegexp = /^(.+)\s+\[(\d+)\]\s+(\w+)\s(\d+)\s+\((\d+)\s+(\w+)\s+(\d+)\)/;
    var titleMatch = titleRegexp.exec(doc.title);
    if (titleMatch) {
        newItem.caseName = titleMatch[1] + " [" + titleMatch[2] + "] " + titleMatch[3] + " " + titleMatch[4];
        newItem.dateDecided = titleMatch[7] + " " + titleMatch[6] + " " + titleMatch[5];

        // Use the full court name when the abbreviation is known.
        if (courtArray[titleMatch[3]]) {
            newItem.court = courtArray[titleMatch[3]];
        } else {
            newItem.court = titleMatch[3];
        }
    } else {
        // Title did not match; fall back to the raw page title.
        newItem.caseName = doc.title;
        newItem.dateDecided = "not found";
    }

    newItem.complete();
}

function doWeb(doc, url) {
    var austliiRegexp = /^http:\/\/www\.austlii\.edu\.au\/au\/cases\/.+/;
    if (austliiRegexp.test(url)) {
        scrape(doc);
    } else {
        // Collect links to cases and let the user choose which to import.
        var items = Zotero.Utilities.getItemArray(doc, doc, austliiRegexp);
        items = Zotero.selectItems(items);

        if (!items) {
            return true;
        }

        // The keys of the selection are the URLs of the chosen cases.
        var urls = new Array();
        for (var i in items) {
            urls.push(i);
        }

        Zotero.Utilities.processDocuments(urls, scrape, function() { Zotero.done(); });
        Zotero.wait();
    }
}
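
To see what the title pattern in scrape() extracts, here is a minimal standalone sketch that runs outside Zotero. The helper name parseTitle and the sample title are made up for illustration; only the regular expression is taken from the translator.

```javascript
// Same pattern as titleRegexp in scrape().
var titleRegexp = /^(.+)\s+\[(\d+)\]\s+(\w+)\s(\d+)\s+\((\d+)\s+(\w+)\s+(\d+)\)/;

// Hypothetical helper: split an AustLII-style page title into the fields
// that scrape() assigns to the Zotero item. Returns null if there is no match.
function parseTitle(title) {
    var m = titleRegexp.exec(title);
    if (!m) {
        return null;
    }
    return {
        caseName: m[1] + " [" + m[2] + "] " + m[3] + " " + m[4],
        court: m[3],
        dateDecided: m[7] + " " + m[6] + " " + m[5]
    };
}

// Made-up title, for illustration only.
var parsed = parseTitle("Smith v Jones [2006] FamCA 25 (3 February 2006)");
console.log(parsed.caseName);    // Smith v Jones [2006] FamCA 25
console.log(parsed.dateDecided); // 2006 February 3
```

Note that the date comes out as year, month, day, because the translator concatenates the last three groups in reverse order.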
  • Thanks Bill. this looks fantastic. I will start trying to get on top of it.
  • Bill,

    If you are satisfied with this translator, we would be pleased to roll it out into the general Zotero distribution. Please let us know whether it's ready.
  • Sean,

    Let's wait and see what Grieth has to say or add, seeing as he's an Aussie. I did some basic testing but nothing extensive. The AustLII resource he pointed us to is pretty amazing and a perfect candidate for Zotero. I might also want to try my hand at writing a translator for the legislative resources there.

    We'll be sure to give you a shout when it's ready.

    --bill
  • Bill,

    Sounds good. Thanks for all of your hard work. It's terrific to have people in the community taking the initiative to expand Zotero's functionality.
  • Bill, Great work. I have found a small problem. When using the AustLII search engine, the resulting links are in this form:
    http://www.austlii.edu.au//cgi-bin/disp.pl/au/cases/cth/family_ct/2006/25.html?query=family%20law%20and%20kirby%20and%20relocation
    This seems to fool the translator - can it ignore the '?query' and everything following in the address?

    I also realised that a list of court names for abbreviations would soon bloat the translator if every AustLII court were added. Can the court name be scraped from the page?

    The translator should also work for lots of other LII's, eg
    www.nzlii.org/nz/cases
    www.bailii.org/??/cases
    www.saflii.org/??/cases
    www.paclii.org/??/cases
    www.hklii.org/??/cases
    (the ?? represents the two letter jurisdiction code which varies)
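
One way the pattern might be generalised to the sites listed above is sketched below. The host list and the two-letter jurisdiction segment come from this post; whether every site really follows that URL layout is an assumption, and the helper name isLiiCaseUrl and the sample paths are made up.

```javascript
// Assumed layout: http://www.<host>/<two-letter code>/cases/<rest>
var liiRegexp = /^https?:\/\/www\.(?:austlii\.edu\.au|(?:nzlii|bailii|saflii|paclii|hklii)\.org)\/[a-z]{2}\/cases\/.+/;

// Hypothetical helper: true if the URL looks like a case page on one of the LIIs.
function isLiiCaseUrl(url) {
    return liiRegexp.test(url);
}

// Illustrative paths only; they are not known to exist.
console.log(isLiiCaseUrl("http://www.nzlii.org/nz/cases/NZCA/2007/1.html"));
console.log(isLiiCaseUrl("http://www.example.org/nz/cases/NZCA/2007/1.html"));
```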

    Beyond this, further improvements would require the cases template to have fields for the information that lawyers use. In the Eng/Aust areas the judges' name(s) are essential, but there is no field for this in Zotero's case template. Lots of the info in the template isn't used by Aust lawyers/legal academics. I think that Bill has already done a lot of great work on this (building his own variant of Zotero in this regard). Maybe you can explore this with Bill.

    grieth

    PS Is there an idiot's guide to writing this stuff? I have done a lot of macro programming in Visual Basic for Word, but I am a lawyer, not a computer programmer, and I can't get a handle on the coding. I think that I will need to if I want to help develop this to extract catchwords, lists of cases referred to, etc. that all appear on these pages.
  • Grieth,

    1) I'll take a look at adding search link capability. It should be feasible.

    2) I agree with you about bloat for expanding court abbreviations. Ultimately they would be better located in a SQLite table. I'll try to see if page scraping might work as you suggest.

    3) Yes, for now the judge can only be added as a "contributor" I guess. Another possibility is to add it as a note. I really believe that each country should have its own case schema, but that is a lot to ask of the Zotero people right now. As an American lawyer, I have an opinion about what the case fields should be for the U.S., but I am clueless about other countries.

    Re: Idiot's Guide - unfortunately, translator authoring is a bit tricky. Zotero's Scaffold tool is a big step in the right direction. A good book on JavaScript and regular expressions will be the most help. O'Reilly is a good source for both of these topics - see: http://www.oreilly.com/

    Also, it is really unfortunate that websites like AustLII don't expend a little bit more effort adding some semantic information so that complex page scraping wouldn't be required. You might suggest to them that they add some basic "META" tags to the HTML for things like judge, catchwords, etc. Cornell's LII does this to some degree and it makes the job so much easier.
  • thanks Bill.

    Looking forward to seeing the next version. I think judge's name as 'contributor' (whatever that is meant to mean) is better than in the notes section - while there is no 'judge' field. How does anyone develop a schema that has no field for the author (the judge who wrote the decision)?

    I will talk to AustLII about meta tags. I suspect it would be a project that would have to involve the courts as the cases are sent by email to an automated address for conversion to html and upload - but it will have to be done eventually. At present they are working through a funding problem, but that should be resolved over the next couple of months.

    Sean,
    Is Zotero prepared to look at the schema problems, especially that there is no field for the judge's name, given that Bill has already done a lot of work on that? I suspect that Bill's schema would be adequate for all common law jurisdictions, even though it is not perfectly tailored for each country - at least a lot better than the current one.

    Also, the tags field box is fairly short, and when a tag exceeds the box length the box doesn't expand to two lines; you just can't see the end of the tag. Can this be fixed? (I use long tags, a short heading connected with a '-', as a workaround for tags being at a single level, i.e. a flat database.)
  • Bill,

    I tried simply adding

    int postion = austliiRegexp.indexOf("?");
    If (position>1){
    austliiRegexp = austliiRegexp.substring(0,position);
    }

    at some apparently strategic points, but couldn't get the translator to work. Is it possible to simply reload the page after this expression is used to convert the address, or is that more problematic?
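
For reference, a minimal JavaScript sketch of the query-stripping idea. The attempt above cannot run as written: `int` and `If` are not JavaScript, and a regular expression object has no indexOf or substring, so the work has to be done on the URL string instead. The helper name stripQuery is made up, and where it would be called inside the translator is left open.

```javascript
// Hypothetical helper: return the URL without "?query=..." and anything after it.
function stripQuery(url) {
    var position = url.indexOf("?");
    return position > -1 ? url.substring(0, position) : url;
}

// Search-result link quoted earlier in this thread.
var searchUrl = "http://www.austlii.edu.au//cgi-bin/disp.pl/au/cases/cth/family_ct/2006/25.html?query=family%20law%20and%20kirby%20and%20relocation";
console.log(stripQuery(searchUrl));
// http://www.austlii.edu.au//cgi-bin/disp.pl/au/cases/cth/family_ct/2006/25.html
```

Stripping the query does not by itself make the original pattern match, since the search links also carry the extra cgi-bin path segment.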
  • Grieth,

    I had some time to play around with this and tried to see if it was feasible to extract court and judge from the body of the page. It doesn't look good, since my tests show that the tagging varies over the years (eg: court names sometimes appear in the first h2 tag and other times in h1 tags). I couldn't find any particular pattern for where to find the judge. This is the main problem with screen scraping and the Achilles heel of Zotero in general.

    As a comparison, try viewing source on a Cornell LII page (eg: http://www.law.cornell.edu/supct/html/05-1575.ZS.html). They at least capture basic information in semantic tags:

    <meta name="CASENAME" CONTENT="SCHRIRO V. LANDRIGAN">
    <meta name="COURTBELOW" CONTENT="certiorari to the united states court of appeals for the ninth circuit">
    <meta name="ARGDATE" CONTENT="January 9, 2007">
    <meta name="DECDATE" CONTENT="May 14, 2007">
    <meta name="DOCKET" CONTENT="05-1575">
    <meta name="PARTY1" CONTENT="SCHRIRO">
    <meta name="PARTY2" CONTENT="LANDRIGAN">

    I think our best bet is to support basic Zotero import using the title information and also support the search result urls. Users can always enter missing info like the judge into the fields manually for now. I'll let you know when I get the search urls added. Your javascript attempt was in the right direction!

    --bill
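
To illustrate why meta tags like the Cornell LII ones quoted above make the job easier, here is a rough sketch that pulls name/content pairs out of an HTML string. The helper name readMetaTags is made up, and the regular expression assumes the attribute order and double quoting shown in the sample; a real translator would more likely read the tags from the DOM.

```javascript
// Hypothetical helper: collect <meta name="..." content="..."> pairs from an HTML string.
// Attribute names are matched case-insensitively; keys are upper-cased.
function readMetaTags(html) {
    var metaRegexp = /<meta\s+name="([^"]*)"\s+content="([^"]*)"\s*\/?>/gi;
    var fields = {};
    var m;
    while ((m = metaRegexp.exec(html)) !== null) {
        fields[m[1].toUpperCase()] = m[2];
    }
    return fields;
}

// Three of the tags quoted from the Cornell LII page.
var sample =
    '<meta name="CASENAME" CONTENT="SCHRIRO V. LANDRIGAN">\n' +
    '<meta name="DECDATE" CONTENT="May 14, 2007">\n' +
    '<meta name="DOCKET" CONTENT="05-1575">';

var fields = readMetaTags(sample);
console.log(fields.CASENAME); // SCHRIRO V. LANDRIGAN
console.log(fields.DECDATE);  // May 14, 2007
```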
  • I agree.
    The only pattern to finding the judge's name is that it is usually to the right of certain words like Judge, judges or coram - but not always. I noticed that a lot of the time this is in a table format as well. I think you're right that the best bet is AustLII adding metadata.

    Getting something functional is a good start. If people start using it, AustLII may work on the meta data.

    Looking forward to the revised translator - I wish I could be more help there.
  • Grieth,

    I think I have something that should work minimally on both the AustLII and NZLII websites, including search URLs.

    1. Replace the target field on the "Metadata" tab of Scaffold with this:
    http:\/\/www\.(?:austlii\.edu\.au|nzlii\.org)\/(?:\/cgi-bin\/disp\.pl\/)?(?:au|nz)\/cases\/.+

    2. Replace "Detect Code" tab contents with:

    function detectWeb(doc, url) {
        // Namespace resolver for XPath queries; unused here.
        var namespace = doc.documentElement.namespaceURI;
        var nsResolver = namespace ? function(prefix) {
            if (prefix == 'x') return namespace; else return null;
        } : null;

        var austliiRegexp = /^http:\/\/www\.(?:austlii\.edu\.au|nzlii\.org)\/(?:\/cgi-bin\/disp\.pl\/)?(?:au|nz)\/cases\/.+/;
        if (austliiRegexp.test(url)) {
            // A single judgment page; scrape() saves it as a "case" item.
            return "case";
        } else {
            // Otherwise offer a selection list if the page links to any cases.
            var aTags = doc.getElementsByTagName("a");
            for (var i = 0; i < aTags.length; i++) {
                if (austliiRegexp.test(aTags[i].href)) {
                    return "multiple";
                }
            }
        }
    }

    3. Replace "Code" tab contents with:

    function scrape(doc) {
        // Namespace resolver for XPath queries; unused here.
        var namespace = doc.documentElement.namespaceURI;
        var nsResolver = namespace ? function(prefix) {
            if (prefix == 'x') return namespace; else return null;
        } : null;

        var newItem = new Zotero.Item("case");
        newItem.title = doc.title;
        newItem.url = doc.location.href;

        // Expected title form: Name [year] COURT number (day month year)
        var titleRegexp = /^(.+)\s+\[(\d+)\]\s+(\w+)\s(\d+)\s+\((\d+)\s+(\w+)\s+(\d+)\)/;
        var titleMatch = titleRegexp.exec(doc.title);
        if (titleMatch) {
            newItem.caseName = titleMatch[1] + " [" + titleMatch[2] + "] " + titleMatch[3] + " " + titleMatch[4];
            newItem.dateDecided = titleMatch[7] + " " + titleMatch[6] + " " + titleMatch[5];
            // The court is recorded by its abbreviation only.
            newItem.court = titleMatch[3];
        } else {
            // Title did not match; fall back to the raw page title.
            newItem.caseName = doc.title;
            newItem.dateDecided = "not found";
        }

        newItem.complete();
    }

    function doWeb(doc, url) {
        var austliiRegexp = /^http:\/\/www\.(?:austlii\.edu\.au|nzlii\.org)\/(?:\/cgi-bin\/disp\.pl\/)?(?:au|nz)\/cases\/.+/;
        if (austliiRegexp.test(url)) {
            scrape(doc);
        } else {
            // Collect links to cases and let the user choose which to import.
            var items = Zotero.Utilities.getItemArray(doc, doc, austliiRegexp);
            items = Zotero.selectItems(items);

            if (!items) {
                return true;
            }

            // The keys of the selection are the URLs of the chosen cases.
            var urls = new Array();
            for (var i in items) {
                urls.push(i);
            }

            Zotero.Utilities.processDocuments(urls, scrape, function() { Zotero.done(); });
            Zotero.wait();
        }
    }

    Sean,

    I've set up a Google Code project for my stuff here: http://code.google.com/p/zotero-for-lawyers/

    I think the translator is ready to include if you see fit. The sql is available here: http://zotero-for-lawyers.googlecode.com/svn/trunk/AustLII-NZLII-translator.sql

    I plan to add other "Legal Information Institute" websites over time and will keep you updated.
  • Looks great Bill. Fantastic work.

    Have you heard from Sean about whether the fields might be altered to include the judge name and more appropriately provide for legal materials?

    regards

    Grieth
  • Let us know what improvements you think there should be for the legal item types, and we'd be happy to adjust the schema as long as it can be done in a universally applicable way (at least until we have custom item types functionality).
  • Hey Dan,

    I saw the thread on the dev group about adding custom types. It looks like a tricky area and will need input from a bunch of stakeholders. "Universally applicable" is a key phrase since legal citation and context varies country to country (eg: common law vs. civil law, etc.). My guess is that "universally applicable" is an unattainable goal in many use cases. The attempt to maintain a controlled vocabulary will also be tricky during a time in which social tagging predominates.

    In any case, I think a good, non-controversial first step would be to add "Judge" to the author types. Also, I was doing some work on a bill-resolution translator and would like to see "Cosponsor" added as well.

    If you all get any further on the "adding custom type" project, let me know since I would like to add Treaties to the itemTypes as well.

    Keep up the good work.
  • Thanks, Bill. I've created a ticket for the "judge" and "cosponsor" creator types. I assume judge would be for cases and cosponsor would be for statutes—correct me if I'm wrong.
  • Cosponsor should go in "bill" itemType (sponsor is already there).
  • Whilst one or three is most common, there can be up to 7 judges in Aust. I think it would need to accommodate up to 9 in some countries.

    I suspect that a 'Jurisdiction' field would also help most lawyers. For example, in the US and Aust there is a separate jurisdiction for the Feds and each state.

    A field for citation, rather than breaking it up like a journal reference would be more normal for lawyers, and helpful if more than one citation could be added to cope with cases appearing in more than one set of law reports as most authors do give multiple citations if they are available.

    Having said that, the Judge field, along with Bill's translator is enough to make this a great tool for most lawyers.

    Great work!

    grieth
  • I have updated the translator, as it was failing on cases located using the search engine. The part of the regular expression that reads \/cgi-bin\/disp\.pl now needs to be \/cgi-bin\/sinodisp. I have put the changes on the Google Code page. What do I need to do for it to be tested and added?

    Is anything happening about adding a field for 'Judge'? It is simply silly to have a biblio program that doesn't record the author's identity.
  • edited January 24, 2010
    @grieth,

    A bunch of detail to be worked out for law support (off the top of my head, I would add docket number for unpublished and undecided cases, and a secondary date for cases decided on one date, and published in a newspaper on another). Here's a link to our own mini-project where we're looking into the issues on a kind of try by doing basis:

    [url elided, as this project has fulfilled its purpose and been brought to a close]

    Frank Bennett
  • Hi all,
    Can you help me with the AustLII translator? It doesn't seem to work.

    Here's a sample case.

    http://www.austlii.edu.au/cgi-bin/sinodisp/au/cases/cth/HCA/1975/26.html?query=^moorhouse

    Thanks.

    Dale.
  • Hi Dale,

    I have done a quick fix for the problem you have; see my post above. Basically, the current translator fails because AustLII have changed the URL format for search results since the first translator was written.

    If you are on version 2 I can send you a .js file that seems to work fine to overcome your problem.

    Grant
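
The .js file itself is not posted in this thread, so the following is only a guess at the kind of change involved: a pattern that accepts both the older disp.pl form and the newer sinodisp form of search-result URLs, checked against the two links quoted in this thread. The variable and function names are made up.

```javascript
// Assumed variant of the translator's pattern: one or more slashes after the host,
// then an optional cgi-bin/disp.pl/ or cgi-bin/sinodisp/ segment.
var caseUrlRegexp = /^http:\/\/www\.(?:austlii\.edu\.au|nzlii\.org)\/+(?:cgi-bin\/(?:disp\.pl|sinodisp)\/)?(?:au|nz)\/cases\/.+/;

// Hypothetical helper wrapping the test.
function isCaseUrl(url) {
    return caseUrlRegexp.test(url);
}

// Newer search-result form (Dale's sample case).
console.log(isCaseUrl("http://www.austlii.edu.au/cgi-bin/sinodisp/au/cases/cth/HCA/1975/26.html?query=^moorhouse"));
// Older search-result form (from Grieth's earlier post).
console.log(isCaseUrl("http://www.austlii.edu.au//cgi-bin/disp.pl/au/cases/cth/family_ct/2006/25.html?query=family%20law%20and%20kirby%20and%20relocation"));
```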
  • Hi Grant -

    I have the same problem probably with the AustLII translator. Can you send me the .js file too and then what do I do with it - install it somewhere?

    By the way, thanks for all the hard work - it's greatly appreciated.

    Thanks,
    Joyce.
  • edited January 24, 2010
    Grant (and everyone),

    EDIT: D'oh. When I wrote the text below, I was assuming that "Author" existed on the Case type, which ... it doesn't; we currently have only "Counsel". So this should be read as a suggestion to add "Author" -- rather than "Judge" -- to the Case type.

    Do you still have strong feelings about a "Judge" label for the author(s) of legal cases? The more I think about it, the more convinced I become that plain vanilla "Author" is the reasonable thing to do. Here's the thinking.

    (1) We don't use judge names in the body of citations, as far as I know. A name or names may be given -- even required -- in a trailing parenthetical, but that supplementary information will have to be supplied by the user in any case, based on his or her notes on the case.

    (2) Many of the sources from which Zotero will scrape metadata do not provide the full list of judges in panel decisions in an easily parseable form. If only the judge authoring the "opinion of the court" (a U.S. custom) can be reliably captured (this is the case with U.S. Supreme Court judgements on the Cornell LII), the label "Judge" might actually be misleading, since the individual judge is truly listed as an author, not as a judge entirely responsible for deciding the case.

    (3) The Case type can be dual-purposed for arbitral awards as well as court judgements. In that case "Judge" would be misleading, and create pressure to add further analogous roles, such as "Arbitrator", "Chief Arbitrator", "Chairman", etc. If none of this metadata is needed in the body of citations, and given that child-notes now offer a flexible, searchable means of attaching supplementary information to an item for research purposes, it seems hard to justify adding complexity to the UI for the purpose of precisely describing the role of the "decider".

    I'm willing to be convinced, but the way things seem to fit together at the moment, it seems like "Author" works well enough as a one-size-fits-all solution. ... ?
  • edited January 25, 2010
    (EDIT: I take back what is written below: names are disambiguated, in the new processor at least, only when rendered, so either "Author" or "Decided by" is possible. The former is probably preferable, to avoid bloat in the UI. It's also more clear -- there's less chance that the variable will attract extraneous data, such as the name of the court that a case was "decided by".)

    A further thought here, and an amendment to the proposal above. If the names of justices or arbitrators are fed through to CSL, they will end up in the names disambiguation pool (in both the current and the new processor), if they are within the scope of the et-al-min/et-al-use-first settings. That could cause a lot of nasty confusion, since in a judgement the names of the decisionmakers are not rendered in the body of the citation.

    For example, if a case has "Blackmun" listed as a judge, and this is passed to CSL, and an article has "George Blackmun" as author, a disambiguating style will show "G. Blackmun" was the author of the article, rather than "Blackmun". The cause of the expansion will not be obvious from the text, because the name embedded in the case data is not rendered anywhere.

    We can dodge this problem by not passing author names at all for legal cases. Accordingly, I would suggest that a "Decided by" creator type be added, and that neither this nor either of the currently available "Counsel" or "Contributor" be mapped to CSL "author".
  • I don't know if it's completely related but in French law, you don't cite (never) the names of judges: the decision is that of the court and the "author" is the court. There is no "opinion" (from a judge).
    From this point of view, I would be very happy if "counsel" would be un-mapped (i.e. dissociated from author as it seems to be at the moment).
  • I think the result of the change will be that "Author" is added and mapped to CSL "author" (just because it's simpler to let it pass through), and "Counsel" gets unmapped. We don't use judge names in the body of cites in common law jurisdictions either, so CSL will likely never do anything with the value; but the data might be handy on Zotero-side for targeted searches to support research.
  • Good.
    But how will you use this "Author"? To add judge names?
  • That's what I would expect. For what it's worth, the translators for the Cornell LII, AustLII, CanLII and the other LIIs (all dating from 2007) are already grabbing judge names and putting them into an "Author" field.
  • Indeed, I tried this translator and was surprised to see "Author" in my zotero (1.0)...
    This solution (to add Author) seems better than the previous one.

    [Indirectly related: one of my concerns with French law is that I'd like to classify cases by court in the Zotero UI, as is possible at the moment with author for books, articles, etc. I've already posted a message on this issue: http://forums.zotero.org/discussion/10011/sorting-items-by-court-in-middle-pane/#Item_3 ]
  • edited January 26, 2010
    "Author" is available on Case items in Zotero 2.0rc1, available now. "Cosponsor" has also been added to Bill items.