-
- CommentAuthorbill
- CommentTimeMay 8th 2007 edited
Grieth,
I've created a basic translator for AustLII (http://www.austlii.edu.au/) that you can build upon.
Step 1: Download and install Zotero's Scaffold Utility here:
http://dev.zotero.org/docs/scaffold
Step 2: Fill out the "Metadata" tab something like this:
http://mckinney.sw.googlepages.com/scaffold-metadata.png
Step 3: Cut and paste the detectWeb function below into the "Detect Code" tab like this:
http://mckinney.sw.googlepages.com/scaffold-detect-code.png
Step 4: Cut and paste the scrape and doWeb functions below into the "Code" tab like this:
http://mckinney.sw.googlepages.com/scaffold-code.png
Step 5: Click the "Save to Database" icon on the toolbar (second from left).
Step 6: Test some AustLII cases.
Good luck!
--bill
============== translator code =================
function detectWeb(doc, url) {
    // Namespace resolver for XPath queries (not used below).
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
        if (prefix == 'x') return namespace; else return null;
    } : null;

    // A URL under /au/cases/ is treated as a single case page.
    var austliiRegexp = /^http:\/\/www\.austlii\.edu\.au\/au\/cases\/.+/;
    if (austliiRegexp.test(url)) {
        // Report the same item type that scrape() creates.
        return "case";
    } else {
        // Otherwise offer a selection list if the page links to any case.
        var aTags = doc.getElementsByTagName("a");
        for (var i = 0; i < aTags.length; i++) {
            if (austliiRegexp.test(aTags[i].href)) {
                return "multiple";
            }
        }
    }
}

function scrape(doc) {
    // Namespace resolver for XPath queries (not used below).
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
        if (prefix == 'x') return namespace; else return null;
    } : null;

    // Court abbreviations as they appear in the page title, mapped to full names.
    var courtArray = {};
    courtArray['FCAFC'] = 'Federal Court of Australia - Full Court (FCAFC)';
    courtArray['FCA'] = 'Federal Court of Australia (FCA)';
    courtArray['FamCA'] = 'Family Court of Australia (FamCA)';
    courtArray['FMCA'] = 'Federal Magistrates Court of Australia (FMCA)';

    var newItem = new Zotero.Item("case");
    newItem.title = doc.title;
    newItem.url = doc.location.href;

    // Groups: 1 = parties, 2 = year, 3 = court, 4 = number, 5 = day, 6 = month, 7 = year decided.
    var titleRegexp = /^(.+)\s+\[(\d+)\]\s+(\w+)\s(\d+)\s+\((\d+)\s+(\w+)\s+(\d+)\)/;
    var titleMatch = titleRegexp.exec(doc.title);
    if (titleMatch) {
        newItem.caseName = titleMatch[1] + " [" + titleMatch[2] + "] " + titleMatch[3] + " " + titleMatch[4];
        newItem.dateDecided = titleMatch[7] + " " + titleMatch[6] + " " + titleMatch[5];
        // Use the full court name when the abbreviation is known.
        if (courtArray[titleMatch[3]]) {
            newItem.court = courtArray[titleMatch[3]];
        } else {
            newItem.court = titleMatch[3];
        }
    } else {
        // The title did not match the expected pattern; fall back to the raw title.
        newItem.caseName = doc.title;
        newItem.dateDecided = "not found";
    }
    newItem.complete();
}

function doWeb(doc, url) {
    var austliiRegexp = /^http:\/\/www\.austlii\.edu\.au\/au\/cases\/.+/;
    if (austliiRegexp.test(url)) {
        // Single case page: scrape it directly.
        scrape(doc);
    } else {
        // Listing page: let the user pick from the linked cases.
        var items = Zotero.Utilities.getItemArray(doc, doc, austliiRegexp);
        items = Zotero.selectItems(items);
        if (!items) {
            return true;
        }
        var urls = new Array();
        for (var i in items) {
            urls.push(i);
        }
        Zotero.Utilities.processDocuments(urls, scrape, function() { Zotero.done(); });
        Zotero.wait();
    }
}
-
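The title parsing in scrape() can be tried outside Zotero. The sketch below is a minimal standalone version, assuming a page title of the form "Parties [year] COURT number (day month year)". The sample title and the parseTitle helper are hypothetical and are not part of the translator; the pattern and the court names are copied from the code in this post.

```javascript
// Same pattern as in scrape(): parties, [year], court code, number, (day month year).
var titleRegexp = /^(.+)\s+\[(\d+)\]\s+(\w+)\s(\d+)\s+\((\d+)\s+(\w+)\s+(\d+)\)/;

// Court abbreviations mapped to full names, as in the translator.
var courts = {
    FCAFC: 'Federal Court of Australia - Full Court (FCAFC)',
    FCA: 'Federal Court of Australia (FCA)',
    FamCA: 'Family Court of Australia (FamCA)',
    FMCA: 'Federal Magistrates Court of Australia (FMCA)'
};

// Hypothetical helper: returns the parsed fields, or null if the title does not match.
function parseTitle(title) {
    var m = titleRegexp.exec(title);
    if (!m) {
        return null;
    }
    return {
        caseName: m[1] + " [" + m[2] + "] " + m[3] + " " + m[4],
        dateDecided: m[7] + " " + m[6] + " " + m[5],
        // Fall back to the raw abbreviation when the court is not in the table.
        court: Object.prototype.hasOwnProperty.call(courts, m[3]) ? courts[m[3]] : m[3]
    };
}

// Hypothetical title, made up for illustration.
var parsed = parseTitle("Smith v Jones [2006] FamCA 25 (3 February 2006)");
console.log(parsed);
```

A title that does not follow this layout makes parseTitle return null, which corresponds to the "not found" branch in scrape().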
- CommentAuthorgrieth
- CommentTimeMay 11th 2007
Thanks Bill. This looks fantastic. I will start trying to get on top of it. -
- CommentAuthorsean
- CommentTimeMay 11th 2007
Bill,
If you are satisfied with this translator, we would be pleased to roll it out into the general Zotero distribution. Please let us know whether it's ready. -
- CommentAuthorbill
- CommentTimeMay 11th 2007
Sean,
Let's wait and see what Grieth has to say or add, seeing as he's an Aussie. I did some basic testing but nothing extensive. The AustLII resource he pointed us to is pretty amazing and a perfect candidate for Zotero. I might also want to try my hand at writing a translator for the legislative resources there.
We'll be sure to give you a shout when it's ready.
--bill -
- CommentAuthorsean
- CommentTimeMay 11th 2007
Bill,
Sounds good. Thanks for all of your hard work. It's terrific to have people in the community taking the initiative to expand Zotero's functionality. -
- CommentAuthorgrieth
- CommentTimeMay 12th 2007
Bill, Great work. I have found a small problem. When using the AustLII search engine, the resulting links are in this form:
http://www.austlii.edu.au//cgi-bin/disp.pl/au/cases/cth/family_ct/2006/25.html?query=family%20law%20and%20kirby%20and%20relocation
This seems to fool the translator - can it ignore the '?query' and everything following in the address?
I also realised that a list of court names for abbreviations would soon bloat the translator if every AustLII court were added. Can the court name be scraped from the page instead?
The translator should also work for lots of other LIIs, e.g.:
www.nzlii.org/nz/cases
www.bailii.org/??/cases
www.saflii.org/??/cases
www.paclii.org/??/cases
www.hklii.org/??/cases
(the ?? represents the two letter jurisdiction code which varies)
Beyond this, further improvements would require the case template to have fields for the information that lawyers use. In the Eng/Aust areas the judges' names are essential, but there is no field for this in Zotero's case template. Lots of the info in the template isn't used by Aust lawyers/legal academics. I think that Bill has already done a lot of great work on this (building his own variant of Zotero in this regard). Maybe you can explore this with Bill.
grieth
PS Is there an idiot's guide to writing this stuff? I have done a lot of macro programming in Visual Basic for Word, but I am a lawyer, not a computer programmer, and I can't get a handle on the coding. I think that I will need to if I want to help develop this to extract catchwords, lists of cases referred to, etc. that all appear on these pages. -
- CommentAuthorbill
- CommentTimeMay 13th 2007
Grieth,
1) I'll take a look at adding search link capability. It should be feasible.
2) I agree with you about bloat for expanding court abbreviations. Ultimately they would be better located in a SQLite table. I'll try to see if page scraping might work as you suggest.
3) Yes, for now the judge can only be added as a "contributor" I guess. Another possibility is to add it as a note. I really believe that each country should have its own case schema, but that is a lot to ask of the Zotero people right now. As an American lawyer, I have an opinion about what the case fields should be for the U.S., but I am clueless about other countries.
Re: Idiot's Guide - unfortunately, translator authoring is a bit tricky. Zotero's Scaffold tool is a big step in the right direction. A good book on JavaScript and regular expressions will be the most help. O'Reilly is a good source for both of these topics - see: http://www.oreilly.com/
Also, it is really unfortunate that websites like AustLII don't expend a little bit more effort adding some semantic information so that complex page scraping wouldn't be required. You might suggest to them that they add some basic "META" tags to the HTML for things like judge, catchwords, etc. Cornell's LII does this to some degree and it makes the job so much easier. -
- CommentAuthorgrieth
- CommentTimeMay 17th 2007
Thanks Bill.
Looking forward to seeing the next version. I think judge's name as 'contributor' (whatever that is meant to mean) is better than in the notes section - while there is no 'judge' field. How does anyone develop a schema that has no field for the author (the judge who wrote the decision)?
I will talk to AustLII about meta tags. I suspect it would be a project that would have to involve the courts as the cases are sent by email to an automated address for conversion to html and upload - but it will have to be done eventually. At present they are working through a funding problem, but that should be resolved over the next couple of months.
Sean,
Is Zotero prepared to look at the schema problems, especially the lack of a field for the judge's name, given that Bill has already done a lot of work on that? I suspect that Bill's schema would be adequate for all common law jurisdictions, even though it is not perfectly tailored for each country - at least a lot better than the current one.
Also, the tags field box is fairly short, and when a tag exceeds the box length the box doesn't expand to two lines; you just can't see the end of the tag. Can this be fixed? (I use long tags, short headings connected with a '-', as a workaround for tags being at a single level, i.e. a flat database.) -
- CommentAuthorgrieth
- CommentTimeMay 20th 2007
Bill,
I tried simply adding
int postion = austliiRegexp.indexOf("?");
If (position>1){
austliiRegexp = austliiRegexp.substring(0,position);
}
at some apparently strategic points, but couldn't get the translator to work. Is it possible to simply reload the page after this expression is used to convert the address, or is that more problematic? -
- CommentAuthorbill
- CommentTimeMay 20th 2007
Grieth,
I had some time to play around with this and tried to see if it was feasible to extract court and judge from the body of the page. It doesn't look good, since my tests show that the tagging varies over the years (e.g. court names sometimes appear in the first h2 tag and other times in h1 tags). I couldn't find any particular pattern for where to find the judge. This is the main problem with screen scraping and the Achilles' heel of Zotero in general.
As a comparison, try viewing source on a Cornell LII page (eg: http://www.law.cornell.edu/supct/html/05-1575.ZS.html). They at least capture basic information in semantic tags:
<meta name="CASENAME" CONTENT="SCHRIRO V. LANDRIGAN">
<meta name="COURTBELOW" CONTENT="certiorari to the united states court of appeals for the ninth circuit">
<meta name="ARGDATE" CONTENT="January 9, 2007">
<meta name="DECDATE" CONTENT="May 14, 2007">
<meta name="DOCKET" CONTENT="05-1575">
<meta name="PARTY1" CONTENT="SCHRIRO">
<meta name="PARTY2" CONTENT="LANDRIGAN">
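To illustrate why such tags make the job easier, here is a rough sketch that pulls name/content pairs out of markup like the lines above. It works on a plain HTML string with a regular expression so that it can be tried outside a browser. readMetaTags is a hypothetical helper, and the pattern assumes the attribute order and double quotes shown above; a real translator would more likely read the tags from the page's DOM.

```javascript
// Hypothetical helper: collect <meta name="..." content="..."> pairs from an HTML string.
// Assumes name comes before content and both values are double-quoted.
function readMetaTags(html) {
    var metaRegexp = /<meta\s+name="([^"]*)"\s+content="([^"]*)"\s*\/?>/gi;
    var fields = {};
    var m;
    while ((m = metaRegexp.exec(html)) !== null) {
        // Normalise the key to upper case so lookups do not depend on the page's casing.
        fields[m[1].toUpperCase()] = m[2];
    }
    return fields;
}

// Sample markup taken from the lines quoted above.
var sample =
    '<meta name="CASENAME" CONTENT="SCHRIRO V. LANDRIGAN">\n' +
    '<meta name="DECDATE" CONTENT="May 14, 2007">\n' +
    '<meta name="DOCKET" CONTENT="05-1575">';

var fields = readMetaTags(sample);
console.log(fields.CASENAME + ", decided " + fields.DECDATE);
```

With fields like these, case name and decision date come straight from named values instead of being guessed from the page title.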
I think our best bet is to support basic Zotero import using the title information and also support the search result URLs. Users can always enter missing info like the judge into the fields manually for now. I'll let you know when I get the search URLs added. Your JavaScript attempt was in the right direction!
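For reference, the idea of cutting off everything from the "?" can be written in JavaScript roughly as follows. This is only a sketch: stripQuery is a hypothetical helper, and since indexOf and substring are string methods, they are applied to the URL rather than to the regular expression. The final check uses the search URL quoted earlier in the thread and the pattern from the first version of the translator.

```javascript
// Hypothetical helper: drop the "?query=..." part of a URL, if there is one.
function stripQuery(url) {
    var position = url.indexOf("?");
    return position > -1 ? url.substring(0, position) : url;
}

// Pattern from the first version of the translator.
var austliiRegexp = /^http:\/\/www\.austlii\.edu\.au\/au\/cases\/.+/;

// Search-result link as quoted earlier in the thread.
var searchUrl = "http://www.austlii.edu.au//cgi-bin/disp.pl/au/cases/cth/family_ct/2006/25.html?query=family%20law%20and%20kirby%20and%20relocation";

var stripped = stripQuery(searchUrl);
console.log(stripped);

// Still false: the path starts with //cgi-bin/disp.pl/ rather than /au/cases/.
console.log(austliiRegexp.test(stripped));
```

So removing the query string is not sufficient on its own; the pattern also has to allow for the extra /cgi-bin/disp.pl/ prefix in search-result links.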
--bill -
- CommentAuthorgrieth
- CommentTimeMay 27th 2007
I agree.
The only pattern to finding the judge's name is that it is usually to the right of certain words like Judge, Judges or Coram, but not always. I noticed that a lot of the time this is in a table format as well. I think you're right that the best bet is AustLII adding metadata.
Getting something functional is a good start. If people start using it, AustLII may work on the meta data.
Looking forward to the revised translator - I wish I could be more help there. -
- CommentAuthorbill
- CommentTimeJun 4th 2007
Grieth,
I think I have something that should work minimally on both the AustLII and NZLII websites, including search URLs.
1. Replace the target field on the "Metadata" tab of Scaffold with this:
http:\/\/www\.(?:austlii\.edu\.au|nzlii\.org)\/(?:\/cgi-bin\/disp\.pl\/)?(?:au|nz)\/cases\/.+
2. Replace "Detect Code" tab contents with:
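function detectWeb(doc, url) {
    // Namespace resolver for XPath queries (not used below).
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
        if (prefix == 'x') return namespace; else return null;
    } : null;

    // Matches AustLII and NZLII case pages, with or without the search prefix //cgi-bin/disp.pl/.
    var austliiRegexp = /^http:\/\/www\.(?:austlii\.edu\.au|nzlii\.org)\/(?:\/cgi-bin\/disp\.pl\/)?(?:au|nz)\/cases\/.+/;
    if (austliiRegexp.test(url)) {
        // Report the same item type that scrape() creates.
        return "case";
    } else {
        // Otherwise offer a selection list if the page links to any case.
        var aTags = doc.getElementsByTagName("a");
        for (var i = 0; i < aTags.length; i++) {
            // Fixed: this previously referenced an undefined articleRegexp.
            if (austliiRegexp.test(aTags[i].href)) {
                return "multiple";
            }
        }
    }
}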
3. Replace "Code" tab contents with:
function scrape(doc) {
    // Namespace resolver for XPath queries (not used below).
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
        if (prefix == 'x') return namespace; else return null;
    } : null;

    var newItem = new Zotero.Item("case");
    newItem.title = doc.title;
    newItem.url = doc.location.href;

    // Groups: 1 = parties, 2 = year, 3 = court, 4 = number, 5 = day, 6 = month, 7 = year decided.
    var titleRegexp = /^(.+)\s+\[(\d+)\]\s+(\w+)\s(\d+)\s+\((\d+)\s+(\w+)\s+(\d+)\)/;
    var titleMatch = titleRegexp.exec(doc.title);
    if (titleMatch) {
        newItem.caseName = titleMatch[1] + " [" + titleMatch[2] + "] " + titleMatch[3] + " " + titleMatch[4];
        newItem.dateDecided = titleMatch[7] + " " + titleMatch[6] + " " + titleMatch[5];
        // The court is stored as the abbreviation from the title; no lookup table.
        newItem.court = titleMatch[3];
    } else {
        // The title did not match the expected pattern; fall back to the raw title.
        newItem.caseName = doc.title;
        newItem.dateDecided = "not found";
    }
    newItem.complete();
}

function doWeb(doc, url) {
    var austliiRegexp = /^http:\/\/www\.(?:austlii\.edu\.au|nzlii\.org)\/(?:\/cgi-bin\/disp\.pl\/)?(?:au|nz)\/cases\/.+/;
    if (austliiRegexp.test(url)) {
        // Single case page: scrape it directly.
        scrape(doc);
    } else {
        // Listing or search page: let the user pick from the linked cases.
        var items = Zotero.Utilities.getItemArray(doc, doc, austliiRegexp);
        items = Zotero.selectItems(items);
        if (!items) {
            return true;
        }
        var urls = new Array();
        for (var i in items) {
            urls.push(i);
        }
        Zotero.Utilities.processDocuments(urls, scrape, function() { Zotero.done(); });
        Zotero.wait();
    }
}
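As a quick sanity check, the revised URL pattern can be run against a few addresses outside Zotero. This is only a sketch: the search URL is shortened from the one quoted earlier in the thread, the other sample addresses are made up for illustration, and the variable names are hypothetical.

```javascript
// Pattern from the revised translator: AustLII or NZLII, optional //cgi-bin/disp.pl/ prefix.
var liiRegexp = /^http:\/\/www\.(?:austlii\.edu\.au|nzlii\.org)\/(?:\/cgi-bin\/disp\.pl\/)?(?:au|nz)\/cases\/.+/;

var samples = [
    // Direct AustLII case page (illustrative address).
    "http://www.austlii.edu.au/au/cases/cth/family_ct/2006/25.html",
    // AustLII search-result link, shortened from the one quoted in the thread.
    "http://www.austlii.edu.au//cgi-bin/disp.pl/au/cases/cth/family_ct/2006/25.html?query=family%20law",
    // NZLII case page (illustrative address).
    "http://www.nzlii.org/nz/cases/NZCA/2007/1.html",
    // Another LII that this pattern does not cover (illustrative address).
    "http://www.bailii.org/uk/cases/UKHL/2007/1.html"
];

var results = samples.map(function (sampleUrl) {
    return liiRegexp.test(sampleUrl);
});

for (var i = 0; i < samples.length; i++) {
    console.log(results[i] + "  " + samples[i]);
}
```

The first three addresses match and the last does not, so other LII sites would each need their host and jurisdiction code added to the pattern.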
Sean,
I've set up a Google Code project for my stuff here: http://code.google.com/p/zotero-for-lawyers/
I think the translator is ready to include if you see fit. The SQL is available here: http://zotero-for-lawyers.googlecode.com/svn/trunk/AustLII-NZLII-translator.sql
I plan to add other "Legal Information Institute" websites over time and will keep you updated. -
- CommentAuthorgrieth
- CommentTimeJun 9th 2007
Looks great Bill. Fantastic work.
Have you heard from Sean about whether the fields might be altered to include the judge name and more appropriately provide for legal materials?
regards
Grieth -
- CommentAuthorDan Stillman
- CommentTimeJun 9th 2007
Let us know what improvements you think there should be for the legal item types, and we'd be happy to adjust the schema as long as it can be done in a universally applicable way (at least until we have custom item types functionality). -
- CommentAuthorbill
- CommentTimeJun 9th 2007
Hey Dan,
I saw the thread on the dev group about adding custom types. It looks like a tricky area and will need input from a bunch of stakeholders. "Universally applicable" is a key phrase since legal citation and context varies country to country (eg: common law vs. civil law, etc.). My guess is that "universally applicable" is an unattainable goal in many use cases. The attempt to maintain a controlled vocabulary will also be tricky during a time in which social tagging predominates.
In any case, I think a good, non-controversial first step would be to add "Judge" to the author types. Also, I was doing some work on a bill-resolution translator and would like to see "Cosponsor" added as well.
If you all get any further on the "adding custom type" project, let me know since I would like to add Treaties to the itemTypes as well.
Keep up the good work. -
- CommentAuthorDan Stillman
- CommentTimeJun 10th 2007
Thanks, Bill. I've created a ticket for the "judge" and "cosponsor" creator types. I assume judge would be for cases and cosponsor would be for statutes—correct me if I'm wrong. -
- CommentAuthorbill
- CommentTimeJun 10th 2007
Cosponsor should go in "bill" itemType (sponsor is already there). -
- CommentAuthorgrieth
- CommentTimeJun 14th 2007
Whilst one or three is most common, there can be up to seven judges in Aust. I think it would need to accommodate up to nine in some countries.
I suspect that a 'Jurisdiction' field would also help most lawyers. For example, in the US and Aust there is a separate jurisdiction for the Feds and each state.
A single field for the citation, rather than breaking it up like a journal reference, would be more normal for lawyers. It would also help if more than one citation could be added, to cope with cases appearing in more than one set of law reports, as most authors give multiple citations if they are available.
Having said that, the Judge field, along with Bill's translator is enough to make this a great tool for most lawyers.
Great work!
grieth
