Translator list

ajlyon · October 14, 2011

Since the translator list at http://www.zotero.org/translators/ is woefully outdated, I'd like to see the list removed altogether, and replaced with a short overview of how translators fit into the Zotero ecosystem, with links to appropriate user and developer documentation, and perhaps even to the translator repository.

I'd do this, but it's not a wiki page, so I can't edit it.

Rintze · October 14, 2011

Big +1. :)

dstillman · October 15, 2011

http://www.zotero.org/support/translators would be a starting point, I would think.

Feel free to edit that page, and we can redirect /translators there.

Simon · October 15, 2011

Once we are doing automated unit testing, we can actually provide an updated list of functioning translators (although not necessarily sites).

ajlyon · October 15, 2011

Translator status, of course, would be a great boon. So we can aim for a more appropriate introduction in the support wiki and redirect there for now, then post the status page at that URL when it's ready?

I'll see what I can do to make the support page more informative and appropriate.

adamsmith · October 15, 2011

My hope was actually to start working on a comprehensive list of translators & sites, which needs to involve some manual work. Part of the process would also include creating tests for more translators. If we could divide that work some and see how to best set this up it would be more manageable.

adamsmith · October 19, 2011

I've started some work here, and am simultaneously adding tests to working translators:
https://docs.google.com/spreadsheet/ccc?key=0Atyc_yMcWirjdGVUOWs0U3NJX0tKc20zNWk3R203a3c&hl=en_US

fbennett · October 19, 2011

I would like to contribute to this on law sites.

adamsmith · October 19, 2011

great! the spreadsheet can be edited by anyone - I don't expect vandalism (although you never know what Thomson Reuters is up to after sunset).

adamsmith · October 22, 2011

@Frank - the first one I have come across is the austlii/nzlii translator, which is completely defunct, if you want to have a look.

fbennett · October 22, 2011

I'm working on the style end this week and next. Styles have complex rules for abbreviation of field content, so I need to nail down what goes where before turning back to translator development.

Sites mostly provide abbreviated forms in metadata, which I want to expand back into a full description in the translators. The BAILII translator in MLZ shows what will be required:

https://www.zotero.org/svn/extension/branches/trunk-multilingual/translators/BAILII.js

It's a headache, but by producing full descriptions, we'll get data that provides more meaningful information when rendered in a style that has no legal support. It will also provide quasi-canonical data from which to generate abbreviations, which can vary from style to style.

So it might take a month or so, but I'll get on to this as soon as I have the basics of OSCOLA and one other style done enough for demo purposes.

adamsmith · November 19, 2011

OK -
https://docs.google.com/spreadsheet/ccc?key=0Atyc_yMcWirjdGVUOWs0U3NJX0tKc20zNWk3R203a3c&hl=en_US#gid=0
has a complete annotated list of translators, including sample URLs. All of the working translators on the list now have translator tests (with the exception of those where that's not possible).

I fixed a lot of small issues on the way, but if you look at the list you'll notice a lot of broken or semi-broken translators.
Some help in gradually fixing some of these issues would be great.

Finally - where do we go from here? We could use the Google spreadsheet and turn it into a translator list, or we could auto-generate something as Simon suggests above.

ajlyon · November 19, 2011

We should auto-generate, since the testing code is in place.
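As a rough illustration of what auto-generating could mean, here is a minimal sketch that turns per-translator test counts into a status list. The result shape (`translator`, `passed`, `failed`), the status labels, and the function names are all assumptions made for this example; they are not the output format of the actual Zotero test code.

```javascript
// Classify one translator from its test counts.
// The { translator, passed, failed } shape is an assumption for this sketch.
function classify({ passed, failed }) {
  if (passed + failed === 0) return "untested";
  if (failed === 0) return "working";
  if (passed === 0) return "broken";
  return "semi-broken";
}

// Turn raw test results into one status line per translator.
function buildStatusList(results) {
  return results.map((r) => `${r.translator}: ${classify(r)}`);
}

// Placeholder data, not real test results.
const sample = [
  { translator: "Example A", passed: 3, failed: 0 },
  { translator: "Example B", passed: 1, failed: 2 },
  { translator: "Example C", passed: 0, failed: 1 },
  { translator: "Example D", passed: 0, failed: 0 },
];
console.log(buildStatusList(sample).join("\n"));
```

Keeping "untested" separate from "broken" matters here, since some translators cannot have tests at all.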

Rintze · December 5, 2011

With regard to broken translators, do the Zotero clients phone home any details on save failures? (there is a preference checkbox "Report broken site translators" which suggests they do)

I don't mind fixing up a few more translators, but it would be nice to know which translators fail most often.

ajlyon · December 5, 2011

It does phone home, but I'm afraid those reports are going into a black hole for now; I've noticed the requests in various logs, but I've never been notified of a failing translator by the Zotero team. It'd be great if the translator list / status page integrated explicit tests and such error reports.

adamsmith · December 5, 2011

there is, of course, also a good number of translators that don't trigger any errors, because they don't detect.

Rintze · December 5, 2011

Yes, but I would argue that non-detecting translators are less frustrating to users.

dstillman · December 7, 2011

Here's a start:

https://repo.zotero.org/errors

The actual error reports aren't public for privacy reasons (and we're not displaying absolute numbers), but we can provide example error strings and URLs on request. We also might be able to have this automatically display error strings that show up across many reports (e.g., "TypeError: scisig is null" for Google Scholar), since short of major site breakages it will probably be hard to debug many of these without examples.

Note that the Google Scholar results are greatly skewed by Retrieve Metadata attempts, and DOI is also showing mostly "could not find DOI" errors. I'm hoping detection can be tightened on those (e.g., to remove the folder icon on a Google Scholar search with no results), which would allow this to better show actual error frequency.

ajlyon · December 7, 2011

I'll try to work on detection. Automatic display of common error strings would be very useful, as well as some general idea of how many errors we're talking about-- for something like ScienceDirect, are we talking about 10 errors? 100? 1000?

Also, does this filter out data from clients with out-of-date translators or Zotero versions?

Thanks for putting this up! It's sure to be useful in the coming weeks and years.

Rintze · December 7, 2011

Like ajlyon, I think some indication of the number of errors per translator would be very useful. And could the list be expanded to show more than the top 10 translators (say the top 50)?

Also, would it be possible to create somewhat comprehensive reports with, say, 10 error strings and URLs for each translator to send to ajlyon, adamsmith and me, so we don't have to submit individual requests per translator? I'd hope we have established ourselves as at least somewhat trustworthy (and I assume all three of us would be more than willing to sign any privacy agreement).

ajlyon · December 8, 2011

Thanks for upping the number visible.

What's going on with the outdated translators? There are people out there with three different ScienceDirects, two DOIs... Is that just people with updating off? Or something else?

dstillman · December 8, 2011

OK, updated again with absolute numbers and per-error breakdowns. Hover over each segment for error details. I don't think any page data will make it into the errors, but to be safe I'm displaying only errors that come from at least three addresses and don't include the string "http"; the rest get lumped together at the end in blue. If you notice anything that shouldn't be in there, let me know.
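The display rule described here could be sketched roughly as follows. This is only an illustration of the stated rule, not the code behind the status page; the report shape (`error`, `address`) and the names `groupErrorsForDisplay` and `MIN_ADDRESSES` are assumptions.

```javascript
// Minimum number of distinct addresses before an error string is shown.
const MIN_ADDRESSES = 3;

// reports: array of { error: string, address: string } (assumed shape).
// Returns per-error report counts for displayable errors, plus a single
// "other" count for everything that is lumped together.
function groupErrorsForDisplay(reports) {
  // Collect the distinct addresses that reported each error string.
  const addressesByError = new Map();
  for (const { error, address } of reports) {
    if (!addressesByError.has(error)) addressesByError.set(error, new Set());
    addressesByError.get(error).add(address);
  }

  const shown = new Map();
  let other = 0;
  for (const { error } of reports) {
    const enoughAddresses = addressesByError.get(error).size >= MIN_ADDRESSES;
    const mayContainUrl = error.includes("http");
    if (enoughAddresses && !mayContainUrl) {
      shown.set(error, (shown.get(error) || 0) + 1);
    } else {
      other += 1; // rare or URL-bearing errors are only counted, never shown
    }
  }
  return { shown: Object.fromEntries(shown), other };
}
```

Counting distinct addresses rather than raw reports means one client repeating the same failure cannot push its error string into the public view.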

We might be able to display URLs that show up across enough addresses, though there may not be enough of those.

"What's going on with the outdated translators?"

Those are all <2.1.9. Not much we can do for those folks.

ajlyon · December 8, 2011

Thanks for the added details; this list is certain to be pretty useful. The last thing I'm looking for in it would be an indication that an error report is old-- that is, mark which reports are from versions of translators older than the latest one.

Also, can you provide a sample URL for the ScienceDirect TypeError reports (by email if appropriate)? I'll see if I can't pin down what's causing that.

adamsmith · December 8, 2011

yeah, just want to second this - it's tremendously useful, already spotted & fixed the first major issue.

dstillman · December 8, 2011

"The last thing I'm looking for in it would be an indication that an error report is old-- that is, mark which reports are from versions of translators older than the latest one."

Done.
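For anyone curious how such marking might work, here is a minimal sketch, assuming each report carries the `lastUpdated` timestamp of the translator copy that produced it. The report shape, the `latest` lookup table, and the name `markOutdated` are hypothetical, and this is not the status page's actual code.

```javascript
// reports: array of { translatorID, lastUpdated, error } (assumed shape).
// latest: object mapping translatorID to the newest known lastUpdated value.
function markOutdated(reports, latest) {
  return reports.map((r) => ({
    ...r,
    // Timestamps in "YYYY-MM-DD HH:MM:SS" form sort correctly as plain strings.
    // A translator missing from `latest` is treated as not outdated.
    outdated: latest[r.translatorID] !== undefined &&
      r.lastUpdated < latest[r.translatorID],
  }));
}

// Placeholder IDs and timestamps, for illustration only.
const latest = { "example-id": "2011-12-08 10:00:00" };
const marked = markOutdated(
  [
    { translatorID: "example-id", lastUpdated: "2011-10-01 09:00:00", error: "old" },
    { translatorID: "example-id", lastUpdated: "2011-12-08 10:00:00", error: "current" },
  ],
  latest
);
console.log(marked.map((r) => `${r.error}: ${r.outdated}`).join("\n"));
```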

ajlyon · December 8, 2011

Thanks! Now to fix these errors...

adamsmith · December 8, 2011

seconding Rintze and ajlyon - is there any mechanism we could find for getting sample URLs for some of the errors to at least the three of us? (I'd want to look into Taylor & Francis and Hathi Trust)

dstillman · December 8, 2011

"can you provide a sample URL for the ScienceDirect TypeError reports"

http://www.sciencedirect.com/science/article/pii/S0939641108001902

That's this error:

newDoc.evaluate("//a[contains(@class, \"icon_exportarticlesci_dir\")]", newDoc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext() is null

Works for me, though.
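An "is null" error like this presumably comes from using the result of `iterateNext()` without checking it. A minimal sketch of a guard follows; the helper names `firstNode` and `findExportLink` and the error message are made up for this example, and this is not the ScienceDirect translator's actual code.

```javascript
// Return the first node matching an XPath expression, or null if none matches.
function firstNode(doc, xpath) {
  // 0 is the numeric value of XPathResult.ANY_TYPE.
  const result = doc.evaluate(xpath, doc, null, 0, null);
  return result ? result.iterateNext() : null;
}

// Look up the export link and fail with a readable message if it is missing,
// instead of hitting a TypeError on null later.
function findExportLink(doc) {
  const link = firstNode(doc, '//a[contains(@class, "icon_exportarticlesci_dir")]');
  if (!link) {
    throw new Error("Export link not found; the page layout may have changed");
  }
  return link;
}
```

A descriptive message would also show up in the error reports, which makes the failure easier to diagnose without a sample URL.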

ajlyon · December 9, 2011

See that nice red bar at the top of the repo status page? We weren't doing any checking in detectWeb, which was probably triggering an error every time someone tried to retrieve metadata for a PDF and failed. Let's hope that little nit stays fixed-- and I wouldn't have thought to check it without the status page. Thanks again for finally making the reports public.
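For reference, the kind of check meant here can be sketched as below: `detectWeb` reports an item type only when the page really contains something to save. The XPath for a result entry is a placeholder, not the selector from any real translator.

```javascript
// Sketch of a detectWeb that declines to detect on pages with no results.
function detectWeb(doc, url) {
  // Placeholder XPath for a single search result; 0 is XPathResult.ANY_TYPE.
  const firstHit = doc
    .evaluate('//div[@class="result"]', doc, null, 0, null)
    .iterateNext();
  if (!firstHit) {
    // Nothing to save: no icon is shown and no save attempt can fail.
    return false;
  }
  return "multiple";
}
```

With a check like this, a search page without results simply does not detect, so it no longer produces an error report.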

dstillman · December 9, 2011

ajlyon: That's great. Thanks.

adamsmith · December 10, 2011

Dan - could I please get a sample URL for the errors on
- APA PsycNET
- Taylor & Francis
- Wall Street Journal
If this isn't a good place to ask let me/us know.