Translator list

Since the translator list at http://www.zotero.org/translators/ is woefully outdated, I'd like to see the list removed altogether, and replaced with a short overview of how translators fit into the Zotero ecosystem, with links to appropriate user and developer documentation, and perhaps even to the translator repository.

I'd do this, but it's not a wiki page, so I can't edit it.
  • http://www.zotero.org/support/translators would be a starting point, I would think.

    Feel free to edit that page, and we can redirect /translators there.
  • Once we are doing automated unit testing, we can actually provide an updated list of functioning translators (although not necessarily sites).
  • Translator status, of course, would be a great boon. So we can aim for a more appropriate introduction in the support wiki and redirect there for now, then post the status page at that URL when it's ready?

    I'll see what I can do to make the support page more informative and appropriate.
  • My hope was actually to start working on a comprehensive list of translators & sites, which needs to involve some manual work. Part of the process would also include creating tests for more translators. If we could divide that work some and see how to best set this up it would be more manageable.
  • edited October 19, 2011
    I've started some work here, and I'm simultaneously adding tests to working translators:
    https://docs.google.com/spreadsheet/ccc?key=0Atyc_yMcWirjdGVUOWs0U3NJX0tKc20zNWk3R203a3c&hl=en_US
  • I would like to contribute to this on law sites.
  • Great! The spreadsheet can be edited by anyone. I don't expect vandalism (although you never know what Thomson Reuters is up to after sunset).
  • @Frank - the first one I've come across is the austlii/nzlii translator, which is completely defunct. Do you want to have a look?
  • I'm working on the style end this week and next. Styles have complex rules for abbreviation of field content, so I need to nail down what goes where before turning back to translator development.

    Sites mostly provide abbreviated forms in metadata, which I want to expand back into full descriptions in the translators. The BAILII translator in MLZ shows what will be required:

    https://www.zotero.org/svn/extension/branches/trunk-multilingual/translators/BAILII.js

    It's a headache, but by producing full descriptions, we'll get data that provides more meaningful information when rendered in a style that has no legal support. It will also provide quasi-canonical data from which to generate abbreviations, which can vary from style to style.

    So it might take a month or so, but I'll get on to this as soon as I have the basics of OSCOLA and one other style done enough for demo purposes.
  • OK -
    https://docs.google.com/spreadsheet/ccc?key=0Atyc_yMcWirjdGVUOWs0U3NJX0tKc20zNWk3R203a3c&hl=en_US#gid=0
    has a complete annotated list of translators, including sample URLs. All of the working translators on the list now have translator tests (with the exception of those where that's not possible).

    I fixed a lot of small issues on the way, but if you look at the list you'll notice a lot of broken or semi-broken translators.
    Some help in gradually fixing some of these issues would be great.

    Finally, where do we go from here? We could use the Google spreadsheet and turn it into a translator list, or we could auto-generate something as Simon suggests above.
  • We should auto-generate, since the testing code is in place.
  • With regard to broken translators, do the Zotero clients phone home any details on save failures? (there is a preference checkbox "Report broken site translators" which suggests they do)

    I don't mind fixing up a few more translators, but it would be nice to know which translators fail most often.
  • It does phone home, but I'm afraid those reports are going into a black hole for now; I've noticed the requests in various logs, but I've never been notified of a failing translator by the Zotero team. It'd be great if the translator list / status page integrated explicit tests and such error reports.
  • There are, of course, also a good number of translators that don't trigger any errors because they don't detect.
  • Yes, but I would argue that non-detecting translators are less frustrating to users.
  • Here's a start:

    https://repo.zotero.org/errors

    The actual error reports aren't public for privacy reasons (and we're not displaying absolute numbers), but we can provide example error strings and URLs on request. We also might be able to have this automatically display error strings that show up across many reports (e.g., "TypeError: scisig is null" for Google Scholar), since short of major site breakages it will probably be hard to debug many of these without examples.

    Note that the Google Scholar results are greatly skewed by Retrieve Metadata attempts, and DOI is also showing mostly "could not find DOI" errors. I'm hoping detection can be tightened on those (e.g., to remove the folder icon on a Google Scholar search with no results), which would allow this to better show actual error frequency.
  • I'll try to work on detection. Automatic display of common error strings would be very useful, as well as some general idea of how many errors we're talking about-- for something like ScienceDirect, are we talking about 10 errors? 100? 1000?

    Also, does this filter out data from clients with out-of-date translators or Zotero versions?

    Thanks for putting this up! It's sure to be useful in the coming weeks and years.
  • Like ajlyon, I think some indication of the number of errors per translator would be very useful. And could the list be expanded to show more than the top 10 translators (say the top 50)?

    Also, would it be possible to create somewhat comprehensive reports with, say, 10 error strings and URLs for each translator to send to ajlyon, adamsmith and me, so we don't have to submit individual requests per translator? I'd hope we have established ourselves as at least somewhat trustworthy (and I assume all three of us would be more than willing to sign any privacy agreement).
  • Thanks for upping the number visible.

    What's going on with the outdated translators? There are people out there with three different ScienceDirects, two DOIs... Is that just people with updating off? Or something else?
  • OK, updated again with absolute numbers and per-error breakdowns. Hover over each segment for error details. I don't think any page data will make it into the errors, but to be safe I'm displaying only errors coming from at least three addresses that don't include the string "http" in them—the rest get lumped together at the end in blue. If you notice anything that shouldn't be in there, let me know.

    We might be able to display URLs that show up across enough addresses, though there may not be enough of those.
    "What's going on with the outdated translators?"
    Those are all <2.1.9. Not much we can do for those folks.
  • Thanks for the added details; this list is certain to be pretty useful. The last thing I'm looking for in it would be an indication that an error report is old-- that is, mark which reports are from versions of translators older than the latest one.

    Also, can you provide a sample URL for the ScienceDirect TypeError reports (by email if appropriate)? I'll see if I can't pin down what's causing that.
  • Yeah, just want to second this: it's tremendously useful. I've already spotted and fixed the first major issue.
  • "The last thing I'm looking for in it would be an indication that an error report is old-- that is, mark which reports are from versions of translators older than the latest one."
    Done.
  • Thanks! Now to fix these errors...
  • Seconding Rintze and ajlyon: is there any mechanism we could find for getting sample URLs for some of the errors to at least the three of us? (I'd like to look into Taylor & Francis and Hathi Trust.)
  • "Can you provide a sample URL for the ScienceDirect TypeError reports?"
    http://www.sciencedirect.com/science/article/pii/S0939641108001902

    That's this error:
    newDoc.evaluate("//a[contains(@class, \"icon_exportarticlesci_dir\")]", newDoc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext() is null
    Works for me, though.
  • See that nice red bar at the top of the repo status page? We weren't doing any checking in detectWeb, which was probably triggering an error every time someone tried to retrieve metadata for a PDF and failed. Let's hope that little nit stays fixed-- and I wouldn't have thought to check it without the status page. Thanks again for finally making the reports public.
  • ajlyon: That's great. Thanks.
  • Dan - could I please get a sample URL for the errors on
    - APA Psycnet
    - Taylor & Francis
    - Wall Street Journal
    If this isn't a good place to ask let me/us know.
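
The display rule described above for the public error page can be sketched in JavaScript, the language translators are written in. Under that rule, an error string is shown only if it comes from at least three addresses and does not contain "http", and everything else is lumped together. This is a minimal illustration, not the repository's actual code. The report shape { translator, error, address } and the name summarizeErrors are hypothetical.

```javascript
// Minimal sketch of the public error-page filter described in the thread.
// Hypothetical input: an array of reports shaped { translator, error, address }.
const MIN_ADDRESSES = 3; // an error string must come from at least this many addresses

function summarizeErrors(reports) {
  // translator -> error string -> { count, addresses }
  const byTranslator = new Map();
  for (const { translator, error, address } of reports) {
    if (!byTranslator.has(translator)) byTranslator.set(translator, new Map());
    const byError = byTranslator.get(translator);
    if (!byError.has(error)) byError.set(error, { count: 0, addresses: new Set() });
    const entry = byError.get(error);
    entry.count += 1;
    entry.addresses.add(address);
  }

  const summary = [];
  for (const [translator, byError] of byTranslator) {
    const shown = []; // error strings considered safe to display, with report counts
    let other = 0;    // reports lumped together because they failed the filter
    let total = 0;    // absolute number of reports for this translator
    for (const [error, { count, addresses }] of byError) {
      total += count;
      const publishable = addresses.size >= MIN_ADDRESSES && !error.includes("http");
      if (publishable) shown.push({ error, count });
      else other += count;
    }
    shown.sort((a, b) => b.count - a.count);
    summary.push({ translator, total, shown, other });
  }
  // Translators with the most reports first.
  return summary.sort((a, b) => b.total - a.total);
}
```

The threshold counts distinct addresses rather than raw reports. That way, one user repeatedly failing on the same page cannot push an error string onto the public list, which presumably is the point of the rule.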
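
Marking reports that come from outdated translators, as requested above, could work roughly like the sketch below. It assumes that each report records the lastUpdated timestamp of the translator that produced it, in a zero-padded "YYYY-MM-DD HH:MM:SS" form, so plain string comparison orders versions correctly. That report field is a hypothetical assumption, and markOutdated is an illustrative name.

```javascript
// Minimal sketch: flag reports produced by a translator older than the one now
// in the repository. Assumes (hypothetically) each report carries the
// translator's lastUpdated value as a zero-padded "YYYY-MM-DD HH:MM:SS"
// string, so lexicographic comparison matches chronological order.
function markOutdated(reports, currentLastUpdated) {
  return reports.map((report) => {
    const latest = currentLastUpdated[report.translator];
    // Translators missing from the lookup are left unflagged rather than guessed at.
    const outdated = latest !== undefined && report.lastUpdated < latest;
    return { ...report, outdated };
  });
}
```

The sketch returns new objects instead of mutating the reports, so the same raw data can be re-marked whenever a translator is updated.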
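
The ScienceDirect TypeError quoted above comes from using the result of iterateNext() without checking it. iterateNext() returns null when the XPath matches nothing, and the missing check in detectWeb mentioned above is the same kind of problem. The sketch below shows a detectWeb that bails out early instead, reusing the XPath from the error string. The "journalArticle" return value, passing null where the original passed nsResolver, and the overall structure are assumptions for illustration, not the actual ScienceDirect translator.

```javascript
// Minimal sketch of a defensive detectWeb. The XPath is taken from the reported
// ScienceDirect error; the return value and structure are illustrative assumptions.
const ANY_TYPE = 0; // numeric value of XPathResult.ANY_TYPE

function detectWeb(doc, url) {
  const exportLink = doc
    .evaluate('//a[contains(@class, "icon_exportarticlesci_dir")]', doc, null, ANY_TYPE, null)
    .iterateNext();
  // iterateNext() returns null when nothing matched (e.g. a page without the
  // export link); report "not detected" instead of dereferencing null.
  if (!exportLink) return false;
  return "journalArticle";
}
```

A falsy return means the translator simply does not claim the page. As noted above, non-detection is less frustrating for users than a save that fails with an error.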