Translator request: epubs.siam.org

duvane5678 · April 16, 2014

A translator for the ebooks and journal articles on this site would be highly convenient. The site does have RIS downloads available, but they seem to lose the authors' names on book chapters, and of course a Zotero-native translator would allow downloading multiple items from tables of contents and direct download of PDFs.

I'm not averse to trying to teach myself how to work with translators and tackling it myself as I have time, but I figured I'd put this one out there and see if there was any interest from those with the experience to do it much more quickly.

adamsmith · April 16, 2014

we have a translator for "Atypon Journals" which is the CMS they're using. It'd be very quick to adjust that to work - basically adjust the target regex and make sure everything else works, which seems to be the case after a quick look. See if you want to do that yourself, otherwise I'll get to it pretty soon.

aurimas · April 16, 2014

adamsmith · April 16, 2014

generally yes - that was always my idea.
We'll need to poke around for the multiple part a bit - e.g. I found this:
http://www.esajournals.org/toc/ecol/95/1
which wouldn't work with your regex (but could easily be made to work - just saying that we should look closely).
I think the multiples are going to be the trickiest/most different, but we know how to test for those on detect, so that shouldn't be a problem.

aurimas · April 16, 2014

We have a dedicated translator for this, but see http://www.tandfonline.com/toc/wgge20/current#.U09Cu_ldWSo The point is that this could happen on other websites too, and /toc/ seems to be the only common denominator.

adamsmith · April 17, 2014

that's, um, very little. the toc always seems to be the first part after the host, so we could do something like
^https?://[^/]+/toc/ ?
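
A quick way to sanity-check that pattern against URLs mentioned in this thread is sketched below. It is written in Python rather than the translator's own JavaScript, the name `TOC_PATTERN` is made up for illustration, and the fragment is left off the Taylor & Francis URL.

```python
import re

# The pattern proposed above: "/toc/" must be the first path segment after the host.
# TOC_PATTERN is a hypothetical name, not something taken from the translator.
TOC_PATTERN = re.compile(r"^https?://[^/]+/toc/")

urls = [
    "http://www.esajournals.org/toc/ecol/95/1",
    "http://www.tandfonline.com/toc/wgge20/current",
    "http://epubs.siam.org/doi/book/10.1137/1.9780898718553",
]

for url in urls:
    # Prints each URL with True if it looks like a table of contents, else False.
    print(url, bool(TOC_PATTERN.match(url)))
```

The two journal table-of-contents URLs match, while the SIAM book URL does not, because its first path segment is "doi" rather than "toc".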

aurimas · April 17, 2014

Done

duvane5678 · April 17, 2014

That was fast, I hardly had a chance to look at it. :-) Most of the pages seem to work great. However, the book main pages aren't being picked up. They all have "book" instead of "toc" in their URL, e.g. http://epubs.siam.org/doi/book/10.1137/1.9780898718553 The conference proceedings are the same, e.g. http://epubs.siam.org/doi/book/10.1137/1.9781611972863 Hopefully that would be relatively straightforward to add? AFAICT, the journals all have "toc" in their URLs, so they're OK.
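
To make the URL difference concrete, here is a rough Python sketch of how detection could tell the two page types apart. The function `classify_url` and its return labels are hypothetical and only illustrate the idea; they are not how the Atypon translator is actually written.

```python
import re

# Hypothetical patterns: journal tables of contents vs. book/proceedings main pages.
TOC_RE = re.compile(r"^https?://[^/]+/toc/")
BOOK_RE = re.compile(r"^https?://[^/]+/doi/book/")


def classify_url(url: str) -> str:
    """Return a rough page type based only on the URL (labels are illustrative)."""
    if TOC_RE.match(url):
        return "multiple"
    if BOOK_RE.match(url):
        return "book"
    return "other"


print(classify_url("http://epubs.siam.org/doi/book/10.1137/1.9780898718553"))
print(classify_url("http://www.esajournals.org/toc/ecol/95/1"))
```

Anything that is neither a "toc" nor a "book" URL, such as a chapter abstract page under /doi/abs/, falls through to "other" in this sketch and would need its own handling.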

Very minor: "ESA" is showing up in the library catalog field, and in the PDF and snapshot entries. I'm guessing that's a holdover?

The other issues, also minor, seem to be down to the site's RIS files, but I wonder if they can be overridden? It looks like none of the book chapters include the authors' names, e.g. http://epubs.siam.org/doi/abs/10.1137/1.9780898719123.ch1 Annoyingly, it doesn't look like the authors' names actually appear anywhere in the chapter pages' source. I wonder if it would be possible to pull them from the ISBN, which is on the page? (Which won't do any good if the chapters have separate authors, but I don't think many SIAM books do.)

The RIS also doesn't differentiate between book sections and conference proceedings, e.g. http://epubs.siam.org/doi/abs/10.1137/1.9781611972863.1 Probably not a big deal. I don't think there's any way to differentiate based on URL. Could probably use the word "Proceedings" in the text of the page, since it seems to appear in the title of all the collections of proceedings.
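
The "Proceedings" heuristic could look roughly like the Python sketch below. It assumes the collection title has already been extracted from the page; `guess_item_type` is a hypothetical helper and the two titles are made-up placeholders, not real SIAM titles.

```python
def guess_item_type(collection_title: str) -> str:
    """Guess a Zotero item type from the title of the containing volume.

    Assumption: proceedings volumes have "Proceedings" in their title,
    as observed above; anything else is treated as a book section.
    """
    if "proceedings" in collection_title.lower():
        return "conferencePaper"
    return "bookSection"


# Placeholder titles, for illustration only.
print(guess_item_type("Proceedings of an Example Conference"))
print(guess_item_type("An Example Monograph"))
```

A plain substring check like this would misfire on a monograph that happens to have "Proceedings" in its title, so it is only as reliable as the observation it rests on.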

The journal article dates appear accurate, but book sections are (arbitrarily, I think?) being given January 1 of their respective years. Unlikely to matter for citation purposes.

Still, thanks again, though.

adamsmith · April 17, 2014

(if we want to supplement author data we'd probably want to use DOI not ISBN - there are, in fact, chapters from edited volumes on that page. But really I think that should be fixed in the data and I'd contact SIAM about it)

aurimas · April 17, 2014

if we want to supplement author data we'd probably want to use DOI not ISBN - there are, in fact, chapters from edited volumes on that page.

While we started doing this for amazon.com, I really don't think doing this as we're scraping the page is a good idea in general.

What we should be aiming for is (if the user chooses) to automatically fetch more complete metadata via DOI/ISBN/PMID/publisher metadata (possibly including attachments via OpenURL resolvers) after the import is done from the page. This should happen in the background. Obviously it's a large undertaking, but I think we might be ready to implement this, and it seems that the need is arising more and more.

duvane5678 · April 17, 2014

What we should be aiming for is (if the user chooses) to automatically fetch more complete metadata via DOI/ISBN/PMID/publisher metadata...

That would certainly be fantastic. Since merging was added, I have often grabbed both the native translator entry and the DOI entry and merged the two to deal with situations like this one. Tedious, but better than typing it all. Automating that would be great.
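
As a rough illustration of what automating that merge might involve, here is a Python sketch that keeps the page-scraped record and fills only its empty fields from a second record, such as one fetched by DOI. The function `fill_missing`, the field names, and the sample values are all assumptions for illustration; this is not how Zotero's merge feature works internally.

```python
def fill_missing(primary: dict, fallback: dict) -> dict:
    """Return a copy of `primary` with empty or absent fields taken from `fallback`."""
    merged = dict(primary)
    for field, value in fallback.items():
        # Only fill gaps; never overwrite a value the primary record already has.
        if not merged.get(field):
            merged[field] = value
    return merged


# Placeholder records: the scraped chapter lacks authors, the DOI record has them.
scraped = {"title": "Example Chapter", "creators": [], "date": "2014"}
from_doi = {"title": "Example chapter", "creators": ["A. Author"], "publisher": "Example Press"}

print(fill_missing(scraped, from_doi))
```

Here the scraped title and date are kept, while the missing creators and publisher come from the second record.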

adamsmith · April 17, 2014

books should now work as well. If you (duvane) could report the RIS issues to SIAM that'd be great.

aurimas · April 17, 2014

Made multiples handling more flexible and fixed the other nits mentioned by duvane (except for authors, which is... probably impossible as a general fix)

duvane5678 · April 17, 2014

Maybe it's just on my end, but I don't seem to be getting the latest version of the translator, so I'm not picking up the last changes. The preferences page says the translators are up to date, and I've restarted Firefox. I'm showing 12:27PM today as the last modified time on the translator file.

adamsmith · April 17, 2014

no, that seems to be a general issue with translator updating
@Dan - could you take a look?

dstillman · April 18, 2014

What's the issue, exactly? I just updated and seem to have the latest versions.

adamsmith · April 18, 2014

I'm stuck with this version:
https://github.com/zotero/translators/commit/4edc340201f0ff1d68f80f164ccb58f96d6ad25a
tried reset translators and update.
Debug for update translators is
D1756075641

dstillman · April 18, 2014

Can you provide a Debug ID for a reset? When I reset it immediately pulls down the new version of that file.

adamsmith · April 18, 2014

D1494048140
it does pull down a new version of the Atypon translator on reset (as the debug shows), but it's the version I link to above, which is two commits behind the most recent version.

dstillman · April 18, 2014

Oh, I would guess the problem was that you updated the timestamp to an earlier time from Aurimas's last commit (so it went "2014-04-17 15:29:18" -> "2014-04-17 11:45:25" -> "2014-04-17 14:30:53"). I don't remember the exact logic used on the repo side of things, but I would imagine that the timestamp is assumed to be monotonically increasing. (It's possible that could be fixed, but a good convention is just to always use current UTC time.)
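
The timestamp problem can be checked with a few lines of Python. This is only a sketch: the helper names are made up, and the assumption that the repository expects strictly increasing timestamps is the guess stated above, not confirmed logic.

```python
from datetime import datetime, timezone

# Format of the timestamps quoted above, e.g. "2014-04-17 15:29:18".
FMT = "%Y-%m-%d %H:%M:%S"


def is_strictly_increasing(stamps):
    """True if every timestamp is later than the one before it."""
    parsed = [datetime.strptime(s, FMT) for s in stamps]
    return all(a < b for a, b in zip(parsed, parsed[1:]))


def current_utc_stamp():
    """Current UTC time in the same format, per the suggested convention."""
    return datetime.now(timezone.utc).strftime(FMT)


# The sequence of timestamps described above.
history = ["2014-04-17 15:29:18", "2014-04-17 11:45:25", "2014-04-17 14:30:53"]
print(is_strictly_increasing(history))  # False: the second stamp is earlier than the first
print(current_utc_stamp())
```

Always stamping with current UTC sidesteps the problem, since local times from editors in different time zones can otherwise appear to go backwards.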

dstillman · April 18, 2014

In any case, if you bump that timestamp again it should be fine.

adamsmith · April 18, 2014

Thanks for tracking that down. Will do tomorrow, unless Aurimas gets to that tonight. One of the reasons I like working in Scaffold is that I don't like touching timestamps, but Scaffold auto-inserts local timestamps, so that seems to be part of the problem here (it also seems like Aurimas's 2nd commit has an earlier timestamp than the 1st one which is rather odd).

adamsmith · April 18, 2014

OK, should update now for everyone.

duvane5678 · April 18, 2014

Yep, I've got it. It didn't occur to me before, but what about making the book main pages (http://epubs.siam.org/doi/book/10.1137/1.9780898719123) multiples? I know there are some translators that go each way on that.

aurimas · April 18, 2014

I've never been convinced one way or another about this. I don't use "multiples" very often myself, so I don't see as much value in it (but I'm sure others may). In this case, given the lack of author metadata in chapters, I think it's more valuable to be able to import the book metadata.

adamsmith · April 18, 2014

agree with aurimas. It's also safer across implementations - I could definitely see books without chapters on a platform and then we'd be left with an error (or the need to code an elaborate fail routine).

duvane5678 · April 18, 2014

It's also safer across implementations

That's fair enough; it is more robust this way across multiple sites.

I don't use multiples that much for journal articles, but I use it all the time for books. I would pretty much never cite a full book in a paper; it would almost always be a book chapter/section. (Of course, I'm sure the situation is different in other fields.) This is especially true for edited books with different chapter authors, where I frequently cite multiple chapters from the same book. So if I'm grabbing chapters, I typically just grab them all--it's quicker if I need multiple chapters anyway, and that way they're there later if I need them.

I think the ideal case would be to have both available, via right-click like we can do when multiple translators are available. The implementation of that might be more trouble than it's worth, though.

adamsmith · April 18, 2014

the right click option is not available for every browser, so we need to make the more universal choice for the translator. If SIAM deposited per-chapter data with CrossRef, though, you could do right-click and import chapter info via DOIs.

aurimas · April 18, 2014

the right click option is not available for every browser, so we need to make the more universal choice for the translator

Yes, we really do. Particularly for https://github.com/zotero/translators/issues/686 (edit: well, in this case, not so much for the translator, but whether we want to download the page itself or the multiples on it)