RIS import bug

jwm · August 5, 2010

On importing from RIS files (Zotero 2.07), I get the following as page numbers:
"1833, 1823" when I want "1823-1833".

I think this is a bug in RIS.js around line 239.

if(value) {
    if(!item.pages) {
        item.pages = value;
    } else if(value != item.pages) {
        item.pages += "-"+value;
    }
}

I think this should read:

if(value) {
    if(!item.pages) {
        item.pages = "-"+value;
    } else if(value != item.pages) {
        item.pages += "-"+value;
    }
}

any ideas?

James

jwm · August 5, 2010

oops, I meant version 2.03 of Zotero.

noksagt · August 5, 2010

What does the RIS you are importing look like?

jwm · August 5, 2010

This is an extract of the RIS of the thing I am importing (as output by CiteULike). I wonder if Zotero is getting confused because the EP (end page) record comes before the SP (start page) record. According to the RIS format specification, TY must be first and ER must be last, but otherwise the tags can be in any order.

I have modified my own RIS.js to fix this, but would like confirmation from others that this is a problem, and for it to be fixed in a future release.

TY - JOUR
ID - citeulike:6495796
L3 - citeulike-article-id:6495796
N2 - The three-dimensional structure and dynamics of de novo designed, amphiphilic four-helix bundle peptides (or "maquettes"), capable of binding metallo-porphyrin cofactors at selected locations along the length of the core of the bundle, are investigated via molecular dynamics simulations. The rapid evolution of the initial design to stable three-dimensional structures in the absence (apo-form) and presence (holo-form) of bound cofactors is described for the maquettes at two different soft interfaces between polar and nonpolar media. This comparison of the apo- versus holo-forms allows the investigation of the effects of cofactor incorporation on the structure of the four-helix bundle. The simulation results are in qualitative agreement with available experimental data describing the structures at lower resolution and limited dimension.
IS - 7
JF - The journal of physical chemistry. B
SN - 1520-6106
EP - 1833
TI - Three-dimensional structure and dynamics of a de novo designed, amphiphilic, metallo-porphyrin-binding protein maquette at soft interfaces by molecular dynamics simulations.
VL - 111
SP - 1823
KW - maquette_paper
AU - Zou, Hongling
AU - Strzalka, Joseph
AU - Xu, Ting
AU - Tronin, Andrey
AU - Blasie, Kent
PY - 2007/02/22/
UR - http://dx.doi.org/10.1021/jp0666378
ER -
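
To make the ordering issue concrete, here is a minimal sketch of reading SP and EP from a record without depending on which comes first. It is not the RIS.js code: `pagesFromRecord` is a hypothetical helper, and it assumes a single SP/EP pair per record, so it does not cover records with several page ranges.

```javascript
// Hypothetical helper, not part of RIS.js: pull the first SP and EP values
// out of one RIS record, regardless of the order in which they appear.
function pagesFromRecord(risText) {
  let start = null;
  let end = null;
  for (const line of risText.split(/\r?\n/)) {
    // RIS lines look like "SP  - 1823"; allow one or more spaces before the dash.
    const match = /^([A-Z][A-Z0-9])\s+-\s?(.*)$/.exec(line);
    if (!match) continue;
    const tag = match[1];
    const value = match[2].trim();
    if (!value) continue; // e.g. the closing "ER -" line
    if (tag === "SP" && start === null) start = value;
    if (tag === "EP" && end === null) end = value;
  }
  // Join only when both ends exist and differ; otherwise return whichever is present.
  if (start !== null && end !== null && start !== end) return start + "-" + end;
  return start !== null ? start : (end !== null ? end : "");
}

// Shortened version of the record above, with EP before SP.
const record = [
  "TY - JOUR",
  "EP - 1833",
  "VL - 111",
  "SP - 1823",
  "ER -",
].join("\n");
console.log(pagesFromRecord(record)); // 1823-1833
```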

noksagt · August 5, 2010

This has been discussed before, but I don't remember where...

>I have modified my own RIS.js to fix this, but would like confirmation from others that this is a problem, and for it to be fixed in a future release.

The current version of the translator allows for the import of discontinuous pagination, à la:

SP  - 476
EP  - 481
SP  - 483
EP  - 485

It may be that the smart thing to do would be to detect cases where an SP follows an EP and then check whether the SP is lower than the EP it followed. This would have to be a bit more clever, though, if we wanted it to work for page numbers that consist of something more than just Arabic numerals.
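
A rough sketch of that check, assuming we only reorder when both values are plain Arabic numerals and leave everything else alone. `orderPages` is a hypothetical name, not something in the translator:

```javascript
// Hypothetical helper: order two page values only when both are plain
// Arabic numerals. Anything else (roman numerals, "S12", "e1045") is left
// in the order given, because there is no safe way to compare it here.
function orderPages(first, second) {
  const isArabic = (s) => /^\d+$/.test(s);
  if (isArabic(first) && isArabic(second) && Number(first) > Number(second)) {
    return [second, first];
  }
  return [first, second];
}

console.log(orderPages("1833", "1823").join("-")); // 1823-1833
console.log(orderPages("xii", "iv").join("-"));    // xii-iv (left untouched)
```

This is exactly the incomplete part: non-numeric page labels pass through unchanged, so a reversed pair of roman numerals would still come out reversed.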

jwm · August 6, 2010

OK, thanks for the reply. So is this being filed as a bug report with the developers?

noksagt · August 6, 2010

Multiple developers read every post here. Since SP/EP take alphanumeric strings, my solution is not a complete one, and CiteULike is one of the few data sources to put these tags in the opposite order, perhaps we should be importing BibTeX from CiteULike as a workaround?

ajlyon · August 6, 2010

What about holding on to EPs and appending them to SPs as they come?
Something like, in my awkward pseudocode:

for each tag => value:
  if tag is EP:
    if (sps.length() > 0):
      ranges.push(sps.pop() + "-" + value)
    else:
      eps.push(value)
  if tag is SP:
    if (eps.length() > 0):
      ranges.push(value + "-" + eps.pop())
    else:
      sps.push(value)
# leftover unpaired values; skip empty lists so no stray ", " is emitted
if (sps.length() > 0): ranges.push(sps.join(", "))
if (eps.length() > 0): ranges.push(eps.join(", "))
item.pages = ranges.join(", ")

The RIS spec doesn't give much guidance on the use of SP and EP, so it's hard to know how we should interpret mispaired SP and EP tags, but I don't think we should be responsible for making sure they are logically ordered (lower page numbers to higher page numbers).
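
For anyone who wants to try the pairing idea, here is the same logic as runnable JavaScript. This is only a sketch: `pairPages` and its input shape (an array of `[tag, value]` pairs in file order) are assumptions for illustration, not the translator's actual interface.

```javascript
// Sketch of the pairing idea above. `tags` is an assumed input shape:
// an array of [tag, value] pairs in the order they appear in the file.
function pairPages(tags) {
  const sps = [];    // start pages still waiting for an end page
  const eps = [];    // end pages still waiting for a start page
  const ranges = []; // completed "start-end" ranges
  for (const [tag, value] of tags) {
    if (tag === "EP") {
      if (sps.length > 0) ranges.push(sps.pop() + "-" + value);
      else eps.push(value);
    } else if (tag === "SP") {
      if (eps.length > 0) ranges.push(value + "-" + eps.pop());
      else sps.push(value);
    }
  }
  // Leftover unpaired values are appended as-is; empty lists add nothing.
  return ranges.concat(sps, eps).join(", ");
}

// EP before SP, as in the CiteULike export:
console.log(pairPages([["EP", "1833"], ["SP", "1823"]])); // 1823-1833
// Discontinuous pagination in the usual order:
console.log(pairPages([["SP", "476"], ["EP", "481"], ["SP", "483"], ["EP", "485"]])); // 476-481, 483-485
```

Note that this makes no attempt to order the values, and a record with identical SP and EP would come out as "12-12", so a check like the existing `value != item.pages` comparison would still be needed.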

noksagt · August 6, 2010

Don't know if I agree with your pseudocode, but I had pointed out that it won't work for, e.g. roman numerals & other non-arabic-numeral page numbers.

jwm · August 6, 2010

I have emailed CiteULike support to suggest that, in their output, SP should come before EP.

jwm · August 7, 2010

CiteULike have modified their RIS export to put SP before EP.

>I've tweaked the RIS export to put SP before EP.
from [support09@citeulike.org]

ajlyon · August 7, 2010

Great to hear that they were receptive to your feedback.