Extra space before subscripts in title when importing items via DOIs
## Description
When I import the item via DOI (10.1021/acs.iecr.4c03239), I get title:
Investigation of CO <sub>2</sub> Reduction to Formate in an Industrial-Scale Electrochemical Cell through Transient Numerical Modeling
But in reality it should be:
Investigation of CO<sub>2</sub> Reduction to Formate in an Industrial-Scale Electrochemical Cell through Transient Numerical Modeling
There should be no space between CO and 2.
## Reason
Data returned by CrossRef does not contain spaces:
JSON format: https://api.crossref.org/works/10.1021/acs.iecr.4c03239
https://s3.amazonaws.com/zotero.org/images/forums/u7424907/41bghhjzj2zlhcaughof.png
XML format: https://www.crossref.org/openurl/?pid=northword@outlook.com&format=unixref&id=doi:10.1021/acs.iecr.4c03239&noredirect=true
https://s3.amazonaws.com/zotero.org/images/forums/u7424907/2abtmbtuao4xsvq8hgrb.png
However, the data returned by the XML API contains extra whitespace once it is converted to a string (perhaps for indentation?).
The string passed to "crossref unixref xml.js" is:
`<title>\n Investigation of CO\n <sub>2</sub>\n Reduction to Formate in an Industrial-Scale Electrochemical Cell through Transient Numerical Modeling\n </title>\n`
`ZU.trimInternal` then collapses each run of whitespace into a single space, but ideally there should be no space here at all.
## Possible solutions
It is extremely rare for a subscript to appear at the beginning of a word, so would it be possible to replace `/ +<sub>/g` with `<sub>`?
Also, the title in the CrossRef REST response is correct and contains no spaces. Could we request both the XML and REST APIs, using the REST result for the title and the XML result for the other missing fields?
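The first suggestion can be sketched as below. This is only an illustration, not the actual translator code: `cleanTitle` is a hypothetical helper, and the whitespace collapse is a stand-in that approximates what the post describes `ZU.trimInternal` as doing. It also covers `<sup>`, which goes slightly beyond the regex proposed above.

```javascript
// Hypothetical helper for illustration; not part of the Zotero translator API.
function cleanTitle(raw) {
	return raw
		// Stand-in for ZU.trimInternal: collapse whitespace runs to one space, then trim.
		.replace(/\s+/g, ' ')
		.trim()
		// Drop the spaces that the XML indentation left before <sub> or <sup>.
		.replace(/ +<(sub|sup)>/g, '<$1>');
}

// Title text as it arrives from the XML response (see the string quoted above).
const raw = "\n Investigation of CO\n <sub>2</sub>\n Reduction to Formate in an "
	+ "Industrial-Scale Electrochemical Cell through Transient Numerical Modeling\n ";

console.log(cleanTitle(raw));
```

Whitespace after the closing tag is left alone, since in this title the space after `</sub>` is a genuine word break.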
## Others
Debug ID: D1996284446
- adamsmith: I see little downside to removing spaces before subscripts and superscripts in the translator. People already run into rate limiting with CrossRef, so I don't think multiple requests are a good idea.
- AbeJellinek: This basically looks like a CrossRef bug. It should be putting titles that contain HTML tags into a CDATA section (`<![CDATA[...]]>`), not treating them as XML tags, and it definitely shouldn't be "beautifying" the titles and adding spaces that aren't there in the source metadata. We can remove the spaces as a quick fix, but I'd prefer to see if this can be fixed on CrossRef's end.