Extra space before subscripts in title when importing items via DOIs

northword · October 16, 2024

## Description

When I import an item by DOI (10.1021/acs.iecr.4c03239), I get this title:

Investigation of CO <sub>2</sub> Reduction to Formate in an Industrial-Scale Electrochemical Cell through Transient Numerical Modeling

But in reality it should be:

Investigation of CO<sub>2</sub> Reduction to Formate in an Industrial-Scale Electrochemical Cell through Transient Numerical Modeling

There should be no space between CO and 2.

## Reason

Data returned by CrossRef does not contain spaces:

JSON format: https://api.crossref.org/works/10.1021/acs.iecr.4c03239

https://s3.amazonaws.com/zotero.org/images/forums/u7424907/41bghhjzj2zlhcaughof.png

XML format: https://www.crossref.org/openurl/?pid=northword@outlook.com&format=unixref&id=doi:10.1021/acs.iecr.4c03239&noredirect=true

https://s3.amazonaws.com/zotero.org/images/forums/u7424907/2abtmbtuao4xsvq8hgrb.png

However, once the data returned by the XML API is converted to a string, it contains line breaks and spaces around the tags (perhaps for indentation?).

The string passed to "crossref unixref xml.js" is:

<title>\n                            Investigation of CO\n                            <sub>2</sub>\n                            Reduction to Formate in an Industrial-Scale Electrochemical Cell through Transient Numerical Modeling\n                        </title>\n

ZU.trimInternal collapses each run of this whitespace into a single space, but ideally there should be no space here at all.
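A minimal sketch of how the stray space arises. The `collapseWhitespace` function is a hypothetical stand-in that only approximates the behavior described here for ZU.trimInternal (collapse whitespace runs to one space, then trim); it is not Zotero's actual implementation.

```javascript
// Stand-in for ZU.trimInternal, based only on the behavior described in this
// report: collapse every whitespace run into one space, then trim both ends.
function collapseWhitespace(s) {
  return s.replace(/\s+/g, " ").trim();
}

// Inner content of the <title> element as it arrives from the XML API.
const rawTitle =
  "\n                            Investigation of CO\n" +
  "                            <sub>2</sub>\n" +
  "                            Reduction to Formate in an Industrial-Scale " +
  "Electrochemical Cell through Transient Numerical Modeling\n" +
  "                        ";

const collapsed = collapseWhitespace(rawTitle);

// The newline plus indentation between "CO" and "<sub>" survives as one space.
console.log(collapsed);
```

The indentation between `CO` and `<sub>` is indistinguishable from a real word separator once it has been collapsed, which is why the fix has to target the tag boundary specifically.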

## Possible solutions

It is extremely rare for a subscript to appear at the beginning of a word. Would it be possible to replace / +<sub>/g with <sub>?
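The proposed replacement could look roughly like this. The helper name `stripSpaceBeforeSub` is hypothetical, and where such a step would sit in the translator is an assumption; the sketch only demonstrates the regular expression itself.

```javascript
// Hypothetical helper: remove any run of spaces directly before an opening
// <sub> tag. This assumes a subscript never legitimately starts a word.
function stripSpaceBeforeSub(title) {
  return title.replace(/ +<sub>/g, "<sub>");
}

// Title as currently imported, with the unwanted space before <sub>.
const imported =
  "Investigation of CO <sub>2</sub> Reduction to Formate in an " +
  "Industrial-Scale Electrochemical Cell through Transient Numerical Modeling";

const fixed = stripSpaceBeforeSub(imported);
console.log(fixed);
```

The trade-off is that a title which really does have a space before a subscript would lose it too, so this only works if that case is rare enough to ignore.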

Alternatively, the title returned by the CrossRef REST API is correct and contains no extra spaces. Could we query both the XML and REST APIs, using the REST result for the title and the XML result for the remaining fields?

## Others

Debug ID: D1996284446

adamsmith · October 16, 2024

I see little downside to removing spaces before subscripts and superscripts in the translator. People already run into rate limiting with CrossRef, so I don't think multiple requests are a good idea.

AbeJellinek · October 16, 2024

This basically looks like a CrossRef bug: it should be putting titles that contain HTML tags into a <![CDATA[...]]> section, not treating them as XML tags, and it definitely shouldn't be "beautifying" the titles and adding spaces that aren't there in the source metadata. We can remove the spaces as a quick fix, but I'd prefer to see if this can be fixed on CrossRef's end.