character encoding issue/bug

FSoave · June 19, 2019

Hi, I'm not sure how/where to report this.

I've imported a paper with what seems to be a — in the title.

Zotero apparently doesn't recognize it, so it displays it as: &#x 2014; (without the space in the middle)

Regards

dstillman · June 19, 2019

Where did you import from?

FSoave · June 19, 2019

Drag n drop

dstillman · June 19, 2019

You're dragging a PDF, you mean? And the entity appears in the recognized parent item?

Can you provide a Debug ID for this operation?

FSoave · June 19, 2019

I've just left the office. Yes, when I go to "extract Metadata from parent" (or similar) it updates the title with the HTML code. I will provide the ID tomorrow.

Thank you
Regards

dstillman · June 19, 2019

No, "Rename File from Parent Metadata" is the opposite: it applies the parent item's title (which presumably already has the entity) to the child attachment's filename. If I'm understanding you, what likely happened here was that Zotero looked up metadata for the PDF you added and the metadata provider returned an incorrectly encoded value. But we'd need to see the Debug ID for the recognition process to say for sure.

FSoave · June 20, 2019

debug ID: D943275874

I just drag and drop the PDF into the folder and Zotero automatically updates the title, but with the HTML code (so it does this automatically, before I do anything else).

Here's a link to the pdf (just in case) https://www.researchgate.net/publication/254026534_Self-motion_illusions_vection_in_VR_-_Are_they_good_for_anything

adamsmith · June 20, 2019

Technical note:
this is what we get from CrossRef

<title><![CDATA[Self-motion illusions (vection) in VR &#x2014; Are they good for anything?]]></title>

I think we should be able to fix this on import, although it's unclear why they're escaping the em dash. Notably, though, CrossRef also converts this incorrectly to CSL JSON, so they don't expect the escaped em dash either.
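
For illustration only, here is a minimal Python sketch of what decoding the reference on import could look like. It is not Zotero's code (Zotero is not written in Python); the function name `decode_entities` is hypothetical, and the input string is the title text quoted above.

```python
import html

def decode_entities(title: str) -> str:
    """Hypothetical helper: turn character references into real characters."""
    # html.unescape handles both named and numeric references such as &#x2014;
    return html.unescape(title)

# Title text as it appears inside the CDATA section quoted above.
raw_title = "Self-motion illusions (vection) in VR &#x2014; Are they good for anything?"
fixed = decode_entities(raw_title)

# ascii() is used so the result prints safely on any console encoding.
print(ascii(fixed))
```

A blanket decode like this would also alter any title that legitimately contains literal entity text, so it is a trade-off rather than a clear fix.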

dstillman · June 20, 2019

Zotero's behavior is correct and shouldn't be changed. Text within a CDATA section should be interpreted literally. The underlying data will need to be fixed — either by the publisher that submitted it to Crossref or by Crossref if it's something they're doing in their processing. (I'm not sure why it's correct in Crossref's search results despite being escaped in the CSL JSON.)
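
The CDATA point can be checked with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` purely as an example parser; the helper name `title_text` and the shortened titles are assumptions made for illustration, not taken from Zotero or Crossref.

```python
import xml.etree.ElementTree as ET

def title_text(xml_fragment: str) -> str:
    """Return the text content of a single <title> element."""
    return ET.fromstring(xml_fragment).text

# Shortened, hypothetical titles used only for illustration.
in_cdata = "<title><![CDATA[VR &#x2014; good?]]></title>"
in_plain = "<title>VR &#x2014; good?</title>"

# Inside CDATA the reference stays as the literal characters "&#x2014;".
print(ascii(title_text(in_cdata)))
# Outside CDATA the parser decodes the reference to U+2014.
print(ascii(title_text(in_plain)))
```

So a client that parses the Crossref XML correctly ends up with the literal text `&#x2014;` in the title, which matches what was reported at the top of this thread.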