character encoding issue/bug

edited June 19, 2019
Hi, I'm not sure how/where to report this.

I've imported a paper with what seems to be a — in the title.

Zotero apparently doesn't recognize it, so it displays it as the literal text: &#x2014;


Regards
  • Where did you import from?
  • Drag n drop
  • edited June 19, 2019
    You're dragging a PDF, you mean? And the entity appears in the recognized parent item?

    Can you provide a Debug ID for this operation?
  • I've just left the office. Yes, when I go to "extract Metadata from parent" (or similar), it updates the titles with the HTML code. I will provide the ID tomorrow.

    Thank you
    Regards
  • No, "Rename File from Parent Metadata" is the opposite: it applies the parent item's title (which presumably already has the entity) to the child attachment's filename. If I'm understanding you, what likely happened here was that Zotero looked up metadata for the PDF you added and the metadata provider returned an incorrectly encoded value. But we'd need to see the Debug ID for the recognition process to say for sure.
  • edited June 20, 2019
    debug ID: D943275874

    I just drag and drop the PDF into the folder, and Zotero automatically updates the title, but with the HTML code (so it does this automatically, before I do anything else).

    Here's a link to the pdf (just in case) https://www.researchgate.net/publication/254026534_Self-motion_illusions_vection_in_VR_-_Are_they_good_for_anything
  • Technical note:
    this is what we get from CrossRef
    <title><![CDATA[Self-motion illusions (vection) in VR &#x2014; Are they good for anything?]]></title>

    I think we should be able to fix this on import, although it's unclear why they're escaping the em dash. Notably, though, CrossRef also converts this incorrectly to CSL JSON, so they don't expect the escaped em dash either.
  • edited June 20, 2019
    Zotero's behavior is correct and shouldn't be changed. Text within a CDATA section should be interpreted literally. The underlying data will need to be fixed — either by the publisher that submitted it to Crossref or by Crossref if it's something they're doing in their processing. (I'm not sure why it's correct in Crossref's search results despite being escaped in the CSL JSON.)
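
For anyone who wants to see the CDATA point concretely, here is a minimal Python sketch using the title element quoted above. It only illustrates how a standard XML parser treats the text; it is not Zotero's code, and the extra unescape step is a hypothetical workaround of the kind suggested earlier in the thread, not something Zotero does. The variable names are made up for the example.

```python
import html
import xml.etree.ElementTree as ET

# The <title> element as quoted in this thread from the Crossref response.
cdata_snippet = (
    "<title><![CDATA[Self-motion illusions (vection) in VR "
    "&#x2014; Are they good for anything?]]></title>"
)

# Inside CDATA the parser returns the text verbatim, so the character
# reference is NOT expanded and the title still contains "&#x2014;".
literal_title = ET.fromstring(cdata_snippet).text

# The same reference outside CDATA is expanded by the parser to U+2014.
plain_snippet = "<title>VR &#x2014; Are they good for anything?</title>"
parsed_title = ET.fromstring(plain_snippet).text

# Hypothetical import-time workaround (an assumption, not Zotero behavior):
# decode the literal text once more as if it were HTML-escaped.
unescaped_title = html.unescape(literal_title)

print(repr(literal_title))
print(repr(parsed_title))
print(repr(unescaped_title))
```

The catch with the workaround is that it would also rewrite titles that legitimately contain text such as "&amp;" inside CDATA, which is one reason to prefer fixing the data at the source.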