Accented characters don't appear correctly in annotations extracted with ZotFile

edited February 21, 2022
Hi,

When extracting annotations from a PDF with ZotFile, accented characters are sometimes not exported correctly.
Here is a simple example of this behavior, using the PDF file from this link: https://www.scielo.br/j/rbef/a/j8y7vZt69DpS5kKYZWyV5Yz/?format=pdf&lang=pt

If I annotate the title: "Tradução comentada de um clássico de Copérnico", I get the following extracted annotation:

"Traduc òao comentada de um cl ¥assico de Cop ¥ernico" (Dias 2004:195)

Is there a way to correct this for PDF files that behave this way with ZotFile annotation extraction?

I use Zotero 5.0.96.3 on Arch Linux (Zotero 6 does not seem to be stable yet?)
  • dstillman Zotero Team
    edited February 21, 2022
    We can't help with ZotFile, but you can try the Zotero beta, which replaces ZotFile's annotation extraction with its own support for adding annotations to notes.
  • Basically the same thing in the beta with that PDF:
    “Traduc∏ ò ao comentada de um cl¥ assico de Cop¥ ernico”

    Also doesn't copy cleanly from pdf.js
  • edited February 21, 2022
    OK, dstillman. I will see whether I can test the beta version. In the Zotero beta, "Annotations are stored in the Zotero database, not in the PDF file". Is there a standard way to transition from ZotFile to the Zotero beta?

    Thank you, adamsmith. Both ZotFile and the Zotero beta use pdf.js.

    I don't know whether anything can be done to the PDF itself to "correct" it.

  • It's a direct replacement for the extraction function, but as adamsmith says, the beta behaves similarly.

    It looks like that's just a problem with the embedded text in that PDF. I get this in macOS Preview:
    Traduc ̧a ̃o comentada de um cla ́ssico de Cope ́rnico
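
A possible post-processing workaround, offered only as a sketch and not as a ZotFile or Zotero feature: in the Preview output above, the accents appear to be extracted as Unicode combining marks that are separated from their base letter by a space. Assuming that is really what the extracted text contains, the stray space can be removed and the result normalized to NFC in Python. The function name `repair_detached_accents` is hypothetical, and the sample string is written with explicit escapes for the combining cedilla (U+0327), tilde (U+0303), and acute accent (U+0301).

```python
import re
import unicodedata


def repair_detached_accents(text: str) -> str:
    """Rejoin combining accents that were split from their base letter.

    Assumption: each accent is a combining mark (U+0300..U+036F) that
    follows its base letter, with only whitespace in between.
    """
    # Remove whitespace that sits directly before a combining mark,
    # so the mark attaches to the preceding letter again.
    joined = re.sub(r"\s+(?=[\u0300-\u036f])", "", text)
    # Compose letter + combining mark into a single precomposed character.
    return unicodedata.normalize("NFC", joined)


# Text shaped like the macOS Preview output quoted above.
sample = (
    "Traduc \u0327a \u0303o comentada de um "
    "cla \u0301ssico de Cope \u0301rnico"
)
print(repair_detached_accents(sample))
# Tradução comentada de um clássico de Copérnico
```

This would not repair the pdf.js output quoted earlier in the thread, where the accents come out as unrelated characters such as ¥ and ò rather than as combining marks. That case would need a mapping specific to the PDF's fonts.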