Accented characters don't appear correctly in extracted annotations with ZotFile

edited February 21, 2022
Hi,

When extracting annotations from a PDF with ZotFile, accented characters are sometimes not exported correctly.
Here is a simple example of this behavior, using the PDF file from this link: https://www.scielo.br/j/rbef/a/j8y7vZt69DpS5kKYZWyV5Yz/?format=pdf&lang=pt

If I annotate the title: "Tradução comentada de um clássico de Copérnico", I get the following extracted annotation:

"Traduc òao comentada de um cl ¥assico de Cop ¥ernico" (Dias 2004:195)

Is there a way to correct this behavior for PDF files that behave this way with ZotFile annotation extraction?

I use Zotero 5.0.96.3 on Arch Linux (Zotero 6 does not seem to be stable yet?)
  • edited February 21, 2022
    We can't help with ZotFile, but you can try the Zotero beta, which replaces ZotFile's annotation extraction with its own support for adding annotations to notes.
  • Basically the same thing in the beta with that PDF:
    “Traduc∏ ò ao comentada de um cl¥ assico de Cop¥ ernico”

    Also doesn't copy cleanly from pdf.js
  • edited February 21, 2022
    OK, dstillman. I will see whether I can test the beta version. In the Zotero beta, "Annotations are stored in the Zotero database, not in the PDF file". Is there a standard way to transition from ZotFile to the Zotero beta?

    Thank you, adamsmith. Both ZotFile and the Zotero beta use pdf.js.

    I don't know whether anything can be done to the PDF itself to "correct" it.

  • It's a direct replacement for the extraction function, but as adamsmith says, the beta behaves similarly.

    It looks like that's just a problem with the embedded text in that PDF. I get this in macOS Preview:
    Traduc ̧a ̃o comentada de um cla ́ssico de Cope ́rnico
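
If the extracted text looks like the Preview output above, with each accent emitted as a separate combining mark after a stray space, it may be possible to clean it up after extraction. The following is only a minimal sketch under that assumption: `repair_detached_accents` is a hypothetical helper, not part of Zotero, ZotFile, or pdf.js, and it would not fix the pdf.js output quoted earlier, where the accents come out as unrelated characters such as "¥".

```python
import re
import unicodedata

# Whitespace followed by a combining diacritical mark (U+0300..U+036F).
# Assumption: the PDF's text layer emits accents this way, as in the
# macOS Preview output quoted above.
_DETACHED_MARK = re.compile(r"\s+([\u0300-\u036f])")


def repair_detached_accents(text: str) -> str:
    """Reattach combining marks to the preceding letter, then compose (NFC)."""
    reattached = _DETACHED_MARK.sub(r"\1", text)
    return unicodedata.normalize("NFC", reattached)


# The Preview output, written with explicit escapes for the combining marks.
sample = "Traduc \u0327a \u0303o comentada de um cla \u0301ssico de Cope \u0301rnico"
print(repair_detached_accents(sample))
# prints: Tradução comentada de um clássico de Copérnico
```

This only post-processes copied text; the embedded text in the PDF itself stays as it is.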