Built-in PDF reader doesn't recognize ligatures
Report ID: 1472876020
Sometimes when I import a PDF into Zotero and use the built-in PDF reader to make highlights/copy text, ligatures will not be highlighted/copied.
Example using the same PDF (an epub downloaded from gutenberg.org and converted to PDF by Calibre):
"In the Meno, Anytus had parted from Socrates with the signi cant words:" (Zotero)
"In the Meno, Anytus had parted from Socrates with the significant words:" (Okular)
After a brief search of the forums I found a post from someone having the same issue (1), and a second search turned up another post (2), but neither solved my problem.
Forum posts:
1: https://forums.zotero.org/discussion/comment/447266#Comment_447266
2: https://forums.zotero.org/discussion/99950/ligatures-are-not-copied-from-pdf-in-zoteros-pdf-viewer-report-id-865672033/p1
I've ruled out the problem being specific to certain PDFs, because it doesn't matter whether the PDF was downloaded as-is, was converted from epub to PDF in Calibre, or was just a web article saved using Ctrl+P. In fact, when converting epub to PDF, I get the issue when I use a font such as Bookerly (which combines "f" and "i"), but I don't get the issue with a font such as Georgia (which does not combine "f" and "i").
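For anyone who wants to check their own copied text, here is a minimal Python sketch of how ligature characters can be detected and expanded. It assumes the copied text actually contains a Unicode ligature code point such as U+FB01; in the Zotero example above the ligature is dropped entirely, so this is only a diagnostic aid and not a fix, and it says nothing confirmed about how the reader extracts text. The helper names `find_ligatures` and `expand_ligatures` are made up for illustration.

```python
import unicodedata

# Latin ligature code points (ff, fi, fl, ffi, ffl) from the
# Unicode Alphabetic Presentation Forms block.
LIGATURES = {"\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04"}


def find_ligatures(text):
    """Return (index, character, Unicode name) for each ligature in text."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in LIGATURES
    ]


def expand_ligatures(text):
    """Expand compatibility ligatures into plain letters using NFKC.

    Note: NFKC also normalizes other compatibility characters
    (for example full-width letters), not only ligatures.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    copied = "signi\ufb01cant"  # text where the "fi" ligature survived copying
    print(find_ligatures(copied))
    print(expand_ligatures(copied))
```

If `find_ligatures` returns nothing and the letters are still missing, as in the "signi cant" case, the ligature was lost before the text reached the clipboard, so normalization after the fact cannot recover it.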
Curiously, this is a problem with the Zotero PDF reader, Safari, Firefox, and macOS Preview. I've tried this with Okular, Google Drive, Kindle, and Google Chrome, and they all copy the text perfectly fine. Tested on my Windows desktop, my MacBook, and a Debian 12 desktop. I even tested this on the Zotero 7 beta and got the same result.
I'm happy to provide follow-up answers, send a copy of the PDF I'm using as an example, or anything else that would help.
Thanks for the help and especially for all the work being done on this great program.
-
martynas_b: Could you test if this issue still exists in Zotero 7 beta?
-
adamsmith: They did, see above.
-
samvimes: I can confirm that the problem persists in the Zotero 7 beta.
-
martynas_b: Alright, please send example PDF files to support@zotero.org with a link to this thread.
-
martynas_b: I reported the issue to PDF.js, and they seem to be aware of it, although they haven't made any progress yet. Let's wait until it's fixed.
-
PhoebeQ: I'm also experiencing this issue and would be very keen to hear of any fixes.