PDF annotation highlight text drops (some) spaces
Zotero 6.0.30 on Windows 10 (Home)
When I select text in the Zotero PDF viewer to highlight and annotate it, the quoted text drops spaces in some PDFs (this is the first time I have noticed it happening so often in a single document).
In Adobe Reader, selecting the same region produces text that contains the spaces.
Zotero: Machinesare definite: anythingwhichwas indefinite or infinite we shouldnot countas a machin
Adobe: Machines are definite: anything which was indefinite or infinite we should not count as a machine
Perhaps this is already resolved in Zotero 7, but I note it for future reference.
Another example:
https://s3.amazonaws.com/zotero.org/images/forums/u1726147/ksg3w211o9w0rham7uch.png
Lucas, J. R. (1961). Minds, Machines and Gödel. Philosophy, 36(137), 112–127.
https://doi.org/10.1017/S0031819100057983
https://www.jstor.org/stable/3749270
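For anyone who wants to pin down exactly which spaces were lost, here is a minimal Python sketch that compares the two quoted strings above. The function name `merged_tokens` is made up for illustration, and the approach assumes the two texts differ only by missing spaces (plus possibly a truncated final word, as in the Zotero quote).

```python
def merged_tokens(extracted: str, reference: str) -> list[tuple[str, list[str]]]:
    """Return each token of `extracted` that covers several words of `reference`."""
    ref_words = reference.split()
    merged = []
    i = 0
    for token in extracted.split():
        start = i
        built = ""
        # Consume reference words until they are at least as long as this token.
        while i < len(ref_words) and len(built) < len(token):
            built += ref_words[i]
            i += 1
        # A truncated token (e.g. "machin") is still a prefix of the reference word.
        if not built.startswith(token):
            raise ValueError(f"texts diverge at {token!r}")
        if i - start > 1:
            merged.append((token, ref_words[start:i]))
    return merged


zotero = ("Machinesare definite: anythingwhichwas indefinite or infinite "
          "we shouldnot countas a machin")
adobe = ("Machines are definite: anything which was indefinite or infinite "
         "we should not count as a machine")

for token, words in merged_tokens(zotero, adobe):
    print(f"{token!r} should be {' '.join(words)!r}")
```

This only reports where spaces went missing in the copied text. It does not inspect the PDF itself, so it cannot tell whether the cause is the OCR layer or the viewer.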
The problems I can see are mostly due to bad OCR. Re-running OCR on the file fixed the problems I tested.
> Re-running OCR on the file fixed the problems I tested.
What OCR do you recommend for Windows 10, and how do I replace the OCR'd text in an existing PDF?
More significantly, whilst the cause might be clear, this is not a problem that at least one other PDF reader has, so it could be addressed in the PDF handler, couldn't it? That would help everyone who might have "OCR" issues.
You do not mention any specific problem in the second example you give.
For the specific problem you mention in the first post (guessing that it is the correct paper), the paper exists in two different versions, one published by CUP and one by JSTOR.
* The CUP version is fine in both Adobe Acrobat Pro and Zotero 7. So if that is the version you are using, it is already fixed in Zotero 7.
https://s3.amazonaws.com/zotero.org/images/forums/u265723/roatznaezcnuo5079gxs.png
* For the JSTOR version, I cannot observe the problem you mention about the missing spaces. So if that is the version you are using, it is already fixed in Zotero 7.
It still has problems:
- Zotero 7:
https://s3.amazonaws.com/zotero.org/images/forums/u265723/vabjgy3i5917xbgvgp9q.png
- Adobe Acrobat Pro (I have copied the text produced by the selection into the comment):
https://s3.amazonaws.com/zotero.org/images/forums/u265723/el2dcadt0cq9tbgfdqmy.png
After OCR (I have used Adobe Acrobat Pro, but there are probably other ways to do it):
- Zotero 7:
https://s3.amazonaws.com/zotero.org/images/forums/u265723/a7kkuu2al37w2xa8hgvv.png
- Adobe Acrobat Pro (I have copied the text produced by the selection into the comment):
https://s3.amazonaws.com/zotero.org/images/forums/u265723/w062foqrbe4tzwjr7g3r.png
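As one possible "other way", here is a rough sketch using the open-source OCRmyPDF command-line tool, which was not used or tested in this thread. It assumes `ocrmypdf` and its dependencies (Tesseract, Ghostscript) are installed and on the PATH. The helper names and file names are placeholders. The `--redo-ocr` option asks OCRmyPDF to replace an existing OCR text layer rather than skip pages that already contain text.

```python
import shutil
import subprocess


def redo_ocr_command(src: str, dst: str) -> list[str]:
    """Build the OCRmyPDF command line that redoes the OCR text layer of `src`."""
    return ["ocrmypdf", "--redo-ocr", src, dst]


def redo_ocr(src: str, dst: str) -> None:
    """Run OCRmyPDF, writing a new PDF to `dst` and leaving `src` untouched."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on the PATH")
    subprocess.run(redo_ocr_command(src, dst), check=True)


if __name__ == "__main__":
    # Placeholder file names; replace with the actual attachment path.
    print(" ".join(redo_ocr_command("lucas1961.pdf", "lucas1961-ocr.pdf")))
```

Writing to a separate output file keeps the original attachment intact, so the result can be checked in Zotero before replacing anything.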
I have put "mostly" because the PDF Reader in Zotero could still have problems not observed in other readers. If you find such problems, please give precise examples. The only problems I could see are also observed in other PDF readers.
So you can use Zotero 7 if you need to fix the problems you see in Zotero 6.