Annotating PDFs that are not OCR'd
In Zotero, highlighting text and adding notes works well when you open publishers' PDFs. If you open a PDF that has not been processed with optical character recognition (OCR), you can still highlight the text, but the note that Zotero creates contains a jumble of letters and numbers, which creates “noise”. You can delete this text via “Edit highlighted text”, but that is tiresome work if you highlight a lot. Would it be possible to add an option to switch off the automatic note function for PDFs that are not OCR'd? Thank you for all your work.
Can you provide a link to a PDF where you're seeing this, or email it to support@zotero.org with a link to this thread? (We'll respond here.)
I don't know if that's a bug or intentional, but you should report it to them. There's no reason for a PDF to have a text layer if it's not going to have text. There are plenty of scanned PDFs that are just images.
I don't think it's really the job of Zotero or any other PDF reader to try to detect this.
We could look into detecting this and just creating an annotation without highlight text, but I'm not sure there's any way to predict the different kinds of gibberish different publishers might put in. Maybe if it's all just a repeated character plus spaces, as may be the case here…
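A minimal sketch of the "repeated character plus spaces" check suggested above, assuming the highlighted text is available as a plain string. The names `looks_like_placeholder` and `highlight_text_for_annotation` are hypothetical and not part of Zotero's code. The two-character minimum is also an assumption, added so that a genuine one-letter highlight such as "I" is kept.

```python
def looks_like_placeholder(text: str) -> bool:
    """Return True if text is one character repeated (2+ times), ignoring whitespace."""
    non_space = [c for c in text if not c.isspace()]
    # Assumption: require at least two non-space characters so a real
    # single-letter highlight is not treated as junk.
    return len(non_space) > 1 and len(set(non_space)) == 1


def highlight_text_for_annotation(text: str) -> str:
    """Hypothetical helper: drop junk text so the annotation is created without highlight text."""
    return "" if looks_like_placeholder(text) else text


print(highlight_text_for_annotation("0 0 0 0 0"))    # prints an empty line
print(highlight_text_for_annotation("real words"))   # prints "real words"
```

As noted in the thread, this only catches the repeated-character case. A mix of letters and numbers like the one described in the original post would still pass through, since there is no general way to predict the kinds of gibberish different publishers put in a text layer.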