Zotfile is importing highlighted text (with formatting errors) instead of the corrected text in annotations
Many PDF files have text formatting errors or bad OCR results, so when I make annotations I correct the errors. But when I try to extract the annotations into Zotero, it does not pick up my corrected annotation; it picks up the messed-up original highlighted text. So I sometimes have to go through hundreds of annotations to correct the formatting again. How can I get Zotfile to bring in only my annotations and ignore the original text?
When I create an annotation and fix the formatting it looks like this: "Simply to take the information out of the context in which it arose and use it generally does not solve the problem...".
Zotfile is not extracting the annotation, it is extracting the highlighted text. How can I make it extract the annotation only?
First, you need to understand how annotations are stored when your PDF program follows the standard format and saves them directly into the metadata of the original PDF file (e.g. Adobe Acrobat; not Skim, which apparently writes to a second file). Such annotations are actually composed of two different pieces of metadata: the "markup" and the "content".
The "markup" part is a copy of the exact text that you either highlighted or underlined. There is no markup part for a Sticky Note, since it is just a free-floating annotation.
The "content" part is a copy of your personal comment that you type. For instance, it is the part that you see on the right side of the screen in the Comments toolbar if you're using Adobe Reader.
So for rgfuller's example above, the markup is the messed up OCR text that he or she highlighted in the PDF and the content is his or her correctly retyped text in the comment sidebar.
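To make the two parts concrete, here is a minimal Python sketch of the distinction. It is only a mental model, not Zotfile's code or the actual PDF data structure; the class name `Annotation` and its fields are made up for illustration, and the garbled markup string is an invented stand-in for bad OCR output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Simplified model of one PDF annotation (hypothetical, for illustration)."""
    kind: str                # "highlight", "underline", or "sticky_note"
    markup: Optional[str]    # text under the highlight/underline; None for sticky notes
    content: Optional[str]   # the comment typed by the reader; None if there is none


# The markup below is an invented example of bad OCR text.
# The content is the corrected text quoted in the question.
example = Annotation(
    kind="highlight",
    markup="Simp1y to take the inforrnation out of the con text ...",
    content=(
        "Simply to take the information out of the context in which it "
        "arose and use it generally does not solve the problem..."
    ),
)

# A sticky note has a comment but no marked-up text.
sticky = Annotation(kind="sticky_note", markup=None, content="Check this claim.")
```

In this picture, the question is asking for `content` to be extracted and `markup` to be ignored.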
For whatever reason, Zotfile currently works like this:
Sticky Notes: only the content part is extracted (which makes sense because sticky notes don't markup text)
Highlighting: both parts are extracted (maybe this was fixed after the comments above?)
Underlining: only the markup part is extracted. Extraction of annotations by Zotfile will not return your content (personal comment)!
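The three cases above can be summarised as a small Python function. This is only a sketch of the behaviour as described in this post, not Zotfile's actual implementation (Zotfile itself is not written in Python); the function name `extract_as_described` and the kind labels are assumptions made for the example.

```python
from typing import List, Optional


def extract_as_described(
    kind: str, markup: Optional[str], content: Optional[str]
) -> List[str]:
    """Return the parts of an annotation that get extracted, per the list above."""
    parts: List[str] = []
    if kind == "sticky_note":
        # Sticky notes have no markup, so only the comment is extracted.
        if content:
            parts.append(content)
    elif kind == "highlight":
        # Highlights: both the marked-up text and the comment are extracted.
        if markup:
            parts.append(markup)
        if content:
            parts.append(content)
    elif kind == "underline":
        # Underlines: only the marked-up text; the comment is dropped.
        if markup:
            parts.append(markup)
    else:
        raise ValueError(f"unknown annotation kind: {kind!r}")
    return parts


# Usage: an underline with a corrected comment loses the comment.
lost = extract_as_described("underline", "garb1ed text", "corrected text")
```

Here `lost` contains only the garbled text, which is exactly the problem with underlined annotations.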
The sort of good news is that if you can handle some text file editing and zip file manipulation, it is not too hard to switch the functionality between extraction of highlights and underlines to retrieve your personal comments from underlined annotations. Please see: https://forums.zotero.org/discussion/comment/348732. It is unfortunately not a fix to get both highlights and underlines to work as expected simultaneously, but it is a workaround to at least extract what you need.
That post also has a way to essentially turn off the extraction of the markup portion as requested by rgfuller above.
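For comparison, the behaviour rgfuller is asking for could be sketched like this in Python. This is not the workaround from the linked post, just an illustration of the intended result; the function `prefer_comment` and its `fallback_to_markup` parameter are hypothetical.

```python
from typing import Optional


def prefer_comment(
    markup: Optional[str],
    content: Optional[str],
    fallback_to_markup: bool = True,
) -> Optional[str]:
    """Return the reader's comment, ignoring the marked-up text when a comment exists."""
    if content and content.strip():
        return content.strip()
    # No comment was typed: either fall back to the original text or return nothing.
    return markup if fallback_to_markup else None
```

With `fallback_to_markup=False` the marked-up text is never returned, which corresponds to turning off extraction of the markup portion entirely. Keeping the fallback is a design choice for annotations where no correction was typed.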
Hope that helps someone!