Duplicate highlight annotations
I'm having a problem where Zotero's built-in annotation extraction is duplicating the text from all my highlights.
Long-time Zotero+Zotfile user here. Extracting text from highlight annotations in a PDF with Zotfile used to only be possible with 3rd party PDF readers that supported adding the selected text into the highlight annotation (e.g. PDF X-Change or Acrobat). I used this feature across hundreds of PDFs.
I am pleased to see that Zotero is beginning to natively support PDF annotations and annotation extraction. Great! I hate Acrobat. However, when I extract the annotations from old PDFs, previously annotated with 3rd party software, in Zotero I get the text from the highlight twice per annotation. This does not happen when annotating with the built-in reader. I suspect it has something to do with how Zotfile previously required 3rd party readers to copy the selected text into the highlight.
I am happy to transition to Zotero's native annotation system and I understand that 3rd party annotation is still available if necessary. However, I don't think I can remove the duplicated text without going through each PDF, turning off the copy text to highlight feature and re-highlighting everything. This is somewhat exacerbated by the removal of Zotfile's old annotation extraction feature, which I could have continued using for older PDFs.
Example:
“As O’Reilly discovered a long time ago, memes are for losers; the real money is in epistemes.” As O’Reilly discovered a long time ago,memes are for losers; the real money is in epistemes.
This bug/feature does not ruin anything but it does clutter my extracted annotation notes quite a lot. Could there be a way to tell Zotero to ignore text that has been automatically copied into highlights when extracting annotations?
Many apologies if this issue has already been mentioned in the forums, but I could not find a reference to it.
To be clear, this is in no way a bug: your external PDF reader is doubling the highlighted text into the field intended for comments, and Zotero is dutifully importing it as it would any user-created comment. You should turn that setting off going forward.
If it does only remove notes, then there is a new problem. I have important notes in those digital editions too, not just highlights and I would rather not batch-delete those along with the duplicates.
I think I used to believe that the "copy selected text into highlight..." feature of Acrobat, PDF X-Change, and Foxit was necessary for Zotfile to extract highlights. Looking into it now, I cannot find evidence online of that being true. Either that or I wanted to preserve the annotation summary feature that some of those programs have. The damage is done now and hopefully I'm in a very small minority here. If Zotero offers an option to remove those duplicates at some point, I would be very happy but I can also live with my mistakes.
For the issue of your own notes, you can just store two copies of the PDF as attachments: one with all your notes and one with all comments deleted. If you wanted to for a particular PDF, you could probably isolate your own notes. In Acrobat Pro I remember you could filter by type of annotation, although I don't think this could be a batch process. Or you might be able to change the store annotations option somehow.
Ultimately the issue is probably one that can be resolved by a PDF editor, rather than through Zotero.
As you correctly point out, notes are different from highlights in PDF documents and not everybody is aware of that distinction when discussing them. The author of that Wondershare article you linked to is either not aware of that distinction herself, or she is deliberately using the slightly more ambiguous term "comment". What she describes in that article is a method that deletes all annotations without distinction. Sure, you could manually select each note you want to delete with the CTRL key, while keeping the ones you like, but that is a manual solution most will be aware of anyway and is impractical for a big collection.
In addition to that issue, the case that I am describing is not one where the highlight and the note attached to it can be selected independently prior to deletion. I wish I could attach a screenshot, but this link to a Docear manual discussing the "copy selected text into highlight" feature will have to do. Just scroll beneath the table to see screenshots of what that looks like and note that there is a highlight with text in a note window linking to it. There is no separate note selectable (the small icon above the highlight is not separate and just indicates that the highlight contains a note). What this means is that if you click the highlight in a PDF editor/reader like Wondershare and press delete (as described in that article you mentioned), it deletes the highlight and the attached note. The only way to delete the note on its own is to select the highlight, access the note text, select it and hit delete. That is obviously too much work for the literally thousands of highlight notes that are duplicated in my library. I have tested this in Wondershare, so nobody else has to. I have also tested it in PDF X-Change, Foxit, Xodo, Drawboard, and Acrobat DC. The results are mostly the same (Acrobat DC does have an annotations filter but it does not work for this), but feel free to verify.
As far as I can tell, no PDF editor is able to automatically delete all the notes from highlights, whilst keeping the highlights intact. I'm still holding out hope that there is a solution on Zotero's end because the extract annotations feature from Zotfile that it is supplanting was able to ignore the duplicate text from the highlight notes.
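The cleanup being asked for here can be sketched in a few lines, assuming the annotations have already been read out of the PDF into simple records. The `Annotation` type, its field names, and `drop_copied_comments` are all hypothetical and are not part of Zotero, Zotfile, or any PDF editor. The idea is to clear the note only where it exactly repeats the highlighted text, leaving the highlight itself and any genuine notes untouched.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record; these field names are illustrative only.
    kind: str     # "highlight" or "note"
    text: str     # highlighted passage ("" for a standalone note)
    comment: str  # note text attached to the annotation


def drop_copied_comments(annotations: List[Annotation]) -> List[Annotation]:
    """Blank the comment of any highlight whose comment merely repeats its text."""
    cleaned = []
    for ann in annotations:
        comment = ann.comment.strip()
        is_copy = (
            ann.kind == "highlight"
            and comment != ""
            and comment == ann.text.strip()
        )
        # Keep the annotation either way; only the copied comment is removed.
        cleaned.append(replace(ann, comment="") if is_copy else ann)
    return cleaned
```

Standalone notes and highlights with a comment that differs from the highlighted text pass through unchanged, which is the property none of the editors tested above seem to offer in bulk.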
The obvious answer would be to just use Zotero's PDF reader then. Except that I cannot edit my older annotations that were made in third party editors with Zotero. Sometimes, I want to edit a note I made or delete an annotation altogether. Sometimes a highlight needs a slight edit. I don't like the idea of juggling my workflow between Zotero's annotations and a third party editor, especially when Zotero's database-layer annotations are not visible in a different program. Yes, I can save those annotations to the PDF file each time (File>Store_Annotations_In_File), but in a sense I am then creating more work for myself by going through that extra step and then committing to having to use a third party editor for those newly created file-side annotations.
I have noted that other users on different threads have also expressed a desire for Zotero's PDF reader to be able to edit annotations in the PDF file itself, rather than just in the database layer. I have read Zotero's literature on the advantages with that approach and I understand why those choices were made. However, if I had the option to select between them, I think I would pick the embedded PDF annotations for the way in which I work. Are there any plans to implement this as an optional feature in the future, or are we stuck with 3rd party software for embedded annotations?
Yes, I know you can select File>Import_Annotations in Zotero's PDF reader, thus theoretically enabling you to edit embedded annotations made externally, but from what I understand, that feature will be removed in the future. Even if not, it means going through that process in combination with the above one each time you annotate a digital edition in your library and if you forget to, you'll eventually lose track of which files have all their annotations embedded and which don't. At least I will. An option to enable embedded annotations in Zotero's PDF reader would be really welcome.
If you'd like to pull in old annotations made with other programs but use the Zotero PDF reader from now on, run File -> Import Annotations... once. If you'd like to use an external reader, change the default in the Preferences and everything will work just like it did in Zotero 5.
That's great news! I'll definitely try out the beta then. Thank you.
Thank you for correcting me on that. You're quite right and I was confused, I apologise. I'm still a little confused because the final section of that article points to the continued importance of data portability. Does that mean there will be a different way in which we can batch store annotations in all our PDF files once File→Store Annotations... has been removed? If so, how?
You can post to this thread if you have further questions on this.
I see how much of a sticky wicket it must be to develop and introduce this new annotation system. You have clearly thought about it a lot and I look forward to seeing where it goes.
Thank you very much for addressing the duplicate highlight notes in the future release! That will hopefully solve the difficulties I've been having with the current version.
The way in which the extractor ignores the duplicates seems to be based on exact duplications, so if the PDF text is a little screwy (as in the internet article printout example below), then the duplication still occurs because the body text and the text in the highlight note are somehow different (I used Acrobat in the example). I think I'm fine with that still happening. A setting that tells the extractor to ignore all notes within highlights would also have issues, after all.
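To illustrate why an exact comparison misses cases like this, here is a minimal sketch of a more tolerant check. It is not how Zotero's extractor works; the function names and the 0.95 threshold are assumptions made for the example. It normalizes both strings (Unicode compatibility forms such as ligatures, soft hyphens, whitespace, case) and then falls back to a similarity ratio from Python's standard `difflib`.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Reduce text to a form where spacing and ligature differences vanish."""
    text = unicodedata.normalize("NFKC", text)  # e.g. the "ffi" ligature becomes "ffi"
    text = text.replace("\u00ad", "")           # drop soft hyphens
    return re.sub(r"\s+", "", text).casefold()  # ignore all whitespace and case


def is_probable_duplicate(highlight: str, comment: str,
                          threshold: float = 0.95) -> bool:
    """Guess whether a highlight's note is just a copy of the highlighted text.

    The threshold is an arbitrary assumption, not a tested value.
    """
    a, b = normalize(highlight), normalize(comment)
    if not a or not b:
        return False
    if a == b:
        return True
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold
```

With the example from the first post, where the copied note is missing a space after "ago,", the two strings are identical after normalization and would be treated as a duplicate. The trade-off is the one noted above: a genuine note that closely paraphrases the highlight could be dropped too, so any such check would need to be optional.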