Duplicate highlight annotations

edited April 15, 2022
I'm having a problem where Zotero's built-in annotation extraction is duplicating the text from all my highlights.

Long-time Zotero+Zotfile user here. Extracting text from highlight annotations in a PDF with Zotfile used to be possible only with third-party PDF readers that could copy the selected text into the highlight annotation (e.g. PDF-XChange, Acrobat). I used this feature across hundreds of PDFs.

I am pleased to see that Zotero is beginning to support PDF annotations and annotation extraction natively. Great! I hate Acrobat. However, when I use Zotero to extract annotations from old PDFs that I previously annotated with third-party software, I get the highlighted text twice per annotation. This does not happen with annotations made in the built-in reader. I suspect it has something to do with how Zotfile previously required third-party readers to copy the selected text into the highlight.

I am happy to transition to Zotero's native annotation system, and I understand that third-party annotation is still available if necessary. However, I don't think I can remove the duplicated text without going through each PDF, turning off the "copy selected text into highlight" feature, and re-highlighting everything. The problem is made worse by the removal of Zotfile's old annotation extraction feature, which I could otherwise have kept using for older PDFs.

Example:
“As O’Reilly discovered a long time ago, memes are for losers; the real money is in epistemes.” As O’Reilly discovered a long time ago,memes are for losers; the real money is in epistemes.

This bug/feature does not ruin anything, but it does clutter my extracted annotation notes quite a lot. Could there be a way to tell Zotero to ignore text that has been automatically copied into highlights when extracting annotations?

Many apologies if this issue has already been mentioned in the forums, but I could not find a reference to it.
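To illustrate what "ignore text that has been automatically copied into highlights" could mean in practice: a highlight's comment can be treated as a duplicate when it matches the highlighted text once whitespace differences are ignored, since the copied comment may drop spaces (note "ago,memes" in the example above). This is only a hypothetical sketch; the `Annotation` type and the function names are illustrative and are not Zotero's code or API.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical, simplified highlight annotation (not Zotero's data model)."""
    text: str     # the highlighted text taken from the page
    comment: str  # the comment/note stored inside the highlight


def _normalize(s: str) -> str:
    # NFKC splits compatibility ligatures such as "ﬁ" into "fi"; dropping all
    # whitespace makes "ago,memes" and "ago, memes" compare equal.
    s = unicodedata.normalize("NFKC", s)
    return "".join(s.split())


def is_duplicate_comment(text: str, comment: str) -> bool:
    """True if a non-empty comment is just the highlighted text copied in."""
    return bool(comment.strip()) and _normalize(text) == _normalize(comment)


def strip_duplicate_comments(annotations: list[Annotation]) -> list[Annotation]:
    """Return copies with duplicated comments blanked; real comments are kept."""
    return [
        Annotation(a.text, "" if is_duplicate_comment(a.text, a.comment) else a.comment)
        for a in annotations
    ]
```

A genuine comment such as "My own note" is left untouched, so the user's own notes survive while the copied text is dropped.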
  • This is planned.

    To be clear, this is in no way a bug — your external PDF reader is doubling the highlighted text into the field intended for comments, and Zotero is dutifully importing it as it would any user-created comment. You should turn that setting off going forward.
  • Thank you for your reply. I'm very glad to hear there is a planned workaround. As you suggested, I have turned that feature off in PDF-XChange, as it was only necessary for Zotfile's now-defunct annotation extraction. Of course, that does not solve the issue with the vast digital collection I already annotated with that feature turned on.
  • Can I suggest you just delete all the comments before using Zotero's import function? I remember Acrobat Pro having batch processing features for this (I no longer subscribe, so I can't check), but it looks like other PDF software will do it too: https://www.qoppa.com/files/pdfstudio/guide/index.htm#t=batch-delete-all-comments.htm
  • Hello joycekwc, thank you for replying too. Does the method you suggest only remove the notes and keep the highlights? The page you linked to suggests it removes all types of annotations in a file. Sadly, that does not really help me. Again, we're talking about hundreds of book-sized PDFs, which I cannot simply re-annotate from scratch.
    If it does only remove notes, then there is a new problem. I have important notes in those digital editions too, not just highlights, and I would rather not batch-delete those along with the duplicates.
    I think I used to believe that the "copy selected text into highlight..." feature of Acrobat, PDF-XChange, and Foxit was necessary for Zotfile to extract highlights. Looking into it now, I cannot find any evidence online that this was true. Either that, or I wanted to preserve the annotation summary feature that some of those programs have. The damage is done now, and hopefully I'm in a very small minority here. If Zotero offers an option to remove those duplicates at some point, I would be very happy, but I can also live with my mistakes.
  • In most PDF editors, comments are different from highlights; Acrobat Pro certainly treats them as different types. There are a ton of PDF editors out there, so if the first one I linked to removes all annotations, there’s bound to be one that will just delete comments. This one, for example, seems to: https://pdf.wondershare.com/how-to/remove-comments-from-pdf.html

    As for your own notes, you could store two copies of the PDF as attachments: one with all your notes and one with all comments deleted. For a particular PDF, you could probably isolate your own notes if you wanted to; in Acrobat Pro, I remember you could filter by annotation type, although I don’t think this could be done as a batch process. Or you might be able to change the store annotations option somehow.

    Ultimately the issue is probably one that can be resolved by a PDF editor, rather than through Zotero.
  • Thank you again for trying to help @joycekwc. To be clear, I don't want to pick away at what you suggest but I feel I need to make this clear for anyone else who is trying to find a solution and discovers this thread: that approach does not work.
    As you correctly point out, notes are different from highlights in PDF documents and not everybody is aware of that distinction when discussing them. The author of that Wondershare article you linked to is either not aware of that distinction herself, or she is deliberately using the slightly more ambiguous term "comment". What she describes in that article is a method that deletes all annotations without distinction. Sure, you could manually select each note you want to delete with the CTRL key, while keeping the ones you like, but that is a manual solution most will be aware of anyway and is impractical for a big collection.
    In addition to that issue, the case that I am describing is not one where the highlight and the note attached to it can be selected independently prior to deletion. I wish I could attach a screenshot, but this link to a Docear manual discussing the "copy selected text into highlight" feature will have to do. Just scroll beneath the table to see screenshots of what that looks like, and note that there is a highlight with text in a note window linking to it. There is no separately selectable note (the small icon above the highlight is not separate and just indicates that the highlight contains a note). This means that if you click the highlight in a PDF editor/reader like Wondershare and press Delete (as described in the article you mentioned), it deletes both the highlight and the attached note. The only way to delete the note on its own is to select the highlight, open the note text, select it and hit Delete. That is obviously too much work for the literally thousands of highlight notes that are duplicated in my library. I have tested this in Wondershare, so nobody else has to. I have also tested it in PDF-XChange, Foxit, Xodo, Drawboard, and Acrobat DC. The results are mostly the same (Acrobat DC does have an annotations filter, but it does not work for this), but feel free to verify.
    As far as I can tell, no PDF editor is able to automatically delete all the notes from highlights, whilst keeping the highlights intact. I'm still holding out hope that there is a solution on Zotero's end because the extract annotations feature from Zotfile that it is supplanting was able to ignore the duplicate text from the highlight notes.
  • Something good did come out of all that testing, though, and I thought I should put it in a separate reply so it does not get lost in the above. I now know why I made sure that the highlighted text was always copied into the highlight as a note. In many PDF readers/editors, including Zotero's, you can see a side panel for the PDF you are reading that shows a summary of all your annotations. I use this all the time for long PDFs I have annotated, to skip ahead to the important stuff. In Zotero, you always see the text from the highlights you have made in that panel, so you can jump to the highlight that you are looking for immediately. In many other editors, however, all you see is the word "highlight" with a page reference and maybe an annotation colour (tested in all the software mentioned above). But if you use the "copy selected text into highlight…" feature from Acrobat, PDF-XChange, and Foxit, then you can see the text for the highlight in that summary. This makes that feature indispensable, even if it duplicates all the text from my highlights when extracting in Zotero (this did not happen with Zotfile).
    The obvious answer would be to just use Zotero's PDF reader, then. Except that I cannot use Zotero to edit my older annotations that were made in third-party editors. Sometimes I want to edit a note I made or delete an annotation altogether. Sometimes a highlight needs a slight edit. I don't like the idea of juggling my workflow between Zotero's annotations and a third-party editor, especially when Zotero's database-layer annotations are not visible in other programs. Yes, I can save those annotations to the PDF file each time (File>Store_Annotations_In_File), but then I am creating more work for myself by going through that extra step and committing to a third-party editor for those newly created file-side annotations.
    I have noted that other users on different threads have also expressed a desire for Zotero's PDF reader to be able to edit annotations in the PDF file itself, rather than just in the database layer. I have read Zotero's literature on the advantages with that approach and I understand why those choices were made. However, if I had the option to select between them, I think I would pick the embedded PDF annotations for the way in which I work. Are there any plans to implement this as an optional feature in the future, or are we stuck with 3rd party software for embedded annotations?
    Yes, I know you can select File>Import_Annotations in Zotero's PDF reader, thus theoretically enabling you to edit embedded annotations made externally, but from what I understand, that feature will be removed in the future. Even if not, it means going through that process in combination with the above one each time you annotate a digital edition in your library and if you forget to, you'll eventually lose track of which files have all their annotations embedded and which don't. At least I will. An option to enable embedded annotations in Zotero's PDF reader would be really welcome.
  • even if it duplicates all the text from my highlights when extracting in Zotero
    This is fixed in the current beta and an upcoming stable release.
    Yes, I know you can select File>Import_Annotations in Zotero's PDF reader, thus theoretically enabling you to edit embedded annotations made externally, but from what I understand, that feature will be removed in the future.
    No, that article doesn't say that File -> Import Annotations... will be removed. The ability to import your existing PDF annotations is a core feature of Zotero's PDF reader. The removed option that the article mentions is File -> Store Annotations in File, which caused a lot of confusion and is rarely necessary.

    If you'd like to pull in old annotations made with other programs but use the Zotero PDF reader from now on, run File -> Import Annotations... once. If you'd like to use an external reader, change the default in the Preferences and everything will work just like it did in Zotero 5.
  • Hello @AbeJellinek,
    That's great news! I'll definitely try out the beta then. Thank you.
    The removed option that the article mentions is File -> Store Annotations in File, which caused a lot of confusion and is rarely necessary.
    Thank you for correcting me on that. You're quite right and I was confused, I apologise. I'm still a little confused because the final section of that article points to the continued importance of data portability. Does that mean there will be a different way in which we can batch store annotations in all our PDF files once File→Store Annotations... has been removed? If so, how?
  • @Spameggs: The page explains all the ways you can export PDFs with embedded annotations. If you're using the Zotero PDF reader, annotations are stored in the database. As the page explains, if that doesn't work for you, you can use an external PDF reader.

    You can post to this thread if you have further questions on this.
  • Ah, OK. So it remains functional as an export feature and you can't actually embed the annotations into the original PDF going forward. I suppose you could theoretically export, delete the original and then replace it with the exported file for each item, if you have the time for that.
    I see how much of a sticky wicket it must be to develop and introduce this new annotation system. You have clearly thought about it a lot and I look forward to seeing where it goes.
    Thank you very much for addressing the duplicate highlight notes in the future release! That will hopefully solve the difficulties I've been having with the current version.
  • Update: The issue is mostly resolved in Zotero 6.0.5-beta.5+ee10d3330, as @AbeJellinek said. Brilliant!! Thank you.

    The way in which the extractor ignores the duplicates seems to be based on exact duplications, so if the PDF text is a little screwy (as in the internet article printout example below), then the duplication still occurs because the body text and the text in the highlight note are somehow different (I used Acrobat in the example). I think I'm fine with that still happening. A setting that tells the extractor to ignore all notes within highlights would also have issues, after all.
    “Nicole had experienced a panic attack aer 10 months of watching her mother disappear into her phone screen, and a world of conspiracy theories. Specifically, QAnon - an expansive movement that has inspired protests, split families and continues to find new followers online.” (“The moment QAnon took the person I love most”, 2021, p. 1) Nicole had experienced a panic attack aft er 10 months of watching her motherdisappear into her phone screen, and a world of conspiracy theories.Specifi cally, QAnon - an expansive movement that has inspired protests, splitfamilies and continues to fi nd new followers online.
  • The way in which the extractor ignores the duplicates seems to be based on exact duplications
    It's not, but there are apparently enough differences in that PDF to prevent those from matching. If you email the PDF to support@zotero.org with a link to this thread, we can take a look.