Zotfile doesn't extract all annotations

Hi all,

I've been using Zotero & Zotfile for a while and, apart from a few weird PDFs that it didn't like, Zotfile has mostly worked really well.

However, it recently started showing a weird bug: it no longer reliably extracts all annotations. With some texts, it extracts all of them just fine, but with many others it leaves out roughly 60-70% of them. It always catches the first bit of highlighted text on a page, but usually not the last paragraph at the bottom of the page, and only sometimes the ones in the middle section of the page.

I waited to report this until a new update came out, but after updating Zotero yesterday, it's still doing the same thing today. I'm using Zotero 5.0.88 on a Mac, and I use Preview to read and highlight the PDFs.

Any advice would be very much appreciated :)
Thanks in advance!
  • ZotFile performs the extraction, not Zotero, so a Zotero update shouldn't make a difference. But ZotFile hasn't been meaningfully updated in a while, so I doubt this actually changed there either. It might just be a fluke with some of your recent files.

    You can also try tweaking the ZotFile settings to use poppler instead of or in addition to pdf.js — I don't know the details of that, but see the ZotFile documentation.
  • I can normally extract all my highlights from PDFs in Adobe Acrobat, but today two files look like they are extracting, and I get a green check mark saying it was successful, yet when I look at the note, none of the highlighted text is there.
  • There are reports of issues with PDF highlights done in Preview, see here. On a Mac, you might get more reliable results with Skim, PDF Expert, or Acrobat. You can set a custom PDF viewer in Zotero's preferences: "Edit" -> "Preferences" -> "General" -> "Open PDFs using".

    If your PDF file is causing problems, this could be due to one of these issues:

    1) There's no extractable text. You could check whether a proper text can be extracted with this online tool.

    2) The PDF version is incompatible. If your file is PDF version 1.6 or higher, it could be that Zotfile has some issues with it. You can check the PDF version in the document properties. In Acrobat, they should be accessible with Cmd+D.

    See also the suggestions in this discussion.
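
The PDF version check in point 2 can also be done without opening a viewer. The following is a minimal sketch, assuming the version is read from the `%PDF-x.y` header at the start of the file. The function names `pdf_header_version` and `may_be_problematic` are made up for illustration, and the 1.6 threshold is just the figure mentioned above, not a confirmed ZotFile limit. Note also that a PDF can declare a different version inside its document catalog, so the header is only a first indication.

```python
import re
import sys
from typing import Optional, Tuple


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from a PDF header such as b'%PDF-1.6', or None.

    Hypothetical helper: it only inspects the first 1024 bytes, where the
    header is expected, and ignores any version override stored elsewhere.
    """
    match = re.search(rb"%PDF-(\d+)\.(\d+)", data[:1024])
    if match is None:
        return None
    return f"{int(match.group(1))}.{int(match.group(2))}"


def may_be_problematic(version: str, threshold: Tuple[int, int] = (1, 6)) -> bool:
    """True if the version is at or above the threshold (1.6 is an assumption)."""
    major, minor = (int(part) for part in version.split("."))
    return (major, minor) >= threshold


if __name__ == "__main__":
    # Usage: python check_pdf_version.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            version = pdf_header_version(handle.read(1024))
        if version is None:
            print(f"{path}: no PDF header found")
        elif may_be_problematic(version):
            print(f"{path}: PDF {version} (1.6 or higher, may be worth testing)")
        else:
            print(f"{path}: PDF {version}")
```

Running it over a folder of files that do and do not extract cleanly would show whether the problem actually correlates with the PDF version.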
  • There are reports of issues with PDF highlights done in Preview
    @qqbb: Let's not overstate this. The vast majority of Mac users likely use Preview without problems, and we have no reason to think there's a general problem with ZotFile extracting highlights made in Preview. (The thread you linked to isn't about this same issue, nor is the other one you posted this to.)

    If someone wants to share an example PDF where highlights aren't all extracted to support@zotero.org with a link to this thread, we can take a look. ZotFile is using an ancient version of pdf.js, so I'd guess that updating pdf.js in ZotFile would help.
  • Thanks for all your responses. Yes, exactly @dstillman, Preview has worked perfectly fine 95% of the time in the past. This is a new issue, at the moment happening for, I would guess, about 30-50% of the PDFs I've read in the last few weeks.

    I'll send a PDF I've had lots of trouble with today via email. Thanks for offering to take a look! A functioning ZotFile would save a looot of time, so that would be wonderful.
  • I agree, there shouldn't be a general problem with Preview when using Zotfile's annotation extraction. There could be occasional issues, judging from reports on the forum (here) and on the web (here and here), but I can't tell whether any of this might (still) be relevant. (I'm not a Mac user.)

    When troubleshooting issues with poor highlight extraction by Zotfile, it could help to exclude potential issues related to the annotation tool. With Acrobat, PDF Expert (Mac), or PDF-XChange Editor (Windows), one can be fairly confident that the annotation tool is not causing issues.

    BTW, Skim doesn't seem ideal for use with Zotfile, see this discussion.