Unable to use OCR function.

I'm a new Zotero user, interested in being better able to read and mark up PDF documents. I have tried hard to set up the OCR function in the Zotero 7.0.13 version I've downloaded, but without success. Most of the help items I've found seem to be for older releases, which makes them difficult to follow. Also, multiple steps are needed, and I'm not finding ways to verify each step.
I'd love to get some coaching help. Or pointers to resources I should be finding.
Tom Goldsmith TTGsmith@TGandA.com
  • There is not much to do, and everything is explained on the plugin's homepage: https://github.com/UB-Mannheim/zotero-ocr
  • Thanks for this suggestion.
    The instructions (at https://github.com/UB-Mannheim/zotero-ocr) look like they cover the version 7 I'm working with. But I'm still in the fog.
    I'm not sure what a .xpi is or does, but the file I have (and point to) for pdftoppm is apparently still zipped. (I have a faint recollection of downloading something that didn't unzip.) I also see other instructions that look like unfamiliar territory to me. This suggests I could be spending a LOT of time guessing and learning what to do.
    .
    I would be very pleased to find a source that could walk me through this process. And even answer some other questions I have about using Zotero.
    ....Do you have ideas of where I might look?
    Tom Goldsmith
  • I'm a member of the zotero-ocr team, I'll be happy to help.

    Can you follow the instructions as exactly as possible, in sequence, and report the exact point where things become unclear for you?
  • I'm not aware of anything more detailed for Zotero OCR than the instructions they provide.
    If you have specific questions, just asking here is fine.

    A .xpi file is the format for all Zotero add-ons. You just download it and then install it into Zotero from the Tools menu. That's for the Zotero OCR add-on. While .xpi's are technically a form of .zip files, you should never unzip them as part of normal installation and usage.

    pdftoppm is _not_ an xpi format. If you're on Windows, it actually does come as a .zip and you do need to install it as described on the separate page:
    https://github.com/UB-Mannheim/zotero-ocr/wiki/Install-pdftoppm
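    One way to verify this step is to check what the path entered in the plugin settings actually points to. The following is a minimal Python sketch, not part of Zotero or Zotero OCR; the function name `check_tool_path` and the example path are hypothetical. It only reports whether a path is missing, a folder, a still-zipped archive, or an ordinary file.

```python
import os
import zipfile


def check_tool_path(path):
    """Return a short diagnosis for a path entered in the plugin settings."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "folder"
    # A download that was never extracted is still a zip archive.
    if zipfile.is_zipfile(path):
        return "still zipped"
    return "looks like a file"


if __name__ == "__main__":
    # Hypothetical path; replace it with the one you entered for pdftoppm.
    print(check_tool_path(r"C:\path\to\pdftoppm.exe"))
```

    A result of "still zipped" would mean the download still has to be extracted before the plugin can use it.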
  • The Zotero OCR documentation is intended to be sufficient. If it isn't (which is always possible), feedback is of course welcome.
  • Poettli, Aborel, AdamSmith;
    Thanks for these suggestions, but I'm still in trouble.
    Nina, my long-time friend and supporter, has also been working to help me, and we've run into a different bug. She has installed Zotero 7, but when she tries to add Zotero OCR using the .xpi, she gets the following message:
    “Adobe Acrobat Reader could not open Zotero OCR 0.8.1.xpi.
    It is either not a supported file type or because the file has been damaged.”
    We suspect this message has something to do with her "free" Adobe Acrobat. But I also note that I too have that free Adobe Acrobat, and have NOT seen such a message.
    .
    I hope you can help us with Nina's issue. Then too, I can later tackle the problems I'm still having with my installation.
    Tom Goldsmith Mon 17Mar2025
  • She's trying to open the .xpi file with Acrobat.
    Again, you shouldn't open that file with any software. You should download it and then select it from the Tools --> Plugins menu in Zotero. (That's how all Zotero plugins work; this isn't specific to Zotero OCR.)
  • Poettli, Aborel, AdamSmith;
    I see what you mean about Acrobat trying to open the .xpi !!
    ….Thank you for being polite about that. : )
    .
    I’ve gotten further with the installation, and noted down a number of points I’ve gotten caught on. So would be glad to go through them with you. But I’m not sure my installation has been successful.
    When I’ve tried to use the OCR function the Help>>Debug Output Logging>>View Output shows a lot of activity (which I’m not prepared to interpret). And since I don’t find time-stamps in it, I wonder what might be current.
    ….Can you give me ideas of what may be happening?
    .
    Again, I’m still looking for instructions as I get started, though I suspect some sort of coaching might be more efficient for me.
    Tom Goldsmith Tue 18Mar2025
  • Debug output is meant for developers only. Normal users don't need to pay attention to what is written there. You should see a .ocr file under the parent item if the process was successful. Is that the case?
    https://s3.amazonaws.com/zotero.org/images/forums/u2119014/rwbzjioaip0u8v85esph.png
  • When I look in the sub-collection where my .pdf shows, I don't see a .ocr file. Is this where I should be looking?
    When I was exploring the Debug output, I thought I saw indications of text recognition, but when I looked again for that log, it wasn't there.
  • I see our documentation might be missing a few sentences for users before it jumps into development information... thanks for the report, such feedback is always useful.

    In order to actually perform the recognition using the plugin, you need to right-click on a PDF attachment in Zotero and select "OCR selected PDF(s)" in the contextual menu. If the settings haven't been changed after the installation, the .ocr copy (essentially the same PDF as the original, with an underlying text layer that you can search, annotate, etc.) should appear in Zotero after a while (one second to a couple of minutes depending on the length of the document).
    The plugin doesn't show its progress while running (except by adding some page-1, page-2... attachments depending on the settings, as in poettli's screenshot). We are aware that a progress display would be helpful for many users, and we are considering ways to improve that.

    I hope this helps. Don't hesitate to ask again if you need further guidance. If so, it would be great if you could explain exactly what you have done, step by step.
  • Poettli, Aborel, AdamSmith;
    I know documentation is an “evolving” product, and a deep-consuming effort, where there’s no good substitute for a step-by-step “re-test” with a new user. So I’m glad if I can help in that effort.
    .
    I don’t believe I’ve changed the OCR settings as I’ve installed.
    But so far, I haven’t seen the “note on regression” screen, with the “… page-1, page-2, ...” list poettli cited yesterday. (Nor do I grasp how “two parents” might fit in.)
    But I did find, in the Zotero far-right panel, a “3 attachments” piece.
    With my install, per this clip:
    Oops, I apparently can't show a "clipped" picture in this forum.
    Can I e-mail the screen clip to you?
    .
    The top part (of my clip) shows the second page of the .pdf I’m working with. Below it there are three lines that connect to PDFs. The first connects to the raw .pdf version, and the second and third connect to the OCR’d versions I’ve been seeking. (Two lines, because I must have tried twice to run the OCR process, not having realized Zotero was already busy at work.)
    But is this where I should be looking? I had expected to find, in the larger Zotero center panel, either my original .pdf with the OCR detail now present, or a second line with the same file name plus an OCR-identifying suffix.
    Perhaps you can clarify this for me, or let me know where I can find version 7 documentation with details I’ve missed?
    .
    The good news (!!!) is that I now have gotten Zotero OCR to function for me, and thus have an OCR-detailed .pdf to work with. So thank you VERY much for your helping me get this far.
    Tom Goldsmith Wed 19Mar2025
  • You can share your screenshot on a platform such as https://postimages.org/ and post the link here.
  • Aborel;
    Here's the PostImage link, to my screen shot.
    https://postimg.cc/pmmgFzxV
    Tom Goldsmith
  • Can you post a screenshot that shows the item and its attachments in the central pane? I'd like to see the full name of the attachments, here the ending is unfortunately not visible.
  • Aborel;
    Here's the PostImage link, for the center screen shot, with the full file name.
    https://postimg.cc/pmTB6kxN
    Tom Goldsmith
  • We're getting closer... please click on the arrow on the right-hand side of the item so that we can actually see the attachments.
  • Aborel;
    Clicking the arrow at the LEFT of the item name in the center panel gave me expanded information. : )
    See: https://postimg.cc/1gLrCYpW
    ....I don't find anything about "two parents" or pages.
    So I remain curious on that aspect.
    Tom Goldsmith
  • Great! Just a quick note because I don't have much time right now: the plugin has worked!
    You have the .ocr copy of the original PDF (two copies, so you can delete one), as well as a note that also contains the recognized text (also two identical copies).