OCRing a PDF after annotating in Zotero
I'm working with a lot of large PDFs made up of scanned handwritten documents. I am going to import these into Transkribus, run the OCR, then re-export the PDFs and import them into Zotero. However, I had already added some of these PDFs to my Zotero library and annotated them before OCRing. Would I be able to take the new, OCRed PDFs (which are identical to the old files except for the added text layer underneath), simply replace the old PDF files (I have stored them in Zotero as links), and have the annotations preserved? I guess the actual question is: are the annotations in the Zotero database keyed to any particular version of a PDF file?
(If I'm unable to do this, I will have to resort to the other alternative, which I've already done successfully with other documents but is very time-consuming: export the PDF from Zotero with PDF comments, extract the comments via Foxit PhantomPDF or similar, reimport the extracted comments (which are saved as a text file) into the new PDF also with PhantomPDF, and then reimport *that* new PDF back into Zotero and extract the comments as annotations.)
I have a similar problem, and it usually goes like this:
1. In big scanned PDFs I sometimes put in hours of work and tons of annotations/highlights.
2. I reach a part of the PDF where the text is unreadable or unusable, so I cannot annotate/highlight or use the search (ctrl+F/cmd+F). Words are not displayed correctly, etc.
3. I run OCR on the PDF.
4. Now I have two PDFs: the one I put hours of work into, and the new OCRed one, where the part I had already worked through has no annotations/highlights at all, since it is a new file.
5. Now the questions start popping up:
5.1 Is there any way I can merge them together?
5.2 Can I get the annotations/highlights from the old one into the new one? (e.g. by copying the exact pages of the old PDF into the new one)
5.3 Or is there no way? Because that would be really annoying: I would have two copies of the text with different annotations/highlights.
This happens more often than one would think unfortunately.
PS to point 3: Sometimes the problem is not limited to scanned texts. I have a born-digital text right now in which every letter of a bold word comes out three times when I highlight it. So, for example, when I highlight the bold word *Zotero*, the annotation/highlight text shows "ZZZooottteeerrrooo". Really annoying. It affects not only the highlights; it also makes ctrl+F/cmd+F completely unusable.
I added an example:
https://s3.amazonaws.com/zotero.org/images/forums/u16660026/5lan2omfjywzqo1x6ccq.png
Zotero stores annotations in its database, not in the file, so they are not tied to a particular version of the PDF. As long as all elements in the file stay in the same place (same pages, same position on each page), you can just swap in a new file.
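One way to sanity-check that nothing has moved before swapping the file is to compare the page geometry of the old and new PDFs and confirm that each annotation still lands on its page. The following is only a rough sketch under stated assumptions: it assumes each annotation's position is available as JSON with a `pageIndex` and a list of `rects` (`[x1, y1, x2, y2]` in PDF points), and that you have already obtained the page sizes of both files with a PDF tool of your choice. The function names and input shapes are hypothetical, not part of Zotero.

```python
import json


def pages_match(old_sizes, new_sizes, tol=0.5):
    """Return True if both files have the same page count and page sizes.

    Sizes are (width, height) tuples in PDF points, one per page.
    """
    if len(old_sizes) != len(new_sizes):
        return False
    return all(
        abs(ow - nw) <= tol and abs(oh - nh) <= tol
        for (ow, oh), (nw, nh) in zip(old_sizes, new_sizes)
    )


def misplaced_annotations(position_jsons, new_sizes):
    """Return indexes of annotations that no longer fit on their page.

    Assumption: each position is a JSON string shaped like
    {"pageIndex": 0, "rects": [[x1, y1, x2, y2], ...]}.
    """
    bad = []
    for i, raw in enumerate(position_jsons):
        pos = json.loads(raw)
        page = pos["pageIndex"]
        if page >= len(new_sizes):
            # The new file has fewer pages than the annotation expects.
            bad.append(i)
            continue
        width, height = new_sizes[page]
        for x1, y1, x2, y2 in pos["rects"]:
            outside = (
                min(x1, x2) < 0 or min(y1, y2) < 0
                or max(x1, x2) > width or max(y1, y2) > height
            )
            if outside:
                bad.append(i)
                break
    return bad
```

Passing this check is necessary but not sufficient: if the OCR tool rescales or shifts the page image while keeping the page size, the annotations will still fit on the page but may no longer sit over the right words. It is worth opening a few annotated pages in the new file to confirm visually.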