Add Note From Annotations Page Numbers Incorrect
In Zotero 6.0.4 on the Mac, when I highlight a PDF and use "Add Note From Annotations" the page numbers of the annotation appear to start from 1 instead of from the actual page numbers used in the article. Given that many articles' page numbers don't start with 1 but instead are based on where they are located within the particular journal issue, these page numbers are misleading/wrong. The page numbers should be based on what they would be in a correct citation.
But from this article, the extracted annotations had the wrong page numbers (started with page 1 instead of with page 1063): Stubbs, W. (2019). Strategies, practices, and tensions in managing business model innovation for sustainability: The case of an Australian BCorp. Corporate Social Responsibility and Environmental Management, 26(5), 1063–1072. https://doi.org/10.1002/csr.1786
Both articles were annotated using the built-in Zotero PDF reader.
I can't access the second PDF at the moment, but if it has visible page numbers we can see if we're able to better detect them.
"Add Note from Annotations" doesn't use the Pages field at all, if that's what you're referring to. We could consider using that, though figuring out how the interaction between that and automatic page number detection or a manual page number change might be a little tricky.
I agree with @kevintaylor: "The page numbers should be based on what they would be in a correct citation." It can be done.
Additionally, I would like to know how to unlock annotations in the side bar of the pdf after having exported the annotations.
If you have a PDF where the page numbers weren't detected correctly, email it to support@zotero.org with a link to this thread.
I tested annotations in five (5) pdfs, which included the pdf I sent you. All annotations were made in Acrobat Pro DC (22.1.20085.0); MacOS 12.3.1; Zotero 6.0.4. Annotations were extracted using the Zotero “Add Note from Annotations,” and also the Zotfile (5.0.16) “Manage Attachments > Extract Annotations.”
For the most part, correct page numbers or the undesirable logical page numbers were displayed depending on whether or not the article's correct page range is included in the Pages field.
TEST RESULTS
For annotations in three (3) of the five pdfs, the results were:
1. Include article's page range in Pages field:
   - select "Add Note from Annotations," got article's correct page numbers
   - select "Manage Attachments > Extract Annotations," got article's correct page numbers
2. Pages field left blank:
   - select "Add Note from Annotations," got article's correct page numbers
   - select "Manage Attachments > Extract Annotations," got logical page numbers
This suggests it might be best to select "Add Note from Annotations" to get the article's correct page numbers, whether the Pages field is filled or left blank. However, for two (2) of the five pdfs, the same procedure produced different results suggesting "Add Note from Annotations" is not the solution, and neither is "Extract Annotations":
1. Include article's page range in Pages field:
   - select "Add Note from Annotations," got different results for each pdf:
     - got logical page numbers
     - got a mix of correct and logical page numbers (the pdf I sent you)
   - select "Manage Attachments > Extract Annotations," got same results for each:
     - got correct page numbers
2. Pages field left blank:
   - select "Add Note from Annotations," got different results for each pdf:
     - got logical page numbers
     - got a mix of correct and logical numbers (the pdf I sent you)
   - select "Manage Attachments > Extract Annotations," got the same results for each pdf:
     - got logical page numbers
The article’s correct page numbers should always be cited (shown) in the annotation links, not the logical page numbers.
I'll get the pdfs I tested and send them to support. Do you agree that the way to reproduce the problems is to do what I did? For each pdf: 1. include correct page numbers in Pages field; "Add Note from Annotations," see results; "Extract Annotations," see results; then 2. leave Pages field blank; "Add Note" and "Extract," see results.
If you were using the Zotero beta previously, old annotations previously detected by Zotero aren't relevant, since Zotero's page detection has improved. I don't believe it has changed since the Zotero 6.0 release.
The Pages field isn't relevant — Zotero's PDF page detection doesn't use that.
Extract Annotations also isn't relevant — we're just debugging Zotero's PDF page detection here, and Extract Annotations was removed in ZotFile. (I assume you're running a modified version of ZotFile, or testing with a Zotero 5 install, since 5.0.16 wasn't marked as compatible with Zotero 6.) ZotFile (only?) used the Pages field, so its results aren't particularly interesting. As I say above, we could consider incorporating the Pages field into Zotero's page detection as a fallback, though we'd have to decide how to resolve conflicts.
The Pages field is having an effect on the output of the annotation page numbers.
Thanks for helping,
ZotFile was using that field, and possibly not doing anything else, which is why your ZotFile examples just show an offset from Pages when it was set and numbers starting from 1 when it was blank. And that's why ZotFile's results really aren't interesting. Zotero tries to detect the page number even when Pages isn't set, so you get correct page numbers regardless of the metadata.
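To make the "offset from Pages" behavior concrete, here is a minimal sketch of that kind of calculation. It is not ZotFile's actual code; the function name `offset_page_label` and the 0-based `pdf_page_index` parameter are made up for illustration, and it simply assumes the first number in the Pages field is the label of the first PDF page.

```python
import re
from typing import Optional


def offset_page_label(pdf_page_index: int, pages_field: Optional[str]) -> int:
    """Hypothetical helper: label a PDF page purely from the Pages field.

    pdf_page_index is 0-based (0 = first page of the PDF file).
    If the Pages field is blank or has no leading number, numbering
    starts from 1, i.e. the logical page number is returned.
    """
    match = re.match(r"\s*(\d+)", pages_field or "")
    start = int(match.group(1)) if match else 1
    return start + pdf_page_index


print(offset_page_label(0, "1063–1072"))  # first page of the Stubbs article
print(offset_page_label(2, ""))           # blank Pages field, logical numbering
```

Nothing in this approach looks at the PDF content, which is why its output depends entirely on the metadata.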
We'll look at the PDFs you sent — thanks.
The 1991 one is correct for me, as it was for Martynas. I'm not sure what the history of that item is in your library, but I think you'll find that if you just drag a copy of the PDF to Zotero again as a new attachment, the pages will be detected properly. There's no reason you should get a different result.
So Zotero just isn't properly detecting the page numbers in the 1990 PDF. We'll try to fix that.
Again, none of this has anything to do with the Pages field, which Zotero doesn't use.
Also, thank you for working on a fix for the problem presented with the 1990 pdf. It feels like this issue with instances of annotation links showing the pdf's logical pages instead of its correct pages is on the way to resolution.
Some PDFs have the correct page number, but most of them start at 1.
I have edited the page number manually in the sidebar, but that doesn't seem to help with the PDFs that don't work.
Normally, if you change an annotation's page number, then new annotations get the relative page number.
Both the recognition and the page number should be used. Conflicts should be resolved as follows: If the recognition does not assign "1" to the first PDF page it should be used; if recognition assigns "1" to the first PDF page and so does the page number, recognition should be used; BUT if recognition assigns "1" to the first PDF page but the page number *does not* assign "1", then the page number should be used.
This would ensure that PDFs without accurate page numbers default to recognition, but PDFs with accurate page numbers do not. The exception will be PDFs that do start at "1" but which recognition for some reason gets wrong, which is a reasonable trade-off for correctly handling every case where recognition is wrong but the page numbers are right.
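The rule I'm proposing can be sketched in a few lines. This is only an illustration, not anything in Zotero: the function name `choose_first_page_label` and its parameters are hypothetical, "the page number" is taken to mean the first page from the item's Pages field, and the blank-Pages case (falling back to recognition) is my own assumption since the rule above doesn't cover it.

```python
from typing import Optional


def choose_first_page_label(detected_first: str,
                            pages_field_first: Optional[str]) -> str:
    """Hypothetical sketch: pick the label for the first PDF page.

    detected_first    -- label that recognition assigned to the first PDF page
    pages_field_first -- first page from the Pages field, or None if blank
    """
    # Recognition found something other than "1": trust recognition.
    if detected_first != "1":
        return detected_first
    # Recognition says "1" and the Pages field agrees (or is blank,
    # which is an assumed fallback): keep recognition.
    if pages_field_first is None or pages_field_first == "1":
        return detected_first
    # Recognition says "1" but the Pages field starts elsewhere:
    # use the Pages field.
    return pages_field_first


print(choose_first_page_label("1", "1063"))  # Pages field wins
print(choose_first_page_label("5", "1063"))  # recognition wins
```

Later pages would then be numbered relative to whichever first-page label is chosen.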
This is the main functionality of zotero for me, so I would very much love an update that fixes the issue. Unreliable page numbers make the annotation extraction difficult to use for its developed purpose.