Zotfile: Selectively extract types of annotations? pdf.js?
I would like to have Zotfile extract only notes I have taken, but not highlights. I don't see any preference for what kinds of annotations get extracted, but might it be possible to modify the pdf.js to make it blind to the types of stuff I don't want extracted?
For my purposes, making this a permanent change would not be a problem. Except that I don't even know where the pdf.js is located, so I can't even begin to look through it and see if there is anything obvious that I (a non-programmer) can monkey with.
Any suggestions appreciated.
That would be in extract.js, which is significantly less complex:
https://github.com/jlegewie/zotfile/blob/master/chrome/content/zotfile/pdfextract/extract.js
But you'd still need to modify it and then rebuild ZotFile (there are instructions on how to build ZotFile from source on that repository). This isn't super challenging as coding goes, but if you've never done anything like it, I'd imagine it's rather daunting.
Here is what I did:
1. Download the zip for Zotfile from GitHub
https://github.com/jlegewie/zotfile
2. Decompress and navigate to ...../zotfile-master/chrome/content/zotfile/pdfextract/pdfjs/src/
3. Open getPDFAnnotations.js
4. Go to line 23, which says: var SUPPORTED_ANNOTS = ['Text','Highlight','Underline'],
5. I deleted the 'Highlight' entry, because that is what will work best for me. (This way my text notes will be extracted, and any actual passages from the PDF that I want in the annotations, I can have pulled out by using underline instead of highlight.)
6. Build the xpi file using the instructions on the GitHub page. Since I am on Linux, I used the make file at the top of the Zotfile directory.
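To illustrate what the edit in step 5 does, here is a minimal sketch of filtering annotations by subtype against that list. The annotation shape (`{ subtype, contents }`) and the function name `filterAnnotations` are hypothetical, chosen for illustration; they are not ZotFile's actual internals.

```javascript
// Step 5's edited list: 'Highlight' removed, so only sticky notes ('Text')
// and underlined passages ('Underline') survive extraction.
const SUPPORTED_ANNOTS = ['Text', 'Underline'];

// Hypothetical helper; assumes each annotation is a plain object like
// { subtype: 'Highlight' | 'Text' | 'Underline' | ..., contents: string }.
function filterAnnotations(annotations, supported) {
  // Keep only annotations whose subtype appears in the supported list.
  return annotations.filter((a) => supported.includes(a.subtype));
}

const sample = [
  { subtype: 'Text', contents: 'My own note' },
  { subtype: 'Highlight', contents: '' },
  { subtype: 'Underline', contents: 'A passage I want quoted' },
];

const kept = filterAnnotations(sample, SUPPORTED_ANNOTS);
console.log(kept.map((a) => a.subtype)); // [ 'Text', 'Underline' ]
```

Because the list is the only thing that changes, putting 'Highlight' back restores the stock behaviour.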
I guess the next step would be to try and figure out how to make an option in the Zotfile preferences to control what is included in the variable SUPPORTED_ANNOTS.
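One way such an option might work is to store the subtypes as a comma-separated preference string and turn it into the list at extraction time. This is only a sketch: the string format, the default, and `parseSupportedAnnots` are assumptions, and reading the value from Zotero's preference system is not shown.

```javascript
// Hypothetical: the stock list, used when the user has not set the option.
const DEFAULT_ANNOTS = ['Text', 'Highlight', 'Underline'];

// Turn a preference value such as "Text, Underline" into an array.
// The comma-separated format is an assumption, not an existing ZotFile setting.
function parseSupportedAnnots(prefValue) {
  // Fall back to the stock list when the preference is unset or blank.
  if (typeof prefValue !== 'string' || prefValue.trim() === '') {
    return DEFAULT_ANNOTS.slice();
  }
  return prefValue
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0); // ignore empty entries like "Text,,"
}

console.log(parseSupportedAnnots('Text, Underline')); // [ 'Text', 'Underline' ]
console.log(parseSupportedAnnots(''));
```

The resulting array could then stand in for the hard-coded SUPPORTED_ANNOTS value in getPDFAnnotations.js.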
Thanks for the help!
So far it seems that commented highlights (annotations/notes attached to the highlighted text) are separated from their comments when the file is synced back to the Zotero library.
I wonder if one could put those together again. Is the extraction process involved in any of this? If so, is there documentation on how to use the manual (PDF Reference Manual 1.7) to modify getPDFAnnotations for that purpose and then build the xpi file, or is the manual only useful for extracting the annotations?
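If the comments arrive as separate sticky-note ('Text') annotations, one possible way to re-pair them is to attach each note to a highlight it overlaps on the same page. This is a sketch under assumptions: the annotation shape `{ subtype, page, rect, contents }` and `attachNotesToHighlights` are hypothetical, and rectangles are assumed normalized ([x1, y1, x2, y2] with x1 <= x2, y1 <= y2). The PDF 1.7 Reference also defines an IRT ("in reply to") entry on markup annotations, which would be a more reliable link when a file provides it; this sketch ignores it.

```javascript
// True when two normalized [x1, y1, x2, y2] rectangles overlap (edges touching counts).
function rectsOverlap(a, b) {
  return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
}

// Hypothetical helper: attach each 'Text' note to the first 'Highlight'
// on the same page whose rectangle overlaps the note's rectangle.
function attachNotesToHighlights(annotations) {
  const highlights = annotations
    .filter((a) => a.subtype === 'Highlight')
    .map((h) => ({ ...h, notes: [] }));
  const orphans = []; // notes that sit on no highlight
  for (const note of annotations.filter((a) => a.subtype === 'Text')) {
    const target = highlights.find(
      (h) => h.page === note.page && rectsOverlap(h.rect, note.rect)
    );
    if (target) target.notes.push(note.contents);
    else orphans.push(note);
  }
  return { highlights, orphans };
}

const sample = [
  { subtype: 'Highlight', page: 1, rect: [100, 700, 300, 712], contents: '' },
  { subtype: 'Text', page: 1, rect: [290, 705, 310, 725], contents: 'Why?' },
  { subtype: 'Text', page: 2, rect: [50, 50, 70, 70], contents: 'Stray note' },
];
const result = attachNotesToHighlights(sample);
console.log(result.highlights[0].notes); // [ 'Why?' ]
console.log(result.orphans.length);      // 1
```

Overlap is a heuristic: a note placed beside (rather than on) its highlight would end up in `orphans`.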
So far the manual suggests that the original markup annotations (created by LiquidText) either confused the extractor (markup transformed into text annotations) or are not supported. Any ideas on how to keep these LiquidText markup annotations intact when moving to PC PDF readers?
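To check whether the LiquidText markup really arrives as text annotations, one could tally the annotation subtypes in a file and flag which belong to the text markup family that the PDF 1.7 Reference defines (Highlight, Underline, Squiggly, StrikeOut). A minimal sketch, assuming the annotations have already been read (for example with pdf.js, not shown) into plain objects with a `subtype` field; `tallySubtypes` is a hypothetical name.

```javascript
// Text markup annotation subtypes as listed in the PDF 1.7 Reference.
const TEXT_MARKUP_SUBTYPES = new Set(['Highlight', 'Underline', 'Squiggly', 'StrikeOut']);

// Hypothetical helper: count each subtype and mark whether it is text markup.
function tallySubtypes(annotations) {
  const counts = {};
  for (const a of annotations) {
    counts[a.subtype] = (counts[a.subtype] || 0) + 1;
  }
  return Object.entries(counts).map(([subtype, count]) => ({
    subtype,
    count,
    isTextMarkup: TEXT_MARKUP_SUBTYPES.has(subtype),
  }));
}

const report = tallySubtypes([
  { subtype: 'Text' },
  { subtype: 'Text' },
  { subtype: 'Highlight' },
]);
console.log(report);
```

If a file exported from LiquidText shows only 'Text' entries where highlights were expected, the markup was converted before ZotFile saw it, and no change to SUPPORTED_ANNOTS will bring it back.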