[Z7 Beta] Remove ¬ in annotations
Hi all,
some OCRed texts use the ¬ symbol to mark words that are hyphenated across line breaks. So a text looking like this:
This is sub-
ordination
becomes "This is sub¬ ordination" in a Zotero annotation or note. It would be great if Zotero could automatically fix this by removing the ¬ and the following space, so that the annotation text would read "This is subordination".
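To make the requested clean-up concrete, here is a minimal sketch in Python of the transformation described above. It is not Zotero's code, and the function name `strip_ocr_break_marker` is made up for illustration. It assumes the ¬ never appears as a legitimate character in the annotated text.

```python
import re


def strip_ocr_break_marker(text: str) -> str:
    """Remove each ¬ together with any whitespace that directly follows it.

    Assumption: every ¬ in the text is an OCR line-break marker.
    """
    return re.sub(r"¬\s*", "", text)


print(strip_ocr_break_marker("This is sub¬ ordination"))  # This is subordination
```

A text that uses ¬ on purpose (for example as the logical "not" sign) would be damaged by this, so it would probably need to be an option rather than always on.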
This means that, for example, all scanned PDFs on pedocs.de, a central database for educational science literature, contain the special character. Re-OCRing them all would be a bit of a hassle.
Often a database such as PubMed, or the metadata from a publisher's website, does not contain an abstract, but there is an article summary on the publisher's webpage. I copy this text (rendered as HTML or from a PDF) and paste it into my text editor.
I have my text editor set to show selected non-printing characters such as the ¬ to indicate a line break. Sometimes, although there isn't the ¬ character on the screen image of the PDF, when the text is pasted into my text editor there are two ¬ characters (one printing and one non-printing) at the end of each line. There aren't double-spaced lines on the PDF document.
If I copy the PDF text from the screen (of the publisher's site) and paste it directly into Zotero, there are no ¬ characters, but there are unwanted line breaks wherever a line ends in the PDF.
Over the years, I 'trained' my text editor to remove line breaks and the occasional extraneous printing ¬ characters. I've mostly 'taught' it to handle words hyphenated across line breaks. (But I've not been able to automate keeping wanted hyphens, like the ones in the next sentence, when the phrase crosses a line.) It is labor-intensive and isn't perfect, but one-by-one / document-by-document I can get a suitable abstract into Zotero. If Zotero or a plug-in could (mostly) automate what I have been doing by hand, it would be great. I long ago accepted that I need to copy to a text editor and do some manipulation if I want a pretty abstract. If regex masters within the Zotero community can help, I would be very happy. I can handle the fundamentals of regular expressions, but getting conditional (if-then) logic to work with any accuracy is bewildering.
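For what it's worth, the clean-up steps described above can be roughly sketched in Python with a handful of regular expressions. This is only a heuristic sketch, not something Zotero or any plug-in provides. The function name `unwrap_pasted_text` and the `keep_hyphenated` parameter are invented for illustration. The assumptions are that a ¬ at the end of a line is always debris, that a blank line marks a paragraph break, and that wanted hyphens can only be preserved if you list those compounds yourself, since a regex alone cannot tell "sub-ordination" from "labor-intensive".

```python
import re

# One Unicode letter (any word character that is not a digit or underscore).
LETTER = r"[^\W\d_]"

# A word (possibly already hyphenated) ending in "-" at a line break,
# followed by the word that starts the next line.
HYPHEN_BREAK = re.compile(
    rf"({LETTER}+(?:-{LETTER}+)*)-[ \t]*\n[ \t]*({LETTER}+)"
)


def unwrap_pasted_text(text, keep_hyphenated=frozenset()):
    """Heuristically turn hard-wrapped PDF text into flowing paragraphs.

    keep_hyphenated: lowercase compounds whose hyphen should survive,
    e.g. {"labor-intensive"} (hypothetical, user-maintained list).
    """
    # Normalize Windows and old Mac line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Drop printing ¬ markers (and trailing blanks) at the end of a line.
    text = re.sub(r"¬[ \t]*$", "", text, flags=re.MULTILINE)

    def _join(match):
        head, tail = match.group(1), match.group(2)
        hyphenated = f"{head}-{tail}"
        # Keep the hyphen for listed compounds, or when the second part is
        # capitalized (more likely a real compound than a split word).
        if not tail[0].islower() or hyphenated.lower() in keep_hyphenated:
            return hyphenated
        return head + tail

    text = HYPHEN_BREAK.sub(_join, text)

    # Blank lines separate paragraphs; inside a paragraph, join lines
    # with single spaces.
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return "\n\n".join(paragraphs)


sample = (
    "This is sub-\nordination of a labor-\nintensive task.¬\n"
    "Next line.\n\nNew paragraph."
)
print(unwrap_pasted_text(sample, keep_hyphenated={"labor-intensive"}))
```

With the sample above, "sub-" and "ordination" are joined, "labor-intensive" keeps its hyphen because it is in the list, and the blank line is kept as a paragraph break. The if-then part lives in the small `_join` function rather than in the regex itself, which tends to be easier to reason about than conditional regex syntax.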