Bug: Bibtex translator import omits carriage returns and newlines

kjr_nl · September 1, 2017

I have a bibtex bibliography of 500 items, which I imported into Zotero 5.0.17. The problem is that the import removes line breaks in abstracts. I looked into the bibtex translator source code, and it seems that the bibtex import translator is programmed to skip carriage return (\r) and newline (\n) characters. Could it be changed so that the bibtex translator does not skip \r and \n in abstracts?

emilianoeheyns · September 8, 2017

While not feature-complete, BBT (currently being rewritten for Z5 and at beta level at this stage) will augment the import and will interpret newlines as LaTeX does (where sanely achievable), which means 2 or more is a paragraph break, and one is a space.

noksagt · September 8, 2017

I believe these line breaks are often used just to keep line length to a readable limit & have no semantic meaning, so preserving them would not be the preferred behavior much of the time.

When importing the abstract, perhaps we should look for two in a row to mark paragraph breaks though?

adamsmith · September 8, 2017

I suppose the 2 linebreaks rule makes sense -- but shouldn't line breaks in LaTeX (and thus bibtex) be \\ or \newline?

edit: ah, I had forgotten that LaTeX also respects blank lines as new paragraphs. We definitely should respect that, then.

emilianoeheyns · September 8, 2017

That's what BBT does now. One newline is a space, two is a paragraph break.

adamsmith · September 8, 2017

OK -- your translators are coffeescript too, right? Or do you have a code snippet I could just copy over?

emilianoeheyns · September 8, 2017

They are at this moment; after the port I'm moving everything to ES6. Anyhow, BBT has externalised the bibtex parser to https://github.com/fiduswriter/biblatex-csl-converter (most of the work having been done by Johannes) for my import parsing, and it takes care of this in the parse phase. It's not as simple as copying a piece of the code, unfortunately.

For the stock Zotero translators, wouldn't something like this do the trick?

 text.replace(/\n+/g, function(match) { return match.length == 1 ? " " : "\n\n"; })

(or, if you want HTML)

 text.replace(/\n+/g, function(match) { return match.length == 1 ? " " : "<p>"; })

(which is cheating because it's not valid HTML, but most HTML parsers will deal with it fairly well. <br><br> is less-cheaty)
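A slightly fuller sketch of the plain-text variant above, for illustration only. The function name normalizeNewlines is hypothetical and is not taken from the Zotero bibtex translator. It assumes the input may contain \r\n or bare \r line endings, treats whitespace-only lines as blank, and uses the g flag so that every run of newlines is handled rather than only the first.

```javascript
// Hypothetical helper, not part of the stock Zotero translator.
function normalizeNewlines(text) {
  return text
    // Treat Windows (\r\n) and bare \r line endings as \n.
    .replace(/\r\n?/g, "\n")
    // A line holding only spaces or tabs counts as a blank line.
    .replace(/\n[ \t]+(?=\n)/g, "\n")
    // One newline becomes a space; two or more become a paragraph break.
    .replace(/\n+/g, function (match) {
      return match.length === 1 ? " " : "\n\n";
    });
}
```

For example, normalizeNewlines("line one\nline two\n\nnew para") gives "line one line two\n\nnew para".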

adamsmith · September 8, 2017

Yeah, I just need to think through a couple of scenarios to get this right (the abstract field doesn't take HTML, so the former will work for abstracts; need to see about notes, which are HTML).
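For the notes case, one possible approach is to escape the text and wrap each paragraph in a closed <p> element, so the result is well-formed HTML. This is only a sketch under those assumptions: escapeHTML and textToNoteHTML are hypothetical names, not Zotero functions, and the sketch does not show how the translator actually handles notes.

```javascript
// Hypothetical helpers for illustration only.
function escapeHTML(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function textToNoteHTML(text) {
  return text
    // Normalise \r\n and bare \r to \n first.
    .replace(/\r\n?/g, "\n")
    // Two or more newlines separate paragraphs.
    .split(/\n{2,}/)
    // Inside a paragraph, a single newline is just a space.
    .map(function (para) { return para.replace(/\n/g, " ").trim(); })
    // Drop empty paragraphs (e.g. from leading or trailing blank lines).
    .filter(function (para) { return para.length > 0; })
    .map(function (para) { return "<p>" + escapeHTML(para) + "</p>"; })
    .join("\n");
}
```

With this, textToNoteHTML("a < b\nc\n\nd") gives "<p>a &lt; b c</p>\n<p>d</p>".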

kjr_nl · September 20, 2017

I have tested it this morning and all 500 items were imported correctly. So I think that the implementation is fine the way it is now. Thank you all very much for your work!