Multilingual Zotero: mixed RTL-LTR language input problem
I am trying to enter a title in Farsi that contains some Roman characters in parentheses. As soon as I enter the Roman characters, the word order in the paragraph changes: the words in the title get mixed up and I cannot put the Roman characters where they belong.
Is there a solution for this problem? The whole reason I started using Zotero was to have something that could handle an RTL-LTR bibliography...
Of course, the same happens if I just start typing Farsi in this window: برراسی تب کریمه-کنگو CCHF
The word "CCHF" really should have been "after" کریمه-کنگو (i.e. to the left of it).
Probably the way to handle this will be to apply directional markup to the field content in the output, for the RTL languages. We already have the basic facilities in place for doing that, from recent work on title-casing (which is restricted to English titles).
That's the idea in theory, but as I don't know any RTL languages, two things would be very helpful. First would be a screenshot of the bad entry as it appears in your document, with an explanatory note or manually created counter-example that shows what the text should look like. With that, I can identify when I've come up with a fix that works.
The other thing is sample data. If you can export your bad entry as Zotero RDF, paste the code to http://gist.github.com, and post the URL back here, I can import the entry locally for testing.
Thanks for reporting this -- feedback on things that run beyond my own use and knowledge really is invaluable.
git://gist.github.com/3190606.git
Arabic (ar)
Hebrew (he)
Farsi (fa)
Urdu (ur)
Yiddish (yi)
Pashto (ps)
Any of these might be cast in another script, but the tagging scheme used in MLZ can express the script together with the target language, so we should be able to catch these reliably without causing confusion. I'll brew up an initial fix, and look into the multilingual subfield issues a bit later. More again soon.
(Edit: If this works, it may be enough, actually. Translated or transliterated sub-fields will most often be in a uniform script, and should not be affected by mixed-text issues. The only case where problems arise will be where an original title is in English (say), and contains embedded acronyms and whatnot that are reproduced verbatim in a translation into an RTL language, for use in publications directed at an RTL-language audience. We can cross that bridge when we come to it.)
It looks like providing a means of inserting the RTL/LTR strong hint characters as you suggest is what's needed. Back in a bit ...
Alternative forms also adapt to their language, so an English translation of an Urdu field will be LTR.
The one unhappiness is that the centre panel listing is all set in the mode of the browser locale (I think), so RTL titles come out in LTR mode on my system here, and mixed text that is meant to be primarily RTL is reordered unpleasantly. I've tried various things, but controlling this at the cell level in an XUL tree (the engine that generates the listing) does not currently seem to be possible in Firefox or XULRunner (the platform that Standalone runs on).
Let me know how it looks. I do think we've gotten closer.
It should be smart enough that a supplementary field in "romanized Farsi" (fa-alalc97) or in one of the phonetic scripts or whatever will be handled as LTR.
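For readers who want to see the idea concretely, here is a minimal sketch in Python of how a language tag might be mapped to a text direction. It is not the MLZ code. The language set comes from the list earlier in this thread; the function name `is_rtl_tag`, the script set, and the rule that a script or romanization subtag overrides the language default are all assumptions made for illustration.

```python
# Languages listed earlier in the thread as RTL by default.
RTL_LANGUAGES = {"ar", "he", "fa", "ur", "yi", "ps"}

# Assumption: only these two script subtags are treated as RTL here.
RTL_SCRIPTS = {"arab", "hebr"}

# Assumption: variant subtags that mark a romanization, hence LTR.
ROMANIZATION_VARIANTS = {"alalc97"}


def is_rtl_tag(tag: str) -> bool:
    """Guess whether a field tagged with `tag` should be laid out RTL."""
    parts = tag.lower().replace("_", "-").split("-")
    language, subtags = parts[0], parts[1:]

    for sub in subtags:
        # A four-letter alphabetic subtag is a script code (e.g. Latn, Arab).
        if len(sub) == 4 and sub.isalpha():
            return sub in RTL_SCRIPTS
        # A romanization variant means the text is in Latin script.
        if sub in ROMANIZATION_VARIANTS:
            return False

    # No script information: fall back on the language default.
    return language in RTL_LANGUAGES
```

With these rules, `is_rtl_tag("fa")` is true, while `is_rtl_tag("fa-alalc97")` and `is_rtl_tag("ur-Latn")` are false, which matches the behaviour described above for romanized supplementary fields.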
There has been a parallel report about RTL parens in exported citations, which can come out half-reversed, and sometimes in the wrong location. This could be an issue particularly in multilingual citations that provide supplementary information. I've been thinking of how to fix it, and I think I understand: we want to tag field output runs for text direction, and separately force parens to the dominant direction of the surrounding text. This can definitely be done, but I'll wait until I have some test data from real-world use to start on the solution.
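As a rough illustration of the first half of that plan (tagging runs for direction), the Python sketch below splits a string into runs by strong directional character. It is a simplification and not the full Unicode Bidirectional Algorithm: neutral characters such as spaces, digits and parentheses are simply attached to the run that precedes them, and the function names are hypothetical. The parenthesis handling is left out, since the right rule there depends on real test data.

```python
import unicodedata


def strong_direction(ch: str):
    """Return 'rtl' or 'ltr' for strong characters, None for everything else."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "rtl"
    if bidi_class == "L":
        return "ltr"
    return None


def direction_runs(text: str, default: str = "ltr"):
    """Split text into (direction, substring) runs.

    Simplification: neutrals join the preceding run; leading neutrals
    join the first run.
    """
    runs = []
    current = None
    buffer = []
    for ch in text:
        direction = strong_direction(ch)
        if direction is not None and current is not None and direction != current:
            runs.append((current, "".join(buffer)))
            buffer = []
        if direction is not None:
            current = direction
        buffer.append(ch)
    if buffer:
        runs.append((current or default, "".join(buffer)))
    return runs
```

Each run could then be wrapped in whatever directional markup the output format supports.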
Thanks for your patience. It's nice to have this working.
Best regards,
Anna
Glad to hear the good news. I just issued a small update that attempts to solve the center-panel display problem for mixed-text titles. Any existing mixed-text entries you have will still display incorrectly following the update, but each will correct itself when the title field is opened for editing and resaved.
What the fix does is to wrap the title field in RTL-language entries in a right-to-left-embedding (RLE, or U+202B) character at the front and a pop-directional-formatting (PDF, or U+202C) character at the back. The small disadvantage of doing this is that the RTL behaviour is slightly "sticky" -- if you change the language of the item to an LTR language, the dominant directionality inside the title string will remain RTL until the field is opened and saved.
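For anyone curious what that wrapping amounts to, here is a small Python sketch of the idea. It is not the actual MLZ code; the function names are hypothetical, and the choice to strip an existing wrapper on every save is an assumption that mirrors the "corrects itself when resaved" behaviour described above.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def strip_embedding(title: str) -> str:
    """Remove one enclosing RLE...PDF pair, if present."""
    if title.startswith(RLE) and title.endswith(PDF):
        return title[len(RLE):-len(PDF)]
    return title


def wrap_title(title: str, rtl: bool) -> str:
    """Return the title as it would be stored when the field is saved.

    The old wrapper is always stripped first, so saving twice does not
    nest wrappers, and saving under an LTR language removes the marks.
    """
    bare = strip_embedding(title)
    return RLE + bare + PDF if rtl else bare
```

Calling `strip_embedding` before handing the title to anything that inspects its characters is one way to avoid the invisible marks interfering with other processing.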
The other small downside is that this introduces some extraneous (but invisible) characters into the field that could mess up automagic behaviour that depends on sniffing the character set of the field. I can't think of cases where this would be an issue, though, for the title field.
Thanks for your thoughts on contributing. I'm working on a book about the project that should be ready for public release by late September. The plan for the moment is to see how it does for sales, and think about other channels for project support if that seems necessary.
The text will be available as a PDF free of charge under a Creative Commons license, but the printed version (and eventually an ebook) will come with a pretty cover, and sales will help support the project. Watch for it at an Amazon search listing near you!