Parsing problem on Italian names

  • edited September 21, 2015
    Unfortunately for Zotero itself we'll need a solution that doesn't corrupt the fields, as the double-quotes do.
    Would removing the "fixed-name"/double-quote option from the right-click particle menu make it acceptable as an interim solution? Users might be confused why certain options are absent (e.g. when dealing with "de Gaulle"), but it would still be helpful for most cases, including all Dutch names.
  • Whatever this means for the GUI, we cannot get around the fact that we need to be able to protect leading lowercase fixed parts of a family name, aka non-particles. Enclosing these in double quotes has been an adequate solution so far, and I feel strongly that it must not be removed (at the very least not the option of entering them manually) before a more permanent and aesthetically pleasing solution has been rolled out. Of course using non-breaking spaces would work just as well (I proposed this first a while back, I believe), but in the current situation, having access to at least one of these two forms of protection is a must.
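    For readers new to the convention, here is a minimal Python sketch of the protection described above. It is a simplification for illustration only, not the citeproc-js implementation: the function name `split_family` is hypothetical, and real particle handling covers more cases (apostrophes, hyphens, language-specific rules).

```python
def split_family(field: str) -> tuple[str, str]:
    """Split a family-name field into (non_dropping_particle, family).

    Assumption for this sketch: a field wrapped in double quotes is a
    fixed name and is never split; otherwise leading all-lowercase words
    are treated as a non-dropping particle.
    """
    text = field.strip()
    # Quoted input: strip the quotes and keep the whole name together.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return "", text[1:-1]
    words = text.split()
    particle = []
    # Peel off leading lowercase words, always leaving one word as the family name.
    while len(words) > 1 and words[0].islower():
        particle.append(words.pop(0))
    return " ".join(particle), " ".join(words)


print(split_family("van der Aalst"))
print(split_family('"de Gaulle"'))
```

    Without the quotes, "de" in "de Gaulle" would be peeled off like any other leading lowercase word, which is exactly the case the protection exists for.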
  • I think RIS and BibTeX are on their own — if they don't have a mechanism for dealing with this, there's not much point in worrying about it.
    The answers at http://tex.stackexchange.com/questions/204697/how-to-correctly-typeset-an-authors-two-word-last-name-in-bibtex suggest that curly brackets can be used in BibTeX, similar to how double-quotes are recognized by citeproc-js.
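    As a rough illustration of that parallel, the sketch below maps a family-name field that uses the double-quote convention onto BibTeX's "von Last, First" form, wrapping a quoted name in braces. The helper name `bibtex_author` is hypothetical, and this is not taken from any Zotero export translator.

```python
def bibtex_author(family_field: str, given: str) -> str:
    """Format one author for a BibTeX `author` field (illustrative only)."""
    family = family_field.strip()
    # A double-quoted family name becomes a brace-protected BibTeX token.
    if len(family) >= 2 and family.startswith('"') and family.endswith('"'):
        family = "{" + family[1:-1] + "}"
    # Unquoted lowercase particles are left for BibTeX's own "von" parsing.
    return f"{family}, {given}"


print(bibtex_author('"de Gaulle"', "Charles"))
print(bibtex_author("van der Aalst", "Wil"))
```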

    And as a general comment, while I understand Dan's unwillingness to encourage data entry hacks, as a Zotero user I cannot help but feel neglected, since for many years now Zotero has done little to make it easier to achieve correct name and title formatting (see also https://forums.zotero.org/discussion/51980/bold-italic-etc/?Focus=233795#Comment_233795), while it seems that both should be core competencies of a referencing tool. I probably bump into these issues more often than average since I deal with a lot of Dutch authors and paper titles filled with italics and the like, but these limitations must be affecting many users, and some flexibility on allowing stopgap solutions would be greatly appreciated. Maybe a switch to HTML fields is just around the corner, but the multi-year wait for a database field revision doesn't make me hopeful we will see a "perfect" solution any day soon.
  • @Rintze: both points, spot on.
  • edited September 22, 2015
    I'm naturally disappointed that the proposal has fallen at the first hurdle, and I'm not in a very strong position to argue the case for it, since I'm obviously invested in the coding. But there are a couple of reasons why I do think it might be worth another look.

    First, the dynamic menu does not introduce any new data entry conventions; it only helps users to apply the conventions that we already have with considerably less effort. (The double-quote escape hack has been with us for quite a while, and as nickbart notes, it is currently the only means we have of forcing a sort on leading lowercase elements of a surname. For better or for worse, the workaround needs to be retained until something more elegant emerges, since sorting errors in a bibliography can be a factor in manuscript rejection.)
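    To make the sorting point concrete, here is a small Python sketch. It assumes names have already been split into (non-dropping particle, family) pairs and that the style demotes non-dropping particles when sorting (CSL's `demote-non-dropping-particle` option); `sort_key` is a hypothetical helper and the particle assignments are only for illustration.

```python
def sort_key(name: tuple[str, str]) -> str:
    """Sort on the family name first, with any particle demoted to the end."""
    particle, family = name
    return f"{family} {particle}".strip().casefold()


# "de Gaulle" entered as a fixed (quoted) name: no particle is split off.
protected = [("", "Einstein"), ("", "de Gaulle"), ("van der", "Aalst")]
# The same list if "de" were wrongly treated as a particle.
misparsed = [("", "Einstein"), ("de", "Gaulle"), ("van der", "Aalst")]

print(sorted(protected, key=sort_key))
print(sorted(misparsed, key=sort_key))
```

    With the protected entry, de Gaulle files under "d", ahead of Einstein; in the misparsed list it files under "g", after Einstein.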

    Second, the menu provides a contextual illustration of the particle categories and their significance, which should be helpful on the forums. Frustration over particle handling has been partly my fault, for past bugs and inconsistencies in the CSL processor; but even in the best support scenario, users often need an assist to grasp the structure behind the freehand conventions (i.e. what are particles, what are their categories, how are they entered, what are their effects). Quite a bit of effort has gone into our exploration of the "particle space," as shown by the threads linked below. The dynamic menu is a partial distillation of what we have learned, and I think it would make support on the forums significantly less burdensome.

    Anyway, that's the why of it.
    1. “first-discussed” thread (4 Jul 2008 to 6 Jul 2008) 5 posts
      ※ original proposal that non-dropping particles be placed in a separate field
    2. “di Estos” thread (20 May 2010 to 16 Jul 2015) 31 posts
      ※ first recommendation of quoted input, implemented 3 May 2010
    3. “multipart last name” thread (22 Feb 2011 to 22 Feb 2011) 13 posts
    4. “van der Aalst” thread (23 Mar 2011 to 23 Mar 2011) 3 posts
    5. “Author Names” thread (29 May 2011 to 30 May 2011) 6 posts
    6. “De” thread (12 Dec 2011 to 14 Nov 2012) 27 posts
    7. “Rafael La Porta” thread (14 Jun 2012 to 26 Aug 2012) 5 posts
    8. “Eric von Hippel” thread (13 Aug 2012 to 14 Aug 2012) 16 posts
    9. “Einstein” thread (13 Nov 2012 to 16 Nov 2012) 6 posts
    10. “von, van, de” thread (14 Feb 2013 to 3 Aug 2015) 62 posts
    11. “Del Maestro” thread (19 Feb 2013 to 29 Mar 2013) 10 posts
    12. “BBC” thread (8 Jul 2013 to 14 Jul 2013) 12 posts
    13. “R.S. De Groot” thread (16 Jul 2013 to 16 Jul 2013) 2 posts
    14. “abu-” thread (22 Jul 2013 to 19 Aug 2015) 85 posts
    15. “de Villepin” thread (6 Nov 2013 to 13 Nov 2013) 3 posts
    16. “Juan de la Chica Caicedo” thread (16 Mar 2014 to 15 Aug 2014) 9 posts
    17. “Ab Halim” thread (18 Mar 2014 to 20 Mar 2014) 7 posts
    18. “van den Heuvel” thread (22 Apr 2014 to 22 Apr 2014) 13 posts
    19. “N.C.F. Van Sas” thread (5 Sep 2014 to 11 Sep 2014) 13 posts
    20. “Van Welie” thread (17 Nov 2014 to 18 Nov 2014) 4 posts
    21. “Claudio De Felice” thread (3 Dec 2014 to 11 Aug 2015) 12 posts
    22. “Adolph von Harnack” thread (15 Dec 2014 to 15 Dec 2014) 3 posts
    23. “French dropping particle” thread (28 Jan 2015 to 30 Jan 2015) 14 posts
    24. “María Isabel del Val Valdivieso” thread (25 Feb 2015 to 25 Feb 2015) 5 posts
    25. “Names reform” thread (28 Feb 2015 to 11 May 2015) 41 posts
    26. The “Dutch” thread (31 Aug 2015 to 21 Sep 2015) 32 posts
    27. The “Italian” thread (6 Sep 2015 to 22 Sep 2015) 95+ posts
    (EDIT: listing sorted and styled for clarity)
  • Really? How frustrating!

    To quote mark here:
    all too often in Zotero development, essential and not so difficult-to-implement features are held back in anticipation of the Grand Final Feature set that will solve this and a thousand other things.
    Here we have a rather difficult-to-implement feature, very long and *time-consuming* discussions, and finally some incredible work from Frank…

    Please don't do a remake of "counters/hints to notes, tags, and related tabs".
    Would removing the "fixed-name"/double-quote option from the right-click particle menu make it acceptable as an interim solution? Users might be confused why certain options are absent (e.g. when dealing with "de Gaulle"), but it would still be helpful for most cases, including all Dutch names.
    At least… yes!

    From a GUI point of view, double quotes can be replaced by non-breaking spaces in the future, but that has the drawback of being invisible when checking a large amount of data. An indicator/tooltip like the one that exists for the date field could be helpful.
  • To be clear, I'm not rejecting this feature, and it seems to me that much of the important work here has been to define the requirements and the handling in citeproc-js. That doesn't mean we're necessarily going to accept the first GUI implementation in Zotero (though I'm grateful to Frank for the effort).

    Our concerns are broader than just the bibliographic issue here, and include how data is displayed, processed, and analyzed in many different contexts. If the plain-text version is implemented in citeproc-js, it will work after Zotero is updated to that version, but I don't want to add a GUI feature that adds hacks to visible data or encourage the use of such a format, because it will make various other functionality across the Zotero ecosystem not work properly. One example: the middle pane of Zotero itself. Frank's screencast shows the quotes appearing there, which we certainly wouldn't want. So this informal markup format would need to be parsed for display, sort, search, and processing, there and everywhere else that Zotero data is handled.

    But we already have a markup format — HTML — that we deal with in notes, and people use it in titles because of the citeproc-js support, so simply broadening the expectation that Zotero fields are HTML is far more acceptable and seems like the obvious solution here. That means the wait will be a little longer, but 5.0 is nearing completion. And if we can do this all via HTML and don't need additional fields, this can happen in 5.0 proper, without waiting for data model changes.
  • edited September 23, 2015
    If the plain-text version is implemented in citeproc-js, it will work after Zotero is updated to that version
    The double-quote syntax was implemented in citeproc-js over five years ago. It isn't something new.

    (Edit: It's also worth mentioning that the basic processor input format for personal names hasn't changed significantly since the release of CSL 1.0 in May of 2010. The aim here has been to get Zotero and the processor working more smoothly together, so that CSL can do its thing.)
  • edited September 22, 2015
    if we can do this all via HTML and don't need additional fields, this can happen in 5.0 proper
    I don't agree with your reasoning, even if 5.0 is just around the corner. I understand why you might dislike unparsed markup in Zotero UI fields, but what has been the alternative for Zotero users for the past decade, when they e.g. needed rich text formatting of titles? Do you really think users should just have been manually fixing up their bibliographies all this time?

    Especially if you think HTML markup is the way to go, why are UI features for adding that exact same markup not acceptable as an interim solution until 5.0 is released? Can't you empower your users, and give them the choice of whether any improved functionality is worth the ungainly sight of some unparsed markup tags?

    In my field of study, rich text markup in titles is everywhere, and I've always been pushing for better support, first (and successfully) in CSL, and later in Zotero. I've created and shared workarounds from before citeproc-js made it into Zotero (https://forums.zotero.org/discussion/3875/rich-text-in-titles/#Item_11), and have been very thankful for citeproc-js' ability to parse title markup in more recent years. It's disappointing to me that after all these years, Zotero still doesn't have shortcuts to quickly add this markup.

    With regard to Frank's particle menu, I feel the same way. Can't you accept his PR if we replace the double-quote markup with an HTML span that can later be parsed by 5.0?

    Dan, I love Zotero and I greatly admire your work, and we all know Zotero has a small team and limited resources, but it's saddening and disappointing to me to see that too often Frank and others put a lot of work into a new feature, only to see the resultant PR gather dust (the "counter" PR referenced by Gracile above is a prime example). I really think Zotero would be a better project if its user community were better equipped to influence its development.
  • First of all, 4.0 is more or less frozen. All development work is focused on getting 5.0 out the door, and, due to extension signing and time, we're not planning to even put out another 4.0 release if we can help it. (If something critical comes up, we will, but something that's been an issue for 9 years doesn't qualify.)

    To the issue at hand: when people add manual markup, they understand that it's a hack. That's not the case with a GUI menu. Adding a button that caused markup to show up in the creator field or in the middle pane would, in my opinion, be embarrassing — an obvious unfinished hack — and I don't see any reason to do it when a proper solution is right around the corner.

    But you still seem to be responding as if I'm rejecting this request. At this point I'm really just asking for exactly what you say — a version of Frank's PR that generates HTML (plus some minor UI changes, which we can discuss later) — but against master. Since 5.0 will be in beta, releasing it with visible markup would be somewhat more acceptable, and we can add in HTML parsing, editing, and rendering separately. I'm happy to help move that forward. It just can't happen on 4.0.

    (Re: counters and other pull requests: we accept pull requests all the time, of course. But some do get neglected, and I apologize for that. Bumping doesn't always work, as the counters PR demonstrates, but I'm happy for people to do that when they think something has slipped through the cracks. As you know, we're also currently hiring additional developers and our first product designer, so we'll have more people to help tend the PR queue and also build polished UIs to accompany them. But that's not the issue here.)
  • It finally occurred to me given the nature of the conversation:

    PR = pull request (the preferred method of submitting contributions to an open development project)

    It took me quite some time to figure this out. I hope this helps others, particularly non-participants, understand that this thread isn't as hostile as it might seem at first glance when viewed out of context.
    First of all, 4.0 is more or less frozen. All development work is focused on getting 5.0 out the door...
  • edited September 23, 2015
    Dan: I'll leave the branch in GitHub, feel free to pick bits from it as and when. I won't be touching it further.
  • OK, well, the next step here, to help whoever ends up reworking Frank's PR, would be to figure out the desired HTML markup.

    One thing I'm not clear on: with appropriate HTML markup, a GUI menu, and flexibility of presentation, would the display of the particles in the given vs. family name field still be ideal? Or is that just a workaround given the plain text?
  • edited September 23, 2015
    I understand the situation around 4.0 and 5.0. I'm just railing against your opinion that:
    when people add manual markup, they understand that it's a hack. That's not the case with a GUI menu. Adding a button that caused markup to show up in the creator field or in the middle pane would, in my opinion, be embarrassing — an obvious unfinished hack
    I'm just frustrated that this has effectively condemned users like me, who heavily depend on rich text markup in titles, to typing out tags by hand for the past few years, or alternatively to correcting the formatting in post-production. Either option is inconvenient. Adding shortcuts (which aren't very discoverable anyway) has always seemed like a perfectly fine compromise to me. For users like me, for whom rich text titles are a required feature, Zotero has always felt unfinished in this respect anyway.
    At this point I'm really just asking for exactly what you say — a version of Frank's PR that generates HTML (plus some minor UI changes, which we can discuss later) — but against master. Since 5.0 will be in beta, releasing it with visible markup would be somewhat more acceptable, and we can add in HTML parsing, editing, and rendering separately.
    Can you suggest any HTML markup to replace the double-quotes? Maybe, based on the CSL terminology (http://docs.citationstyles.org/en/stable/specification.html#name), we could use

    <span class="family-name">de Gaulle</span>
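    To sketch what parsing such markup for display and processing might involve, here is a minimal Python example built on the standard library's `html.parser`. The `family-name` class is only the proposal above, not an agreed format, and `FamilyFieldParser` and `parse_family_field` are hypothetical names.

```python
from html.parser import HTMLParser


class FamilyFieldParser(HTMLParser):
    """Collect the plain text of a field and the text marked as a fixed name."""

    def __init__(self) -> None:
        super().__init__()
        self.plain: list[str] = []    # all text, with markup stripped
        self.fixed: list[str] = []    # text inside family-name spans
        self._stack: list[bool] = []  # one flag per currently open <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self._stack.append("family-name" in classes)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        self.plain.append(data)
        if any(self._stack):
            self.fixed.append(data)


def parse_family_field(field: str) -> tuple[str, str]:
    """Return (display_text, protected_text) for a family-name field."""
    parser = FamilyFieldParser()
    parser.feed(field)
    parser.close()
    return "".join(parser.plain), "".join(parser.fixed)


print(parse_family_field('<span class="family-name">de Gaulle</span>'))
print(parse_family_field("van Gogh"))
```

    The display text is what an item list would show, while a non-empty protected text would tell the processor not to split off a particle.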

    For rich text titles, would you be willing to accept a PR with just shortcuts (against master), as I offered at https://forums.zotero.org/discussion/51980/bold-italic-etc/?Focus=233795#Comment_233795 ?

    (Dan, also, in general, is it still worthwhile to work on UI features implemented in XUL? If you have any thoughts about how Zotero will be dealing with XUL's imminent demise, I would appreciate hearing them.)
  • edited September 23, 2015
    One thing I'm not clear on: with appropriate HTML markup, a GUI menu, and flexibility of presentation, would the display of the particles in the given vs. family name field still be ideal? Or is that just a workaround given the plain text?
    I feel we should create a solution that doesn't require every user who wishes to adjust particle assignments to read the treatise on name particles in the CSL specification and understand the lingo, which I fear will be the case if we break out dropping and non-dropping particles in the UI. Two-field name entry seems like the default, even in particle-crazy countries like the Netherlands, so I would stick with that and just add the menu.

    edit: that said, I always find editing two-field names in Zotero annoying because with a narrow right-hand column, editing one of the fields hides the other.
  • I'm just frustrated that this has effectively condemned users like me, who heavily depend on rich text markup in titles, to typing out tags by hand for the past few years, or alternatively to correcting the formatting in post-production.
    Has it? To be clear, I was really just talking here about the menu and markup for this, which would either not be HTML or be more complicated HTML. I feel much less strongly about adding unrendered rich-text markup (at least those with basic HTML tags) via the keyboard. Not ideal, particularly in the middle pane, but at least it's well understood by many people and would be limited to entry via the keyboard. And other than a week ago when I said that a patch would need to go to 5.0, like pretty much all patches now, I'm not sure I actually rejected that (though it's certainly possible).

    Anyway, for the current issue:
    Can you suggest any HTML markup to replace the double-quotes?
    I was assuming that there'd be a span around the particle with a class indicating its type, at least for some of the modes, but maybe that doesn't make sense. Someone more familiar with the issues here will have to take the lead on that.
    I feel we should create a solution that doesn't require every user who wishes to adjust particle assignments to read the treatise on name particles in the CSL specification and understand the lingo, which I fear will be the case if we break out dropping and non-dropping particles in the UI.
    I certainly agree with the first part. Not sure I follow the second. What I'm asking is whether it would still be necessary for the menu actions to move the particle between the given and family name fields, as it does in Frank's PR, or if we could get rid of that and just add appropriate HTML classes around the particle.
    Dan, also, in general, is it still worthwhile to work on UI features implemented in XUL? If you have any thoughts about how Zotero will be dealing with XUL's imminent demise, I would appreciate hearing them.
    Let's discuss that on the dev list. Short answer is that we obviously want to minimize XUL work, we can make minor changes if need be, and we should start laying the groundwork for HTML everywhere.
  • I was assuming that there'd be a span around the particle with a class indicating its type, at least for some of the modes, but maybe that doesn't make sense.
    Well, in the case of "de Gaulle", "de" is not a particle at all (in CSL parlance at least), so it doesn't seem to make sense to separate it from "Gaulle".
    I certainly agree with the first part. Not sure I follow the second. What I'm asking is whether it would still be necessary for the menu actions to move the particle between the given and family name fields, as it does in Frank's PR, or if we could get rid of that and just add appropriate HTML classes around the particle.
    I thought the alternative you were thinking of was presenting the user with explicit information about which particles are considered dropping and non-dropping. I don't think most users should ever hear these terms.

    The benefit of the current division of particles across the given and family name fields is that at least it gives a cue to the user that not all particles are the same (although discoverability is currently an issue, since it's not obvious where particles should be entered, and how this affects name processing).