loc author errors

when scraping the library of congress catalog, zotero usually mislabels all authors after the first as contributors...
this should be easy to fix...
cheers,
peter
  • mph
    edited October 9, 2006
    Actually, I doubt that would be easy to fix since the MARC format dictates that all but the first-listed author be an added entry--personal name (MARC 700 field), which does not distinguish between authors and other contributors, e.g., editors or contributors to a collected work (the author is usually given as the main entry in the 100 field). I don't know anything about the technical workings of Zotero, but I can't see how it could add precision that is not present in the MARC record. If there is a way, I'll be even more impressed with the people behind this than I am now. :)

    It's pretty easy to change the value of the contributor field to author once Zotero has imported the record.

    The other added entry fields don't seem to pose the same problem:

    111 2 $a IEEE Computer Society International Workshop on Cluster Computing $n (1st : $d1999 : $c Melbourne, Vic.)
    700 1 $a Buyya, Rajkumar, $d 1970-
    710 2 $a IEEE Computer Society.
    710 2 $a IEEE Computer Society. $b Task Force on Cluster Computing.
    710 2 $a Monash University.

    The 710 field (added entry--corporate name) is labeled "author," undoubtedly because there's no ambiguity there. So in this example, the 111 (main entry--meeting name) and three 710 fields are labeled "author" in Zotero, while the 700 is "contributor". I don't see any way around this, except by manual correction for each record.
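The labeling behaviour described in this post can be summarised in a small lookup table. This is a minimal Python sketch of what the thread reports, not Zotero's actual translator code; `CREATOR_LABELS` and `label_creators` are hypothetical names, the label for 100 follows the first post's report that only the first author is labeled correctly, and only tags mentioned in the thread are included.

```python
# Hypothetical mapping from MARC name fields to the creator labels
# reported in this thread (an illustration, not Zotero's code).
CREATOR_LABELS = {
    "100": "author",       # main entry--personal name
    "111": "author",       # main entry--meeting name
    "700": "contributor",  # added entry--personal name (role is ambiguous)
    "710": "author",       # added entry--corporate name
}


def label_creators(fields):
    """Return (label, name) pairs for (tag, name) pairs, skipping other tags."""
    return [(CREATOR_LABELS[tag], name) for tag, name in fields if tag in CREATOR_LABELS]


record = [
    ("111", "IEEE Computer Society International Workshop on Cluster Computing"),
    ("700", "Buyya, Rajkumar"),
    ("710", "IEEE Computer Society"),
    ("710", "Monash University"),
]

for label, name in label_creators(record):
    print(f"{label}: {name}")
```

Because the 700 field carries no role information in this record, any finer label than "contributor" would have to come from the user.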
  • We'll look into this, but mph is indeed right that we can't do a lot about ambiguous MARC record fields (or incorrect MARC info).
  • I've been thinking more about the treatment of authors in Zotero, and I wonder if it wouldn't be useful to add labels for corporate authors, to correspond to the MARC 110/111 and 710/711 fields. At first it didn't occur to me, because I was just thinking about citation formatting, but today I exported a MARC record as MODS.

    The original MARC:

    111 2 $a Exploratory Conference on the History of Nuclear Physics, $c Brookline, Mass.,$d 1967, 1969. $n 1st, 2d,

    ...

    710 2 $a American Academy of Arts and Sciences.
    710 2 $a American Institute of Physics.

    Zotero Info:

    Author: Physics, Exploratory Conference on the History of Nuclear

    ...

    Author: Sciences, American Academy of Arts and
    Author: Physics, American Institute of

    MODS:

    <name type="personal">
    <namePart type="family">Exploratory Conference on the History of Nuclear Physics</namePart>
    <namePart type="given"/>
    <role>
    <roleTerm type="code" authority="marcrelator">aut</roleTerm>
    </role>
    </name>

    ...

    <name type="personal">
    <namePart type="family">American Academy of Arts and Sciences</namePart>
    <namePart type="given"/>
    <role>
    <roleTerm type="code" authority="marcrelator">aut</roleTerm>
    </role>
    </name>

    <name type="personal">
    <namePart type="family">American Institute of Physics</namePart>
    <namePart type="given"/>
    <role>
    <roleTerm type="code" authority="marcrelator">aut</roleTerm>
    </role>
    </name>


    The 111/711 fields should map to <name type="conference">, while the 110/710 fields should map to <name type="corporate">. The only way that can happen through Zotero, I would think, would be with the assignment of appropriate author tags, distinguishing among personal, corporate, and conference.
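The mapping proposed in this post could look roughly like the following Python sketch. It is an illustration only, not Zotero's export code: `MODS_NAME_TYPE` and `mods_name` are hypothetical names, treating 100/700 as personal names is an assumption, and the `<role>` element is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# MARC tag -> MODS name type, as proposed in the post above.
# The 100/700 -> "personal" entries are an assumption added for completeness.
MODS_NAME_TYPE = {
    "100": "personal", "700": "personal",
    "110": "corporate", "710": "corporate",
    "111": "conference", "711": "conference",
}


def mods_name(tag, name):
    """Build a MODS <name> element for one MARC name field."""
    name_type = MODS_NAME_TYPE[tag]
    name_el = ET.Element("name", {"type": name_type})
    if name_type == "personal":
        # Split "Family, Given" on the first comma only.
        family, _, given = name.partition(",")
        ET.SubElement(name_el, "namePart", {"type": "family"}).text = family.strip()
        ET.SubElement(name_el, "namePart", {"type": "given"}).text = given.strip()
    else:
        # Corporate and conference names are kept whole, never inverted.
        ET.SubElement(name_el, "namePart").text = name
    return name_el


print(ET.tostring(mods_name("710", "American Institute of Physics"), encoding="unicode"))
# prints: <name type="corporate"><namePart>American Institute of Physics</namePart></name>
```

Keeping organization and conference names in a single `namePart` also avoids inversions such as "Physics, American Institute of".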
  • I've been saying for quite a while that name and date handling need further work in Zotero.

    Admitting that the problem is really difficult, I think the best solution is something like Microsoft's solution in Word 2007: contributors get a single field, always. One has a selector for groups and/or organizations. That switch not only identifies the agent as such for export, but also turns off personal name parsing. One then enters personal names in proper sort order ("Doe, III, Jane" or "Mao Zedong") and Zotero gives some hint (maybe in a tooltip?) of how it is processing the name.

    It's simple, it's international-friendly, and it also lends itself better to auto-complete of already existing contributors.

    Finally, the same mechanism can be used for structured dates.
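One way to read the single-field proposal above is sketched below in Python. Everything here is an assumption made for illustration: the `parse_contributor` name, the dictionary keys, and the rule that a name without commas is kept whole in the order entered. It is not how Zotero or Word 2007 parses names.

```python
def parse_contributor(text, is_organization=False):
    """Parse one single-field contributor entry given in sort order."""
    if is_organization:
        # The organization switch turns off personal-name parsing entirely.
        return {"kind": "organization", "literal": text.strip()}
    parts = [p.strip() for p in text.split(",")]
    if len(parts) == 1:
        # No comma: assume the name is already in sort order and keep it whole.
        return {"kind": "person", "literal": parts[0]}
    if len(parts) == 2:
        return {"kind": "person", "family": parts[0], "given": parts[1]}
    # Three or more parts: treat the second as a suffix, e.g. "Doe, III, Jane".
    return {"kind": "person", "family": parts[0], "suffix": parts[1],
            "given": ", ".join(parts[2:])}


for entry in ("Doe, III, Jane", "Mao Zedong"):
    print(parse_contributor(entry))
print(parse_contributor("American Institute of Physics", is_organization=True))
```

The returned dictionary is the kind of information a tooltip could show to tell the user how the name was interpreted.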
  • Oh, and I'll add that I really dislike library terminology like "corporate body." Sure, use it for export to MODS, but don't burden your users with it. We're talking groups and organizations.
  • We dislike the all-contributors-get-a-single-field approach because we think it's unnecessarily awkward for the user--the software should be smart enough to help out more than that, since it'll have to know some metadata about the field format anyway to parse/ignore it properly. We also prefer the semantic nature of two fields, especially given SQLite's lack of SUBSTRING_INDEX() or CHARINDEX() functions (and mozStorage's current lack of support for user-defined functions)--with a single field, we'd need to pull data into JS to get the last name part of a "last, first" name (for example, to display the last name alone as an option in autocomplete).

    It is true, however, that the current single/double field toggle doesn't indicate organizational status for export. One approach we're considering is to change the current single/double toggle button to a menu with, say, "Personal (two fields)," "Personal (one field)," "Organization," and "Conference." When collapsed, the button would indicate the current mode with an icon (rather than display the opposite mode, as it does now). Could probably implement it as a split button à la MS Word to allow a quick cycle-through as well as a menu.

    We're also planning to change it to default to the last-used mode, rather than always defaulting to two fields.

    Bruce: most of the new field names (e.g. "corporate body") were added right before release and definitely need to be fine-tuned a bit.
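The SQLite point earlier in this post can be illustrated with a short sketch. It uses Python's `sqlite3` module rather than JS and mozStorage, and the single-column `creators` table is a hypothetical schema, not Zotero's. It assumes, as the post states, that no string-search function is available in SQL, so the "last, first" split has to happen in application code after the rows are fetched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-field schema: the whole name lives in one column.
conn.execute("CREATE TABLE creators (name TEXT)")
conn.executemany(
    "INSERT INTO creators VALUES (?)",
    [("Smith, John",), ("Smith, Jane",), ("Mao Zedong",)],
)


def last_names(conn):
    """Fetch full names, then take the part before the first comma in Python."""
    rows = conn.execute("SELECT name FROM creators")
    return sorted({name.split(",", 1)[0].strip() for (name,) in rows})


print(last_names(conn))  # prints: ['Mao Zedong', 'Smith']
```

With two fields, the same list could come straight from a query on the last-name column, which is the semantic advantage the post refers to.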
  • Dan -- the two-field approach is only simpler for some author names, and while auto-completing on last name might be something, what happens if I have three Smiths? I have to first auto-complete the last name and then the first?

    Perhaps there's a reason almost all bibliographic applications use a single field for names?
  • edited October 12, 2006
    Well, the two-field approach is only simpler for some author names, true, but you'd only use it for some—if there's a toggle, and it defaults to the last-used mode (which it doesn't now but will), how is that privileging one over the other?

    As for multiple Smiths, the current autocomplete implementation will display "Smith", "Smith, John", and "Smith, Jane" as options if you start to type "Smi" into the last name field, and if you choose one of the full names, it will (or at least should, though it might be a bit buggy at the moment) automatically put the parts in their respective fields. It'll also do the same if you start typing in the first name field, which wouldn't really be possible otherwise. I'll admit that this is a bit of a hack and currently has a few issues (among them the width of the field limiting the autocomplete menu width), but we believe it does (or should, when it's not buggy) offer the same—or greater—functionality.

    (Note that autocomplete currently excludes creators that already exist in the current item (which it probably shouldn't, since a creator could exist in multiple roles in the same item), so this behavior might not be immediately apparent while testing it.)

    We're not utterly opposed to the single field plus organization toggle approach, but we're throwing this out here to get a sense of the advantages and disadvantages of each. At the moment it's not clear—to me, at least—what we lose with the approach proposed above, whereas having a user type "Doe, III, Jane" into a single field seems less than ideal, even if it is required by the UIs of other apps.
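The two-field autocomplete behaviour described in this post can be sketched as follows. This is a minimal Python illustration, not Zotero's implementation; `autocomplete`, `fill_fields`, and the `(last, first)` tuple representation are hypothetical.

```python
def autocomplete(prefix, creators, field="last"):
    """Suggest completions for text typed into the last- or first-name field."""
    index = 0 if field == "last" else 1
    prefix = prefix.lower()
    matches = [c for c in creators if c[index].lower().startswith(prefix)]
    suggestions = []
    if field == "last":
        # Offer each bare last name once, ahead of the full names.
        suggestions.extend(sorted({last for last, _ in matches}))
    suggestions += [f"{last}, {first}" for last, first in matches if first]
    return suggestions


def fill_fields(choice):
    """Split a chosen suggestion back into (last, first) for the two fields."""
    last, _, first = choice.partition(", ")
    return last, first


creators = [("Smith", "John"), ("Smith", "Jane"), ("Smythe", "Ann")]
print(autocomplete("Smi", creators))                # prints: ['Smith', 'Smith, John', 'Smith, Jane']
print(autocomplete("Ja", creators, field="first"))  # prints: ['Smith, Jane']
print(fill_fields("Smith, Jane"))                   # prints: ('Smith', 'Jane')
```

Matching on the first-name field is the part that a single combined field would not offer as directly.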
  • Hi Dan -- totally reasonable. Like I say, I realize this is difficult. I'm also throwing out a strong opinion, not so much because I'm sure it's right, but just to have it out there. I think these details are really important, and so they bear deep discussion of clear options.

    It sounds like your auto-complete approach might work.

    Defaulting to previous view is fine also (I missed that you do that), though can I enter names using the "Doe, Jane" approach such that Zotero will properly parse them?

    Also, you *are* privileging Western names because you have one structured UI for first/last names (read Western), and an unstructured one for all else. Contrast this with how you deal with dates, where you have one single field to deal with structured dates. There you actually make things more difficult for 99% of dates.

    I'd prefer you to have a single approach to structured data of this sort that worked well for all cases.

    WRT what you lose, maybe nothing. I'd strongly suggest, however, that you get feedback from scholars who work in East Asia. Find a Japanese or Chinese historian, for example (preferably one who currently does not use any bibliographic software), and sit down with them in front of Zotero, and have them explain to you what they think about the current approach. Maybe they love it, and there's no problem. Maybe they don't and can tell you why.

    My hunch is they don't and they'd prefer my single field approach, but I admit I could be wrong. The only way to find out is to get it straight from the horses' mouths.
  • BTW, on the dates comment, I like how you're now handling the "accessed" field; just want that applied also to "date."