Library of Congress error

vardiss · December 22, 2008

When I save a Library of Congress book with multiple volumes to Zotero, the number of volumes appears in the page number field instead of the # of volumes field.

Tjowens · December 22, 2008

Issue confirmed. Ticket created. https://www.zotero.org/trac/ticket/1271

LGauvreau · June 24, 2009

Hi,

I just noticed that the Library of Congress translator doesn't grab all the keywords.

For example, for book 1 the keywords should be:

Marie de la Trinité, sœur, 1874-1944.
Thérèse, de Lisieux, Saint, 1873-1897.
Carmelite Nuns --France --Biography.

But Zotero grabs only:

Marie de la Trinité
Thérèse

Book 2 should be:

Subject (in one line): Thérèse, de Lisieux, Saint, 1873-1897 --Criticism and interpretation.

Zotero grabs it as two lines, with elements missing:

Criticism and interpretation
Thérèse

So there is a problem with how the separators "," and "--" are parsed in keywords.

I don't like to say it, because I prefer Zotero, but EndNote is better in these cases: it grabs everything into the right place, plus the "NOTES" field, which is very useful and sometimes essential for old books. For book 1, the notes contain:

présentés par Pierre Descouvemont.
ill. ; 20 cm.
Epiphanie,
Spine title: Sœur Marie de la Trinité, une novice de Sainte Thérèse.
Includes bibliographical references.

Thanks for your help,

Luc

noksagt · June 24, 2009

It'd be more useful if you provided links to your references.
Book 1 permalink
Book 2 permalink

Note that the MARC 600 field is split. I think Zotero does the right thing & chooses only the most notable parts of this field to use as tags. This makes them short (long tags cause headaches), meaningful, and would allow tags to apply to more than one entry (if it grabbed the entire field as a long tag, there are only 1-2 books in the entire LoC which would be tagged that way).

noksagt · June 24, 2009

Ticket updated for notes:
https://www.zotero.org/trac/ticket/1387

LGauvreau · June 24, 2009

I didn't think about the permalink: thanks.

I agree in general with your comment on long tags, but there is long and there is long. In these two examples the tags are not very long. And sometimes you can't really abbreviate them without losing key information, as with the personal names in these examples:
Marie de la Trinité, sœur, 1874-1944.
Thérèse, de Lisieux, Saint, 1873-1897.

Zotero grabs only:
Marie de la Trinité
Thérèse
That's useless: "Thérèse" in French is like "Mary" in English!

But, as you said, it is more logical to split "Carmelite Nuns --France --Biography" like this:

Carmelite Nuns
France
Biography

Note that in LoC, "," is used differently from "--": "," marks a single keyword made up of two or three elements, while "--" is used in standard subject keywords. Can Zotero see the difference?
Thanks,
Luc
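
The distinction drawn above between "," and "--" can be sketched in a few lines. This is only an illustration operating on the display form of the headings quoted in this thread, not how the Zotero translator works; the function name split_subject is hypothetical. It splits on "--" only, so a personal name containing commas stays in one piece.

```python
def split_subject(heading: str) -> list[str]:
    """Split a displayed subject heading on the "--" subdivision marker.

    Commas are left alone, so a name heading such as
    "Thérèse, de Lisieux, Saint, 1873-1897" remains a single keyword.
    """
    parts = [part.strip().rstrip(".").strip() for part in heading.split("--")]
    # Drop empty pieces that would result from a leading or doubled marker.
    return [part for part in parts if part]


if __name__ == "__main__":
    print(split_subject("Carmelite Nuns --France --Biography."))
    print(split_subject(
        "Thérèse, de Lisieux, Saint, 1873-1897 --Criticism and interpretation."
    ))
```

One caveat: stripping the trailing period is a simplification and would also remove a period that belongs to an abbreviation at the end of a heading.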

noksagt · June 25, 2009

That's useless: "Thérèse" in French is like "Mary" in English!

I don't know if it is "useless": most Thérèses and Marys will have family names & the absence of a family name would probably imply the tag applies to a more limited subset (e.g. the saint). Also, a tag that could only ever apply to one item is at least as bad.

Can Zotero see the difference?

Zotero uses the MARC records for the LoC Voyager catalog. Human-readable:

600 	00 |a Marie de la Trinité, |c sœur, |d 1874-1944.
600 	00 |a Thérèse, |c de Lisieux, Saint, |d 1873-1897.
610 	20 |a Carmelite Nuns |z France |x Biography.

In the specific case of people, it might make sense to use the 'c' sub-field for people (titles and other words associated with a name). In this particular example, it would help. I don't know what other records look like, though. I would think the dates would do more harm than good in a majority of cases & they are certainly not needed here: there is only one St. Teresa of Lisieux.
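
As a rough sketch of the idea of combining sub-fields 'a' and 'c' for people, the snippet below parses the human-readable MARC lines shown above and builds a tag from the name and title while dropping the dates. The function names parse_marc_line and person_tag are hypothetical, and this is not the code of Zotero's MARC translator; it also assumes each sub-field code appears at most once per field, which real records do not guarantee.

```python
def parse_marc_line(line: str) -> tuple[str, dict[str, str]]:
    """Return (tag, subfields) for a line like
    '600 00 |a Thérèse, |c de Lisieux, Saint, |d 1873-1897.'"""
    head, *chunks = line.split("|")
    tag = head.split()[0]  # the three-digit field tag, e.g. "600"
    subfields: dict[str, str] = {}
    for chunk in chunks:
        code, _, value = chunk.strip().partition(" ")
        subfields[code] = value.strip()  # a repeated code would overwrite
    return tag, subfields


def person_tag(subfields: dict[str, str]) -> str:
    """Join sub-field a (name) and c (titles), leaving out d (dates)."""
    parts = [subfields[code].strip(" ,.") for code in ("a", "c") if code in subfields]
    return ", ".join(parts)


if __name__ == "__main__":
    lines = [
        "600 00 |a Marie de la Trinité, |c sœur, |d 1874-1944.",
        "600 00 |a Thérèse, |c de Lisieux, Saint, |d 1873-1897.",
    ]
    for line in lines:
        tag, subfields = parse_marc_line(line)
        if tag == "600":
            print(person_tag(subfields))
```

For the two 600 fields in this example, that yields "Marie de la Trinité, sœur" and "Thérèse, de Lisieux, Saint".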

LGauvreau · June 25, 2009

I appreciate your info about MARC records. I now understand better how the translator works.

What I mean is that if you keep just "Mary" or "Thérèse" as a tag, you will find thousands and thousands of them, which is really unproductive... But who knows? For someone who wants to study the historical distribution of Mary and Thérèse through the centuries, it could be perfect! But "Lisieux", "Saint", and the dates are needed if you are looking for this one Thérèse and not her thousands of namesakes.

1) But is it Zotero's role to decide which information is "useful" or "useless"? I don't think so: Zotero should grab all bibliographic information "as is", or make minor changes to how it is represented to be more efficient, but no more.

Your statement that "a tag that could only ever apply to one item is at least as bad" is strange... I don't know if I understand it correctly, but it's very useful that "Washington (D.C.)" refers to just one city, not a state, not a person. Some words and expressions refer to a single thing, like personal names (what we call "named entities", I think); others are concepts, and for those I agree with you: a "conceptual keyword" that applies to just one thing makes no sense. Tagging "Felix the Cat", "cat", and "Felix" is really not the same thing: all are useful. And I don't think it's a problem, because if you search for "cat" in the keyword field in Zotero, you will find all the cats plus Felix the Cat; if you search for "Felix", you will find every Felix plus Felix the Cat. That is useful for everybody. A search engine can extract and find each element of an expression: it's perfect.

YOU know that there is just one Thérèse de Lisieux, but Zotero, as a machine, can't know that. Do you know how many people named "Michel Tremblay" can be found in Quebec? All information, dates included, could be very interesting to search (see point 1), and finding a date is not easy: if LoC gives it, Zotero should grab it.

I'm doing a search right now where I need exactly this kind of date: I study the production of French biographies in the 19th century. With the dates of birth and death, I can study which generations of people have been the subject of more biographies than others, and how many years after their death the biographies were published. We can see that now, in the 21st century, more and more biographies are published while their subjects are still living! That was really not the case in the 19th century...

Anyway, I stand by my idea that Zotero's role is to grab the information "as is", and not to choose which elements of the bibliographic information are useful or useless...

Thanks,

Luc

noksagt · June 25, 2009

Do keep in mind that tags can be used differently than MARC subjects. One should be expected to type tags (meaning they should be short and memorable). One can select a tag in the list to see what records have that tag (meaning that they should be reused in records).

What I mean is that if you keep just "Mary" or "Thérèse" as a tag, you will find thousands and thousands of them, which is really unproductive...

I agree that, if that occurred, it would be inconvenient. But I'm saying it is unlikely to occur. Most will be tagged with the surname as well (e.g. "Thérèse Defarge"). And most people do not study a wide variety of people with identical "canonical" names.

But who knows? For someone who wants to study the historical distribution of Mary and Thérèse through the centuries, it could be perfect!

This is a corner case at best. And longer tags wouldn't really help with this anyway. It is reasonable to expect that such a rare and specific case would need other customizations to Zotero.

But is it Zotero's role to decide which information is "useful" or "useless"?

Yes. There are many threads in the forums from people who complain about the automatic tagging & these complaints are mostly about overly-long tags.

Zotero should grab all bibliographic information "as is"

No reference manager does this, as the data model of the software cannot possibly replicate the disparate input. Both Zotero and EndNote lose the semantic information about the subject contained in the MARC records (e.g. the MARC record indicates not only that this subject is a person, but also separates name, title, and dates). This may be useful information, but researchers wanting to do a detailed study that depends on this information would be well-advised to use or make software that could work with this and/or to work directly with databases that stored this richer information.

Your statement that "a tag that could only ever apply to one item is at least as bad" is strange...

Searching for "Marie de la Trinité, sœur, 1874-1944." in LoC reveals only two sources. Searching for "Marie de la Trinité" reveals many, many more sources that describe the same person. Tags selected by folksonomy are long enough to be fairly specific (preventing false positives), but not so long as to identify a single item exclusively (preventing false negatives).

LGauvreau · June 25, 2009

Many ideas and things to discuss... To be clear, I distinguish between:
1) keywords provided by library catalogues, which are written according to rules; 2) personal tags, used freely or with some personal rules.
Before, 1 was the only system we used; now we can use both, and each has its strengths and limits. We just need to try to make them work together...
1 = is rigid, but keeps (or tries to keep) a kind of order and a standard classification of knowledge (via Dewey or Library of Congress)
2 = is an open, free system of "classification", but it's really not easy to connect semantically the personal tag clouds written by many users. So keeping the standard keyword system in the "background" is a good idea, in my view.

There is also another point in your argument: long keywords versus short tags (people, as you said, don't write long personal tags).

Long keywords are a real problem: I agree with you. But is it a technical problem to store them in a field of the Zotero database? In sync, there is (was?) a bug about that. But that problem occurred because some translators put all the keywords into a single extra-long, crazy keyword.

The alternative, for me, if a standard keyword is too long: 1) can Zotero drop some information to get a shorter keyword; OR 2) parse the semantic elements of the long keyword into several smaller keywords? Option 2 seems the right thing to do...

Returning to our first example:
600 00 |a Thérèse, |c de Lisieux, Saint, |d 1873-1897.
610 20 |a Carmelite Nuns |z France |x Biography.

All of these expressions are searchable in the catalogue: "Lisieux" returns more than 500 entries, so you can find all items about the French town; "saint" probably returns millions of items. But try this search: SUBJECT = biography AND 1873 returns 19 items about people who were born or died in 1873.
But much more interesting: SUBJECT = 1873 returns 6283 items about people who were born or died in 1873!!!

Anyway, we can debate what is useful or useless information, but it will be a personal point of view. The real question is: which rules should Zotero follow to grab information from the standard keywords offered by online catalogues? In science, for example, vocabulary is so important that Zotero must keep the information as shown...

But I keep in mind the limits of Zotero and EndNote in doing this, as you said. I'm not familiar with these limits.

Last thing: everybody is talking about structured data because information on the web is very unstructured, and folksonomy is not the way to solve this whole problem. Saving the standard keywords is a way to keep a minimum of structure in our huge amounts of data.

What I like about Zotero is that I can reorganise the keywords I get from catalogues. As with the 1873 search, I can probably add my two personal tags, Date_of_death and Date_of_birth, by playing with advanced search. With geographic information about the people, I could probably put them on Google Maps with some gadgets to get a map of biographies, or something like that.
Structured and authoritative information, as we find at LoC for example, is very important on the chaotic web... I like it.
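
The birth/death tagging idea could be sketched as follows, assuming the dates arrive in the "1873-1897." form of the MARC |d sub-field shown earlier in the thread. The function name life_dates is hypothetical, and nothing here is an existing Zotero feature; any other date form (approximate dates, for instance) is simply ignored.

```python
import re


def life_dates(dates: str) -> dict[str, str]:
    """Turn a 'YYYY-YYYY.' string into birth/death tags.

    Returns an empty dict when the string is not in that form; a missing
    death year ('1921-') yields only the birth tag.
    """
    match = re.fullmatch(r"\s*(\d{4})-(\d{4})?\.?\s*", dates)
    if match is None:
        return {}
    tags = {"Date_of_birth": match.group(1)}
    if match.group(2):
        tags["Date_of_death"] = match.group(2)
    return tags


if __name__ == "__main__":
    print(life_dates("1873-1897."))
    print(life_dates("1874-1944."))
```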

Thanks,

Luc

LGauvreau · June 25, 2009

More technical remarks about LoC:
LoC offers different types of view: Brief, Subject/Content, Full Record, MARC (and others on the permalink page: MODS, COinS, etc.). I notice that it is only when we choose the PERMALINK view that we get all the information. With the Brief, Subject/Content, and MARC views, some information is missing, and users probably won't know that. Can the Zotero translator use the permalink page to grab the information even if the user hasn't displayed it?

The permalink page has all the information, but some of it seems to be absent or in the wrong field:
Example from this item: http://lccn.loc.gov/75321882
language: empty field?
In Main Title: The Juneberry tree : a novel / by Jacques Ferron ; translated by Raymond Y. Chamberlain. The info in bold is in the NOTES field, but wouldn't it be more logical, and easier to access, if it were in the EDITION field?

I don't know whether you can put "Raymond Y. Chamberlain" as "collaborator", if not as "translator".

It's just an impression, but I notice that many Zotero translators put the information written after a TITLE (e.g. dir. by, translated by, preface by, etc.) in the NOTES field. It would be better to put it in the EDITION field, I think.

But my first remark is more important.

Thanks,

noksagt · June 25, 2009

I notice that it is only when we choose the PERMALINK view that we get all the information. With the Brief, Subject/Content, and MARC views, some information is missing

The human-readable permalink at lccn.loc.gov contains the same information as the "full record" human-readable version on catalog.loc.gov (less copy/request/status information).

Zotero translates the pages differently.

catalog.loc.gov uses Voyager & Zotero uses the MARC record associated with the entry (regardless of which "view" of the record you use). Contrary to your assertion, I do not believe that the MARC record is missing any information.

lccn.loc.gov is a custom service. Zotero translates from unAPI/MODS XML.

The underlying MODS XML and the MARC data both contain all relevant information.

The MODS XML and MARC translators within Zotero miss different details from the record. Ideally, both translators would be improved to retain more information.

noksagt · June 25, 2009

language: empty field?

This ticket should be reopened or another should be made that addresses coded language terms.

In Main Title: The Juneberry tree : a novel / by Jacques Ferron ; translated by Raymond Y. Chamberlain. The info in bold is in the NOTES field, but wouldn't it be more logical, and easier to access, if it were in the EDITION field?

That phrase is not part of the title of the book & has only been concatenated into the main title field on the human-readable page. I disagree that "edition" would be appropriate: that field should be reserved, where possible, for uniquely identifying an edition of a work (typically using a number). The MODS for this information is:

<note type="statement of responsibility">
by Jacques Ferron ; translated by Raymond Y. Chamberlain.
</note>

Making it a note is consistent with the source data.

I don't know whether you can put "Raymond Y. Chamberlain" as "collaborator", if not as "translator".

Importing can't improve much on the underlying source data. MODS XML does allow you to identify translators. But this particular LoC record does not take advantage of that.