Hierarchical Item Relationships

  • A couple of questions re: Josh & Sean's schema. First, how does this model address the fact that in many cases a "reproduced in"-type item (such as a collection of documents) is also a specific document in its own right (such as a book, or journal article, or blog, etc.)? Will the "reproduced in" designation simply point to the original document record for the book, article, etc.? And does the distinction between regular and "ancillary" types complicate this at all (since some "reproduced in" types are ancillary, and some are not)?
  • edited February 8, 2007
    Thanks Josh and Sean for working on this! Like Bruce, I have problems with the distinction between ancillary item types and actual item types. There are use cases where it would be handy to enter the ancillary item types as actual items (i.e. stand-alone database entries). So I would simply remove this distinction, treat all item types as equal, and then allow these item types to be related freely to each other.

    Also, I must admit that, as a third-party developer, I'd like us to agree on a hierarchical model that would work not only for Zotero but also for other bibliographic applications.

    That said, and not thinking of any Zotero implementation details, please allow me to go back to a more conceptual model for brainstorming purposes. Personally, it helps me to think of relationships like this:

    First of all, all items are on the same level and can be *freely related* to each other (this is very important if the model is to address all the different kinds of needs).

    Speaking of relationships, I think of "classes", "subclasses", "items" and "item properties" (more about properties below).

    By "classes", I mean the basic elements that occur in every bibliographic citation/reference. "Subclasses" are contained within these basic classes and would usually serve as fallback elements when generating citations. "Items" would be contained within subclasses and would (in the case of "resources" and "events") represent the actual database entries (think Zotero item types).

    So, following this markup scheme:

    class:
    - subclass: item1, item2, item3, ...

    I think of these classes, subclasses and items:

    Agents:
    - Person: author, editor, translator, inventor, contributor, recipient, ...
    - Organization: publisher, authority, ...
    - ...

    Resources:
    - Collection: periodical, series, archive, internet site, proceeding, ...
    - Document: article, book, section, chapter, thesis, statute, map, image, blog, ...
    - Communication: letter, email, instant message, interview, ...
    - Broadcast: radio, television, podcast, ...
    - ...

    Events:
    - Conference
    - Legal Case: brief, decision
    - Hearing
    - Expedition
    - ...

    Places:
    - Country
    - City
    - ...

    Dates:
    - Year
    - Month
    - Day
    - ...
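    A minimal Python sketch of the class/subclass/item scheme above, assuming a simple nested-dictionary encoding (the names `TAXONOMY`, `fallback_chain` and `find_rule` are hypothetical and not part of Zotero). It shows one way subclasses and classes could act as fallbacks when a citation style has no rule for a specific item.

```python
# Hypothetical, partial encoding of the class -> subclass -> item scheme above.
TAXONOMY: dict[str, dict[str, list[str]]] = {
    "Agents": {
        "Person": ["author", "editor", "translator", "inventor", "contributor", "recipient"],
        "Organization": ["publisher", "authority"],
    },
    "Resources": {
        "Collection": ["periodical", "series", "archive", "internet site", "proceeding"],
        "Document": ["article", "book", "section", "chapter", "thesis", "statute", "map", "image", "blog"],
        "Communication": ["letter", "email", "instant message", "interview"],
        "Broadcast": ["radio", "television", "podcast"],
    },
    "Events": {
        "Legal Case": ["brief", "decision"],
    },
}


def fallback_chain(item: str) -> list[str]:
    """Return [item, subclass, class], most specific first."""
    for cls, subclasses in TAXONOMY.items():
        for subclass, items in subclasses.items():
            if item in items:
                return [item, subclass, cls]
    raise KeyError(f"unknown item: {item!r}")


def find_rule(item: str, rules: dict[str, str]) -> str:
    """Pick the most specific citation rule a style defines for this item."""
    for level in fallback_chain(item):
        if level in rules:
            return rules[level]
    raise LookupError(f"no rule covers {item!r}")


# A style that defines rules only at a few levels (hypothetical rule names).
rules = {"thesis": "thesis rule", "Document": "document rule", "Resources": "resource rule"}
print(find_rule("thesis", rules))   # thesis rule
print(find_rule("chapter", rules))  # document rule
print(find_rule("podcast", rules))  # resource rule
```

    Note that this fixed mapping places "book" only under Document; expressing the idea that a book can act as either a collection or a document would require letting an item belong to more than one subclass.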

    Note that there is no dedicated "Collections" class, since collections are IMHO simply a resource that can contain other resources, so basically collections are resources as well. Therefore, I'd consider them a subclass of the "Resources" class. Take a book as an example - would this be a collection or a (document) resource? The answer is, of course: it depends. A book can be considered a stand-alone document as well as a container of book chapters -- or as a container of book sections which in turn are containers of book chapters. I.e., depending on the situation, a book can be considered a collection or a document, but in any case it's a resource.

    Speaking of "items", all items could have multiple "properties", such as "title", "name", "language", "locator", "identifier" or "descriptor". A qualifier (i.e. the elements after the colon below) could specify the nature of that property in more detail, e.g.:

    - name: given, family, suffix, display, sort, ...
    - title: long, short, abbreviated, translated, alternate, descriptive, ...
    - language: fulltext, summary, ...
    - locator: volume, issue, pages, edition, code, patent number, ...
    - identifier: url, doi, issn, isbn, pmid, lccn, archive id, local call number, ...
    - descriptor: keyword/tag, category/group, ...
    - ...
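    One possible way to model properties with qualifiers, sketched in Python under the assumption that each property is stored as a (property, qualifier, value) triple. `Prop`, `Item`, `get` and `first` are illustrative names only, and the values are placeholders. The `first` method shows how a citation style might ask for a preferred qualifier and fall back to others.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Prop:
    name: str       # property, e.g. "title" or "identifier"
    qualifier: str  # qualifier, e.g. "short" or "isbn"
    value: str


@dataclass
class Item:
    props: list[Prop] = field(default_factory=list)

    def get(self, name: str, qualifier: str) -> Optional[str]:
        """Exact name:qualifier lookup; None if the item lacks it."""
        for p in self.props:
            if p.name == name and p.qualifier == qualifier:
                return p.value
        return None

    def first(self, name: str, preferred: list[str]) -> Optional[str]:
        """Try qualifiers in order of preference, e.g. a style that wants
        an abbreviated title but accepts a short or long one."""
        for qualifier in preferred:
            value = self.get(name, qualifier)
            if value is not None:
                return value
        return None


# Placeholder values, for illustration only.
book = Item([
    Prop("title", "long", "An Example Title: With a Subtitle"),
    Prop("title", "short", "An Example Title"),
    Prop("language", "fulltext", "de"),
    Prop("identifier", "isbn", "0-000-00000-0"),
])
print(book.first("title", ["abbreviated", "short", "long"]))  # An Example Title
print(book.get("identifier", "doi"))                          # None
```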

    In addition, a universal "type" property could further specify the actual nature of the item. For example, a periodical could have one of these type properties:

    - journal
    - court reporter
    - magazine
    - newspaper
    - ...

    Or a thesis item could have one of the following type properties:

    - dissertation
    - master thesis
    - bachelor thesis
    - ...

    And all place and date items could have a type property such as:

    - published
    - presented
    - sent
    - received
    - ...
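    A small Python sketch of the universal "type" property, assuming each item kind has a fixed list of allowed type values (taken from the partial lists above). `ALLOWED_TYPES`, `TypedItem` and `make_item` are hypothetical names, and the dates and newspaper title are placeholders. It shows how two date items attached to the same letter could be told apart only by their type.

```python
from dataclasses import dataclass

# Allowed values of the universal "type" property (partial lists from the post above).
ALLOWED_TYPES = {
    "periodical": {"journal", "court reporter", "magazine", "newspaper"},
    "thesis": {"dissertation", "master thesis", "bachelor thesis"},
    "place": {"published", "presented", "sent", "received"},
    "date": {"published", "presented", "sent", "received"},
}


@dataclass(frozen=True)
class TypedItem:
    item: str   # e.g. "periodical", "date"
    type: str   # the refining "type" property
    value: str


def make_item(item: str, type_: str, value: str) -> TypedItem:
    """Create an item, rejecting type values not defined for that item."""
    if type_ not in ALLOWED_TYPES.get(item, set()):
        raise ValueError(f"{type_!r} is not a valid type for {item!r}")
    return TypedItem(item, type_, value)


# A letter's two dates differ only in their type property (placeholder dates).
sent = make_item("date", "sent", "1850-03-01")
received = make_item("date", "received", "1850-03-20")
paper = make_item("periodical", "newspaper", "Example Gazette")
print(sent.type, received.type, paper.type)  # sent received newspaper
```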

    In theory, all items from all classes could be related freely with each other to form a citation that suits the user's needs.

    More practically, however, the software could provide "relationship templates" for typical use cases (say, a letter within an archive, or a book within a series). This would be the equivalent of Zotero's current item types. The user would select one of these templates from a dropdown menu. Zotero would prefill the edit mask with all necessary fields but would visually distinguish between item-specific and container-specific fields. As discussed by others earlier in this thread, the contents of container-specific fields would make up a separate database record.

    Note that it should also be possible to nest a container within a higher-level container. This would be required to properly cite, e.g., a book chapter within a book within a book series.
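    A rough sketch of how a relationship template might expand into nested records, assuming a template is just an ordered list of item types from the cited item outward to its outermost container. The template names, `Record`, `create_from_template` and `container_chain` are hypothetical, not Zotero's implementation. It reproduces the case of a chapter within a book within a book series, with each container saved as its own record.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

# Hypothetical templates: the cited item's type first, then its containers, innermost to outermost.
TEMPLATES = {
    "letter in archive": ["letter", "archive"],
    "book in series": ["book", "series"],
    "chapter in book in series": ["chapter", "book", "series"],
}

_next_id = count(1)


@dataclass
class Record:
    item_type: str
    values: dict
    container_id: Optional[int] = None
    id: int = field(default_factory=lambda: next(_next_id))


def create_from_template(template: str, values_per_level: list[dict], db: dict) -> Record:
    """Store one record per template level, each pointing to its container,
    and return the record of the cited item."""
    types = TEMPLATES[template]
    if len(values_per_level) != len(types):
        raise ValueError("need one set of field values per template level")
    container_id = None
    # Create the outermost container first so each inner record can link to it.
    for item_type, values in reversed(list(zip(types, values_per_level))):
        record = Record(item_type, values, container_id)
        db[record.id] = record
        container_id = record.id
    return db[container_id]


def container_chain(record: Record, db: dict) -> list[str]:
    """Walk up from an item through all of its nested containers."""
    chain = [record.item_type]
    while record.container_id is not None:
        record = db[record.container_id]
        chain.append(record.item_type)
    return chain


db: dict = {}
chapter = create_from_template(
    "chapter in book in series",
    [{"title": "Chapter title"}, {"title": "Book title"}, {"title": "Series title"}],
    db,
)
print(container_chain(chapter, db))  # ['chapter', 'book', 'series']
print(len(db))                       # 3
```

    Because each level is its own record, a fuller version could let a second chapter point at the existing book record instead of duplicating the book's fields.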

    I hope that my thoughts make sense to you.
  • Bruce said:
    "we need to decide the policies for when something becomes a formalized 'type.'"

    As per my previous post, I'd prefer if item types would merely be a pre-made set of relationships. I.e., Zotero would offer the most commonly used citation cases as named sets of relationships (as Zotero does now via its item types). Choosing one of these sets from a dropdown menu would fill the edit mask accordingly.

    Ideally, these relationships (and the corresponding GUI layouts) would be established on the basis of type definition files (I guess RDF?) defining all the relationships. While Zotero wouldn't offer a GUI to change its own pre-made type sets (or to make new ones), power users could edit the underlying RDF files to modify existing type sets or to create new ones. Then, offer a repository where these type definition files could be shared within the community. This would be a very flexible and powerful setup that could easily be adjusted in the future.
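    To illustrate how type definition files could drive the edit mask, here is a Python sketch that loads a small definition file and flattens one type set into item-specific and container-specific rows. JSON stands in for RDF only to keep the example dependency-free; the file schema and the function names are assumptions.

```python
import json

# Hypothetical type-definition file. JSON stands in for the RDF files suggested
# above only to keep the sketch dependency-free; the schema is invented.
DEFINITIONS = """
{
  "types": [
    {"name": "Book Section",
     "levels": [{"item": "section", "fields": ["title", "pages"]},
                {"item": "book", "fields": ["title", "publisher", "place", "date"]}]},
    {"name": "Letter",
     "levels": [{"item": "letter", "fields": ["title", "recipient", "date"]},
                {"item": "archive", "fields": ["title", "archive id"]}]}
  ]
}
"""


def load_type_sets(text: str) -> dict[str, list[dict]]:
    """Map each type-set name to its levels (cited item first, then containers)."""
    return {t["name"]: t["levels"] for t in json.loads(text)["types"]}


def edit_mask(levels: list[dict]) -> list[tuple[str, str]]:
    """Flatten a type set into edit-mask rows, marking each field as
    item-specific or container-specific."""
    rows = []
    for depth, level in enumerate(levels):
        scope = "item" if depth == 0 else f"container ({level['item']})"
        rows.extend((scope, name) for name in level["fields"])
    return rows


type_sets = load_type_sets(DEFINITIONS)
for scope, name in edit_mask(type_sets["Letter"]):
    print(f"{scope:20} {name}")
```

    Under this assumption, a power user could add a new type set by appending an entry to the file, without touching the code.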

    "I suggest we focus on a smart core with some hierarchy, and leave room for customization with a 'genre' or 'type' column/property."

    How would the data model I outlined above fit the model you have in mind? (I've tried to include much of your biblio scheme and posts.)
  • edited February 8, 2007
    CoD- "Reproduced in" is not an item type. It's just a UI element pointing at a back-end association between one item and another. Functionally, this means that before entering a letter, you'll first want to enter the book where the letter is reproduced. Then enter the letter's details, and click "Reproduced in" to point back to the parent item.

    "Reproduced in" should never point to an ancillary type, since a journal article is not likely to be "reproduced in" an issue. That's its original form.
  • edited February 8, 2007
    Matthias- Josh and I agree completely on typing items to reduce the total number of item types. As you note, periodicals could be collapsed, as could dictionary/encyclopedia, etc.

    As for "ancillary" items, we think it's important not to allow the creation of a "periodical" or a "periodical issue" because doing so will severely complicate the UI. At the end of the day, we need to provide an interface that meets the needs of the vast majority of users. These people don't cite a journal or an issue. They cite articles in the issue of the periodical. Same with book series, etc. By separating these items out, it will be very easy to add and cite these kinds of sources, since we can auto-populate or provide some kind of selection UI for existing journals, issues, series.

    I understand the attraction of a perfect model, but it's also important to keep usability front and center. How important is the use-case for citing a single archive, and how much development time are you willing to invest to make it happen? What kind of usability trade-offs are you willing to make?
  • On the question of "ancillary" vs. other item types, I made a mistake in not being clear - this is a UI distinction, rather than an ontological one in the data model. From the data perspective, there's no difference between a journal issue and an article; the difference is that in the Zotero interface, you'd never be able to create a journal issue on its own (we'd be hacking in a bunch of UI decisions like this in order to make the software more accessible to people who don't think about this stuff as much as we all do).

    On the "collection" type, Matthias hit it on the head when he said above: "collections are IMHO simply a resource that can contain other resources, so basically collections are resources as well." That's how we see it, and the "collection" type seems an unnecessary elaboration.
  • I'll also add one more point - I see the evolving taxonomy on the Trac page as Zotero-specific; as I see it, the primary agenda is to come up with something that we can implement in Zotero as soon as a few weeks from now, and to do so in a way that doesn't preclude compatibility with more universal standards (hence the "ancillary item types" thing)...

    So, my personal concerns are naturally more specific and seemingly-limited than one would need in a universal ontology; I'd like to have my cake and eat it too, but given constraints it seems that the best we might be able to do is have our cake and keep the option of eating it open for the future. The great thing is that Zotero can spit out an RDF in whatever more generalized format we want later on, while still maintaining a more tailored and domain-specific model inside its black box.

    Regardless, this is clearly a useful discussion to have (both in the concrete and more abstract forms), so let's continue.

    One question I'd put out: any ideas on what to do with what we called a "Communication"? It seemed that this was fundamentally a different thing than a "Document", in a sort of broadcast vs. interpersonal communication way; any thoughts on this?
  • "On the "collection" type, Matthias hit it on the head when he said above: "collections are IMHO simply a resource that can contain other resources, so basically collections are resources as well." That's how we see it, and the "collection" type seems an unnecessary elaboration."

    But the notion of collection is significant WRT the GUI (as you both mention), citation formatting, and data exchange. If you were to change "ancillary types" to collection and be more precise about it, would you really be deviating from your intentions?

    FWIW, I created the main collection class because that notion is commonly used in bibliographic data, and because it groups together a number of similar structures: series, archival collections, TV and radio shows, web sites, etc.
  • "One question I'd put out: any ideas on what to do with what we called a "Communication"? It seemed that this was fundamentally a different thing than a "Document", in a sort of broadcast vs. interpersonal communication way; any thoughts on this?"

    Part of my long-post-that-got-eaten did address this. I had thought about this earlier too, but it's a little tricky. For example, how would you deal with an interview published in a book, or broadcast on the internet?
  • Matthias --

    "As per my previous post, I'd prefer if item types would merely be a pre-made set of relationships."

    Yes. Well, look at how I do it in CSL. I have a convention that says you concatenate the primary level type with its container type. So you can do "article-periodical" or "article-magazine" and so forth. I could imagine the same for these assembled GUI types.
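    The concatenation convention can be sketched in a few lines of Python. `combined_type` follows the "article-periodical" pattern described above; `pick_layout`, its fallback to the bare item type when no combined layout exists, and the layout names are assumptions for illustration.

```python
from typing import Optional


def combined_type(item_type: str, container_type: Optional[str] = None) -> str:
    """Join the primary type with its container's type, e.g. "article-periodical"."""
    return item_type if container_type is None else f"{item_type}-{container_type}"


def pick_layout(item_type: str, container_type: Optional[str], layouts: dict[str, str]) -> Optional[str]:
    """Prefer a layout for the combined type; otherwise fall back to the bare item type."""
    return layouts.get(combined_type(item_type, container_type), layouts.get(item_type))


# Hypothetical layout names.
layouts = {"article-magazine": "magazine article layout", "article": "generic article layout"}
print(combined_type("article", "periodical"))        # article-periodical
print(pick_layout("article", "magazine", layouts))   # magazine article layout
print(pick_layout("article", "newspaper", layouts))  # generic article layout
```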
  • Bruce --

    "Reproduced in" a book or a web page is what you would do with your hypothetical interview.

    That said, in our model a "Communication" is really just a "Document" with the addition of a specified recipient or recipients. So maybe we should make it one.
    Please note that I didn't ditch the "collection" hierarchy entirely (I consider it useful); I was just emphasizing that a "collection" can also be a valid resource (similar to an article, letter, etc). Therefore, all of its members should be treated the same as members of the "document" tree (i.e. all these elements being "items" in my view).

    Josh, I understand (and like) the idea of a communication being different from a document. Also, I think that a broadcast (one to many) is different from a communication (one to one).

    Bruce said:
    "For example, how would you deal with an interview published in a book, or broadcast on the internet?"

    This is a valid point and it highlights my point of a book being a "collection" OR a "document" depending on the situation. With this in mind, I'd argue that the relationships between items and its higher-level categories (subclasses) should not be dogmatically fixed. Why shouldn't it be possible to relate an interview to a book in one instance and to relate it to an internet broadcast in another instance?

    What if we merely regarded subclasses as a means of providing meaning between items and classes, so that, for example, a book could belong to subclass "collection" OR subclass "document" depending on the situation? (Just thinking out loud here.) Would this be feasible in an RDF model?
  • Matthias: "I'd argue that the relationships between items and its higher-level categories (subclasses) should not be dogmatically fixed. Why shouldn't it be possible to relate an interview to a book in one instance and to relate it to an internet broadcast in another instance?"

    It should be possible, just as it should be possible to relate a letter to a book, or a journal, or an article in a journal, or a reel of microfilm, or a website, or an article in a journal on a website, or just a box in an archive. Clearly, one point of a hierarchical model is to make possible all kinds of relationships, including the ones that we don't currently anticipate needing.

    That said, it sounds to me like the model posted on the wiki could handle this pretty well. The key is that everything is a type, and can be related in various ways to other types. The most complicated documents I have to cite are a bunch of letters appended to an annotated diary published as a titled article in a journal. So I create a new entry for a letter, note that it's "reproduced in" the article, which is found in the journal, etc. Works for me.

    I do have another question about the model, however:

    Why the distinction between "reproduced in"-type parents and original parents? What is the functional difference between the relationship between a letter and a book and the relationship between a letter and an archive? Or an article's relationship to the journal that first published it vs. a book in which it was reprinted? I don't see what difference this makes either for the data structure or the UI (especially for the UI - the average user entering a letter/book in a database only cares about recording the information for that particular incarnation of the letter, and not about the fact that the original letter exists somewhere else). In each case, we're still just talking about types relating to types, right?
  • edited February 8, 2007
    On the "reproduced in" idea, it seems to me that the relationship between a journal article and its parent "Journal" item is different from that of a journal article and a book in which it's reproduced; this might just come down to a Benjamin-like notion of the "aura" of originality, but in every case we could come up with, there is an original and authentic item (of course, we're thinking as historians, so that might also play a role here). It seems useful to avoid conflating an authentic original with its reproductions - we cite them differently, and treat them differently in the context of research.

    As for "collections", if a book can be either a collection or a document depending solely on perspective, then it fundamentally can't be described in an objective way (which is crucial to data portability and interoperability). Since one could *always* make a gestalt shift between collection/document, I don't see the usefulness of maintaining them as somehow ontologically distinct. Why not just level the distinction between them, and simply talk about items that can also be pieces of other items (and in turn be composed of smaller items as well)?
    It might make sense to disentangle version relations from part-container relations, Josh. We also deal with republished versions of books a little differently. Going back to FRBR, BTW: it provides a useful way to think of these distinctions (works vs. expressions vs. manifestations vs. items).

    On collections, I don't really agree with Matthias that a book can be either a document or a collection. OK, yes, I do see what he's saying. But I am seeing this as a distinction between a standalone item, and a set of them. One never cites the latter. So an edited book would be a subclass of a document, and its series would be a collection. Likewise, a multi-volume book would also be a collection I guess.

    The reason why not to "level the distinction" in the exchange and formatting system (CSL) is that I *think* it's important. In CSL, for example, I have a relation attribute for things like titles. It helps there to be able to say use a container title here, and a collection title there.

    In any case, I'm not really religious about my position; it's worth discussing more. I've just always had a separate collection class/table/etc.

    To Matthias's question on RDF subclassing: yes, a class can subclass multiple parent classes.
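    As a rough analogy only (Python classes, not RDF), multiple inheritance lets one class have two parents, much as RDF Schema allows a class to carry more than one `rdfs:subClassOf` statement. The class names are taken from the discussion; the hierarchy itself is hypothetical.

```python
class Resource:
    """Anything that can be cited."""


class Document(Resource):
    """A standalone, citable unit."""


class Collection(Resource):
    """A resource that contains other resources."""


class Book(Document, Collection):
    """Has both parents: a book is a document and can also act as a container."""


print(issubclass(Book, Document), issubclass(Book, Collection))  # True True
print([c.__name__ for c in Book.__mro__])  # ['Book', 'Document', 'Collection', 'Resource', 'object']
```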
  • I've just had a long post erased by a connection blip. A pox on fuzzy wireless connections.

    I had two points, I think: first, re: Josh's comment: "It seems useful to avoid conflating an authentic original with its reproductions - we cite them differently, and treat them differently in the context of research."

    We cite them differently? The citation gives the information for the item and tells the reader where to find it. I don't see how it makes any difference to the bibliographic software whether the "container" is the original or a reproduction.

    Second, at some points in this discussion we've misunderstood each other because distinctions between the data structure, the GUI, and hierarchies of entry types aren't clear. It seems to me that these are three quite different (though obviously interrelated) things. My dearly departed post spelled out how I saw the differences among them, but I won't try to recreate that. The key thing is that there seems to be widespread support for the idea of a hierarchy-free data structure - records are records, and books can be parents, children, or stand-alone items, depending on how they're related to other records in a citation. The framework for organizing citation styles and entry types is more rigidly hierarchical - the category "communication" contains sub-categories "letter," "email message," etc. These hierarchies of types will help expedite the creation of new citation styles and entry types, and eventually make it possible to open these up to user customization without devolving into utter chaos. The GUI will also be hierarchical, but in a different sense - it will display the hierarchical relationships between items and their parents (and potentially grandparents).

    I hope that's clear - a more carefully composed version is gone forever.
  • edited February 9, 2007
    In response to this point from Sean way above:

    "As for "ancillary" items, we think it's important not to allow the creation of a "periodical" or a "periodical issue" because doing so will severely complicate the UI. At the end of the day, we need to provide an interface that meets the needs of the vast majority of users. These people don't cite a journal or an issue. They cite articles in the issue of the periodical."

    It is not clear to me that "periodical" or "archive" are not useful as item types. Yes, people don't cite periodicals or archives in footnotes, but then Zotero is a research tool rather than just a bibliographic citation tool. I can imagine someone doing research on, say, Dwight MacDonald, and amassing a list of archival collections where his letters or relevant documents are located. Or, one might work on a history of audio engineering from the seventeenth century to the present and compile a list of relevant periodicals to be perused. Eventually, one may need to export this list into a bibliography--many dissertation and book bibliographies include sections for "Archives" and "Periodicals". Why not make it easy to import periodicals from online library catalogs, or online archival finding aids (such as this one), into Zotero?
  • edited February 9, 2007
    CoD: "We cite them differently? The citation gives the information for the item and tells the reader where to find it. I don't see how it makes any difference to the bibliographic software whether the "container" is the original or a reproduction."

    When I'm citing an artwork, I do need to indicate in the citation whether I'm working off of the original or a reproduction; this is part of how I lay out an evidentiary chain for my argument. The question of reproduction *does* matter, because if I'm not working off of the original, errors or noise might have been introduced in the process of reproduction, and a later reader needs to be able to track down the particular reproduction on which I based my claims (hence the need for a particular kind of "Reproduced in" relationship)

    CoD, on your second point, I think I agree: in principle, the more flexible the model the better, but the implementation of these concepts in a particular piece of software tailored to a particular set of practices (i.e. Zotero) will hard-code much more rigid and explicit hierarchical relationships into both Zotero's UI and internal data model (neither of which preclude exporting said data in the more abstract and generic RDF form as needed). Because my first concern is Zotero (with the broader utility of Zotero data in other contexts a close second), I tend to slip back into the latter two modes, rather than instinctively staying at the more generalized level.
  • edited February 9, 2007
    I agree wholeheartedly with Sean & Josh that one has to draw a line somewhere between an ideal model and the reality of implementation, and I'll face the same trouble when trying to implement such a hierarchical model in my own bibliographic application. However, my point is that it doesn't make sense to adopt a new model (which takes *a lot* of time to implement!) if it's again fairly limiting. Sure, it's always a tradeoff, but I really think that a hierarchical model should:

    - allow me to freely relate any items to any other items, or, in Josh's words: "items that can also be pieces of other items (and in turn be composed of smaller items as well)"

    - allow me to cite any kind of container on its own. Sure, it's a less common case, but there are many cases (even in the hard sciences) where you need to cite an entire book as well as some chapters from the same book within the same work. And I imagine that some people (such as an editor in a preface) will definitely need to cite a book series on its own. So I agree with erazlogo here that from a user's point of view it should be possible to cite a container on its own.

    Bruce said:
    "On collections, I don't really agree with Matthias that a book can be either a document or a collection. OK, yes, I do see what he's saying. But I am seeing this as a distinction between a standalone item, and a set of them. One never cites the latter. So an edited book would be a subclass of a document, and its series would be a collection. Likewise, a multi-volume book would also be a collection I guess."

    Maybe my confusion is that I've always viewed a "container" as a synonym for a "collection", but in the case of books, it's more that a book can be a container for something while not being a collection in the sense of a set of multiple items that are usually physically distinct from each other.

    If we regard collections as sets of multiple things (so a standalone book is a document), what if the characteristic of "being a container" were simply a property that could be assigned to a relationship?

    The same logic could be applied to the distinction of "original" vs "reproduction": I understand the usefulness of this distinction, but it's only one of many useful relationships. So couldn't this be just another property of the relationship between two items?

    In other words: there are many other useful relationships besides "reproduced in", e.g. "presented at" comes to mind. Btw, w.r.t. "reproduced in", I would also favour a more general relationship such as "contained within". Even better, wouldn't it be better if we'd simply could establish *any* kind of relationship between two items? Zotero could offer a dropdown menu with a "relationship qualifier" such as "contained within", "reproduced in", "presented at", etc? Would this be possible? Also, this would make it rather easy to expand such a system in the future to add a new type of relationship.
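    A minimal Python sketch of the "relationship qualifier" idea, assuming relations are stored as (subject, qualifier, target) triples and the dropdown is backed by a fixed vocabulary. `Rel`, `Relation`, `outgoing`, `incoming` and the item ids are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rel(Enum):
    """Controlled vocabulary behind a "relationship qualifier" dropdown."""
    CONTAINED_WITHIN = "contained within"
    REPRODUCED_IN = "reproduced in"
    PRESENTED_AT = "presented at"


@dataclass(frozen=True)
class Relation:
    subject: str    # id of the item the relation starts from
    qualifier: Rel
    target: str     # id of the related item


# Hypothetical item ids.
relations = [
    Relation("letter-1", Rel.REPRODUCED_IN, "book-1"),
    Relation("chapter-1", Rel.CONTAINED_WITHIN, "book-1"),
    Relation("paper-1", Rel.PRESENTED_AT, "conference-1"),
]


def outgoing(item_id: str, qualifier: Optional[Rel] = None) -> list[str]:
    """Items that item_id points to, optionally filtered by qualifier."""
    return [r.target for r in relations
            if r.subject == item_id and (qualifier is None or r.qualifier is qualifier)]


def incoming(item_id: str) -> list[tuple[str, str]]:
    """Everything related to item_id, with the qualifier as shown in the UI."""
    return sorted((r.subject, r.qualifier.value) for r in relations if r.target == item_id)


print(outgoing("paper-1"))  # ['conference-1']
print(incoming("book-1"))   # [('chapter-1', 'contained within'), ('letter-1', 'reproduced in')]
```

    In this design, supporting a new kind of relationship means adding one member to `Rel`; the stored triples and the query functions stay unchanged.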
  • Matthias said: "In other words: there are many other useful relationships besides "reproduced in", e.g. "presented at" comes to mind. Btw, w.r.t. "reproduced in", I would also favour a more general relationship such as "contained within". Even better, wouldn't it be better if we'd simply could establish *any* kind of relationship between two items? Zotero could offer a dropdown menu with a "relationship qualifier" such as "contained within", "reproduced in", "presented at", etc?"

    Yes, interesting. As I have thought of it, the key relations are part of/contained in, version of (to denote relations to original versions), and presented at.

    As for collections being perhaps more about a relation, I'm not sure; perhaps.

    There is the practical issue in a bib app of knowing how you display data. I would absolutely hate to see any app displaying all my collections in a table view, and I don't think it good practice to rely on hard-coded types to know how to handle each case.
  • Bruce said:
    "I would absolutely hate to see any app displaying all my collections in a table view"

    I understand this. However, as outlined above, there are use cases for this, so I guess it depends on your needs whether you'd like to see a collection item as an individual item or not. IMHO it would make the inherent logic easier for the user to understand, since *all* items are treated equally and you could relate all items to each other in exactly the same way in the GUI. Also, editing would work in the same way for all item types, which is beneficial. Why should I be allowed to edit some items in their own record entry mask, while other items can only be edited from within another item? This doesn't feel like a good UI concept to me. Also note that there are many possible ways to deal with the display of collections in the interface, e.g., the interface could hide collection items by default but display them within a special folder, "saved search", etc.

    Speaking of "collections" vs "containers", I think we should really define what these two words mean for us. I have the impression that for some people these two are interchangeable words for the same thing while for others they're not. So I'd appreciate any clarification. Personally, I like Bruce's simple definition of a "collection" being a set of multiple items. I.e. a collection is always a container for something else. OTOH, a container is not necessarily a collection. The prominent example would be an edited book, which is still a stand-alone item and as such regarded as a document. Would other people agree with this thinking?

    "As for collections being perhaps more about a relation, I'm not sure; perhaps."

    I have no problems with "collection" being a hierarchical concept; in fact I like it, and I agree that the concept of a collection helps with things like citation formatting. But in my post above, I wasn't necessarily implying that a collection would merely be a "relationship qualifier"; I was just saying that collection item types should be treated similarly to document item types and that there are multiple useful relationships (with "reproduced in" being only one of them). If one regards a collection as a group of item types that comprise a set of multiple document items, the relationship could still be described by the container logic (i.e. "is part of", "is contained within" or "is reproduced in"). So I don't see any problems here.
  • edited February 9, 2007
    I'd like to stress again the importance of multiple relationships and multiple types of relationships. In the sense of FRBR and Barbara Tillett, here are some more examples for useful and valid relationships that (IMHO) a future research & bibliographic tool should be able to handle:

    is a reprint of
    is a facsimile of
    is a microform reproduction of
    is an exact reproduction of

    is a revised version of
    is a translation of
    is a subsequent edition of
    is an illustrated/abridged/expurgated edition of
    is a simultaneous publication of

    is a summary/abstract/digest of
    is a free translation of
    is a dramatization/novelization/screenplay of
    is a parody/imitation of
    is an adaption of

    is a casebook/review of
    is a commentary/criticism/evaluation of

    So, my point is probably: If the proposed hierarchical model can only account for *one* single type of relationship then it's not worth adopting that hierarchical model, IMHO. A new design should be more flexible than the old one, otherwise we could as well stay with a flat design.
  • Matthias,

    I also like the idea of building in a capacity for lots of different kinds of relationships, and many of the ones you mention make sense (though many of these would not matter for purposes of citations, there are other benefits).

    That said, it's also worth keeping in mind that many of these "relationships" are simply variations on the basic relationship "is in," where the medium (microform, facsimile, review, etc.) would be evident from the data or the entry type. If I am citing a document from a microfilm collection, it's self-evident that this is a microfilm reproduction of the original document. The specific relationship type is indicated by the entry type (in this case the container entry type) and therefore it would be superfluous to designate that relationship via a special "relationship type." So by all means, let's include all the relationships that will be useful - but let's not create more types than are necessary or useful.

    But perhaps it's worth stepping back and asking what we're trying to accomplish here. If we're interested in generating citations, then we don't need many relationship types - only those that are necessary for citation needs. If we're interested in creating a database that can quickly show us all the various incarnations of a particular source, then we could be talking about a lot more. Or we could do the first right now, because there's a more immediate need for citation management (without which you'll never get a big database filled with sources), while leaving open the possibility of creating a more open-ended relationship structure down the road.

    Another thought on this: the microfilm reproduction is obviously a microfilm reproduction because the medium is included in the citation info. Perhaps this could be a model for most of the others, as well. For entry types where relationship status is not obvious, you could insert tags within the record identifying the medium as "exact reproduction" or whatever. This could be a lot simpler than creating lots of "relationship types" and would accomplish the same purpose. For example, as anyone who has worked with manuscript letters knows, there are sent copies, retained copies, draft copies, additional copies made by the recipient, etc. In other words, lots of potential "relationships" needing specific types. But by simply designating these characteristics within each record, you could obviate the need for specific relationship types - you could simply create a generic relationship, and let the attributes of the records indicate the precise nature of that relationship.
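    Sketched with the manuscript-letter example, assuming a hypothetical `copy_kind` attribute on each record and a single untyped link between records, so the nature of each relationship comes from the records themselves:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    record_id: str
    copy_kind: str  # hypothetical attribute: "sent", "retained", "draft", ...


# One generic, untyped link per pair of records; no "is draft of" or "is retained copy of" types.
related = {("L1", "L2"), ("L1", "L3")}


def describe_related(record_id, letters, links):
    """List records linked to record_id, each described by its own copy_kind attribute."""
    others = [b if a == record_id else a for a, b in links if record_id in (a, b)]
    return sorted(f"{letters[o].copy_kind} copy ({o})" for o in others)


letters = {
    "L1": Letter("L1", "sent"),
    "L2": Letter("L2", "retained"),
    "L3": Letter("L3", "draft"),
}
print(describe_related("L1", letters, related))
# ['draft copy (L3)', 'retained copy (L2)']
```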
  • Earlier, Josh wrote: "if a book can be either a collection or a document depending solely on perspective, then it fundamentally can't be described in an objective way (which is crucial to data portability and interoperability). Since one could *always* make a gestalt shift between collection/document, I don't see the usefulness of maintaining them as somehow ontologically distinct. Why not just level the distinction between them, and simply talk about items that can also be pieces of other items (and in turn be composed of smaller items as well)? "

    More recently, Josh wrote: "the more flexible the model the better, but the implementation of these concepts in a particular piece of software tailored to a particular set of practices (i.e. Zotero) will hard-code much more rigid and explicit hierarchical relationships into both Zotero's UI and internal data model (neither of which preclude exporting said data in the more abstract and generic RDF form as needed)."

    In the former, you seem to be calling for a very flexible model where items are items are items, with lots of possible relationships among them. In the latter, you note the need to hard-code more rigid hierarchies to make the whole thing operable.

    I assume there's a balance to be struck here between an ideal open-ended abstract model and hard-coding for the sake of functionality. I guess my question is, what kind of rigid and explicit hierarchical relationships will need to be hard-coded, and how might that hard-coding limit the potential for future features?

    Perhaps it would be helpful to move past the abstract question of ideal levels of flexibility, or how a container is different from a collection, and talk instead about concrete design choices that will have to be made. I know we addressed some of Dan's GUI questions earlier (though I doubt we settled anything conclusively). What are some other specific issues that arise in the implementation of this stuff?
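    One concrete example of such a hard-coding choice, sketched under assumptions: if each item may point to at most one containing item (a hypothetical `part_of` field), a book can be both a container (of an essay) and a document (in a series), and a citation can walk the chain upward:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    title: str
    part_of: Optional["Item"] = None  # hypothetical single "is in" link to a containing item


def container_chain(item):
    """Walk 'is in' links upward (essay -> book -> series), guarding against cycles."""
    chain, seen = [], {id(item)}
    current = item.part_of
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        chain.append(current.title)
        current = current.part_of
    return chain


series = Item("Studies in History")
book = Item("Collected Essays", part_of=series)
essay = Item("On Archives", part_of=book)
print(container_chain(essay))  # ['Collected Essays', 'Studies in History']
print(container_chain(book))   # ['Studies in History']
```

    Storing a single `part_of` is exactly the kind of rigid choice in question: an essay reprinted in two anthologies would need a list of containers instead.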
  • CloudofDust, I agree with you that many of my previously mentioned relations may not be needed for purposes of citations. I presented them simply to show that one hard-coded relation may pose a serious limitation and that a redesign of the data model (which is the core of the app) should try to be flexible enough so that the model can be expanded easily to address future needs.

    CloudofDust is also right in that the types of the related items (plus any other field info) can account for many (if not all?) of these relations, though IMHO this pretty much brings us back to the current flat model with many item types. Current bibliographic apps do *exactly* this, they base the inherent hierarchic relations and the citation formatting on the item's type and draw logic from specific fields (such as medium, language, number of authors, editor present, etc).

    It's correct that many of the above given relations are variations of the same relationship type and I wasn't really implying that Zotero should offer all of these, just that the system should be expandable.

    To suggest something more concrete, I think that at least these basic relations would make sense for a bibliographic app:

    is a copy of (e.g. exact copy, reprint, facsimile,..)
    is reproduced in (e.g. microfilm, book, newspaper,..)
    is a version of (e.g. revised edition, translation, illustrated/abridged version,..)
    is a derivative work of (e.g. summary, screenplay, parody,..)
    is a descriptive work of (e.g. review, criticism, commentary,..)
    is presented at (e.g. conference,..)
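    A minimal sketch of how these six basic relations might be encoded, assuming a fixed enumeration plus an optional free-text qualifier for the examples in parentheses (all names are hypothetical):

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    COPY_OF = "is a copy of"
    REPRODUCED_IN = "is reproduced in"
    VERSION_OF = "is a version of"
    DERIVATIVE_OF = "is a derivative work of"
    DESCRIPTIVE_OF = "is a descriptive work of"
    PRESENTED_AT = "is presented at"


@dataclass(frozen=True)
class Link:
    subject_id: str
    relation: Relation
    object_id: str
    qualifier: str = ""  # optional refinement, e.g. "facsimile", "translation", "review"

    def __str__(self):
        suffix = f" ({self.qualifier})" if self.qualifier else ""
        return f"{self.subject_id} {self.relation.value} {self.object_id}{suffix}"


print(Link("ed-2", Relation.VERSION_OF, "ed-1", "revised edition"))
# ed-2 is a version of ed-1 (revised edition)
```

    The enumeration stays small enough to hard-code in a UI, while refinements like "facsimile" or "translation" live in the qualifier and can grow without a schema change.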

    CloudofDust said: "Perhaps it would be helpful to move past the abstract question of ideal levels of flexibility, or how a container is different from a collection, and talk instead about concrete design choices that will have to be made."

    I understand this; however, you want to redesign a core element of your app (such as the data model) only *once*, so these discussions are important to get it right. IMHO the general discussion really matters, since it's the basis for any concrete suggestions and user interface design. Nobody said this would be easy.
  • Matthias writes: "the types of the related items (plus any other field info) can account for many (if not all?) of these relations, though IMHO this pretty much brings us back to the current flat model with many item types. Current bibliographic apps do *exactly* this, they base the inherent hierarchic relations and the citation formatting on the item's type and draw logic from specific fields (such as medium, language, number of authors, editor present, etc)."

    The difference between what we're talking about and the current flat model is that our model will build relationships between parent/child items by relating separate records for each component, whereas the current model squeezes parent, child, etc. all into a single record, so that there are essentially no relationships between records at all.

    That's a pretty major difference that has nothing to do with how exactly our model defines relationships. So I don't see how inferring relationship types from item types (as in, container type = microfilm, ergo relationship type = microfilm reproduction) brings us back to the flat model. The key is that however we identify relationships, the new model is based on relationships among records, while the current model is not.
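    To make the difference between a flat record and separate related records concrete, here is a rough sketch (field names and ids are hypothetical) of splitting a flat "book section" record, where the book's data is squeezed into the chapter's record, into two related records:

```python
# Hypothetical field names: book-level fields in a flat "book section" record,
# mapped to the field names they take on the separate book record.
BOOK_FIELDS = {"bookTitle": "title", "publisher": "publisher", "year": "year"}


def split_flat_record(flat, book_id, chapter_id):
    """Split one flat record into a book record, a chapter record and an 'is in' link."""
    book = {"id": book_id}
    chapter = {"id": chapter_id}
    for key, value in flat.items():
        if key in BOOK_FIELDS:
            book[BOOK_FIELDS[key]] = value
        else:
            chapter[key] = value
    return book, chapter, (chapter_id, "is in", book_id)


flat = {"title": "On Archives", "author": "A. Smith",
        "bookTitle": "Collected Essays", "publisher": "Example Press", "year": "2006"}
book, chapter, link = split_flat_record(flat, "b1", "c1")
print(book)     # {'id': 'b1', 'title': 'Collected Essays', 'publisher': 'Example Press', 'year': '2006'}
print(chapter)  # {'id': 'c1', 'title': 'On Archives', 'author': 'A. Smith'}
print(link)     # ('c1', 'is in', 'b1')
```

    A second chapter from the same book would then point to the existing book record instead of repeating its fields; detecting that two flat records describe the same book is left out of this sketch.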

    I certainly agree with the need to redesign the data model only once, with an eye towards long-term as well as immediate needs. That said, I'm not sure what specific problems concerning the data model remain to be resolved. My understanding of what the Zotero folks are saying is that the data model will essentially consist of lots of records that can be related to one another, and that the specific nature of those relationships can be hammered out in the creation of entry types, citation styles, etc. Creating the types and styles involves figuring out how the data will be handled and how the user will interact with it, and can be worked out more or less independently from the data structure (in fact, one of the main points of having a relatively simple data model, where all records can be related to one another, is to maximize the possible ways of playing with the data down the road).

    That said, if there are specific issues concerning the data structure that need to be hammered out here, by all means let's talk about them. But let's talk about them on the level of actual implementation, rather than on the level of what the best-of-all-possible-worlds data structure would look like in theory.
  • Sorry for disappearing for a few days (somehow, my RSS reader flaked and stopped indicating new posts to this thread)...

    A few posts back, CloudOfDust pointed to an apparent contradiction between two statements of mine, one of which seemed to advocate for flexibility in relationship-determination while the other explained a need to hard-code certain of those relationships. While the two seem contradictory, I meant them on different levels; the former is about the conceptual (and general) data model for bibliographic information, the latter is about Zotero's particular implementation of it.

    Again, my biggest concern right now is figuring out a core model and interface for Zotero that, while not infinitely flexible, is not incompatible with other implementations of a more flexible model. For example, while we might not implement dozens of possible qualifiers to a relationship (a la Matthias' more exhaustive list above), it's clear that we need a "valence" property associated with any relationship that could cover everything from "reproduced in" to "version of"...
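    A minimal sketch of a relationship carrying a "valence" property, assuming a hypothetical hard-coded core set of valences, plus an export to generic subject/predicate/object triples so the data stays usable by a more flexible model (the "rel:" prefix is a placeholder, not a real vocabulary):

```python
from dataclasses import dataclass

# Hypothetical core set of valences an application might hard-code in its UI.
CORE_VALENCES = {"reproduced in", "version of", "copy of", "related to"}


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    valence: str = "related to"

    def __post_init__(self):
        # Reject valences outside the hard-coded core set.
        if self.valence not in CORE_VALENCES:
            raise ValueError(f"unsupported valence: {self.valence!r}")


def to_triples(relationships, prefix="rel:"):
    """Export as generic (subject, predicate, object) triples; 'rel:' is a placeholder prefix."""
    return [(r.source, prefix + r.valence.replace(" ", "_"), r.target) for r in relationships]


print(to_triples([Relationship("letter-17", "reel-3", "reproduced in"),
                  Relationship("ed-2", "ed-1", "version of")]))
# [('letter-17', 'rel:reproduced_in', 'reel-3'), ('ed-2', 'rel:version_of', 'ed-1')]
```

    The application can stay rigid about which valences it offers, while the exported triples make no such assumption and could carry a richer vocabulary from another implementation.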

    That said, I think we've covered some good ground here; in the next few days I'll see if I can distill all the issues raised into a new spec, and we can discuss from there (in the hopes of getting something implementable sooner rather than later)...
  • That sounds good - as I said above, it sounds like there is general agreement on what the underlying structure should look like, and the biggest remaining questions surround implementation and interface. So I'll look forward to seeing what you'll come up with.
    It appears to me that Zotero is still jumping on the bandwagon of nice-looking and apparently cool web features.

    Del.icio.us and Flickr are also “cool,” indeed. Tags are really “cool” as well. So is my iPod. And so are many of the features that come with the recent developments in internet applications, especially when it concerns digital scholarship.

    Besides the added stability and increased speed of Zotero with large and serious bibliographies, the recent upgrade feels like it consists of just a few more “cool” features.

    Emblematic of this is the last post on the forum concerning “annotate and highlight archived pages,” which basically refers to the possibility of adding comments to saved websites.

    Gee, that’s so great – especially given the fact that most serious scholarship takes such “archived pages” very, very seriously.

    Zotero had (or still has?) the capacity to revolutionize the way we (i.e. scholars who deal with the archives on a daily basis) manage our data. The fascinating discussion on hierarchical relationships and parent-child-items was an example of that.

    None of the Thomson products is able to handle such issues.

    Neither EndNote nor ProCite can deal seriously with cross-referencing (although ProCite claims it can). Neither of them can define specific relationships between containers and what they contain; neither of them knows how to deal with parent items and child items in a visually transparent manner; etc.

    Zotero had (or still has?) so much potential to address these fundamental issues in bibliographic databases.

    Thanks for the added stability. Thanks for the increased speed when it comes to large collections. But seriously, were we all waiting to be able to make some notes to some forsaken website?

    In any case, I waited for the recent update but nothing much seems to have happened.

    Yet I did start to import an annotated bibliographic database of 1550 items into Zotero. This goes to show my dislike of the Thomson products and my optimism about Zotero.

    But whatever the developers might claim, importing such a large database into Zotero does involve a good amount of manual labour, even with the supported export files.

    I just wonder what will happen when a future update lets us use parent and child items. Am I going to have to change them manually again?

    S.
    I concur with Swami (to the surprise of no one reading this thread, I suspect), though I'm willing to cut the Zotero devs a bit of slack here - it's only been a few months since this thread began, and developing a radically new paradigm for handling bibliographic data presumably takes time.

    At the same time, I wonder whether the Zotero developer community tends to have a basic predisposition towards the "cool" web-related features. In other words, maybe for whatever reason most people with the skills, time, and inclination to become Zotero developers happen to be more interested in the "cool" stuff than in the challenges of creating an archive-friendly biblio program.

    I don't want to disparage anyone or the community as a whole, and I know that there are some people (as this thread demonstrates) working on the project who are really interested in this stuff. Plus I expect that many of the improvements that may seem "cool" to me may be vital fixes to core features for someone else - I just wonder how much the makeup of the community supporting an open-source project tends to skew the course of its development towards particular concerns, leading to the neglect of at least parts of the project's original goals.