Dataset - genre

Hi all,

which genre should I use to cite a dataset in Chicago author-date?

Thanks for your help, regards: Krisztina

  • OK, thanks, we will test it.
  • edited August 11, 2019
    I was looking for a place to voice an opinion about this, and this thread seems to be the most recent relevant place.

    I recently had to enter reference data for an Excel spreadsheet maintained by the U.N. As the page Adam references suggests, I used "Document" as the Item Type. Here's the reference Zotero generated:

    United Nations Department of Economic and Social Affairs, Population Division. 2019. “World Population Prospects 2019, Online Edition.” Excel spreadsheet. United Nations.

    If you follow the link in the reference, you'll actually come to a page devoted to mortality data. Building on the suggestion on the page Adam referenced, here's what I actually put in the Extra field:

    type: dataset
    medium: Excel spreadsheet
    file: MORT/7-1: Life expectancy at birth (both sexes combined) by region, subregion and country, 1950-2100 (years)

    In this case the file is one of several Excel spreadsheets listed on a page with links and that itself is part of a much larger web site providing data on various subjects in various formats. I felt it was more important for the reference's URL to link to the page (which explains the file) than to the specific Excel file, which would just open or download upon clicking the link. Notice that the file name, which identifies the specific spreadsheet, did not make its way into the reference.
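To make the Extra-field entries above concrete, here is a minimal sketch in Python of how "key: value" lines like these could be split into fields. It is only an illustration: `parse_extra` is a hypothetical helper, not Zotero's actual parser, and it assumes that only the first colon on a line separates the key from the value (the file name itself contains a colon).

```python
from typing import Dict


def parse_extra(extra: str) -> Dict[str, str]:
    """Split 'key: value' lines into a dict (hypothetical helper, not Zotero code)."""
    fields: Dict[str, str] = {}
    for line in extra.splitlines():
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


extra = """type: dataset
medium: Excel spreadsheet
file: MORT/7-1: Life expectancy at birth (both sexes combined) by region, subregion and country, 1950-2100 (years)"""

fields = parse_extra(extra)
print(fields["type"])
print(fields["medium"])
print(fields["file"])
```

Splitting on the first colon only is what keeps "MORT/7-1: Life expectancy..." intact as a single value.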

    My comment is that these days a "dataset" item type either has to be remarkably flexible or may have to be just one of several new item types. The CSL "dataset" type appears to be the only choice currently in CSL. It may already be obsolete.

    IIRC, the term "dataset" originated with IBM mainframes and OS/360, which called every data file a "dataset." The term "database" was sometimes used to denote a collection of related "datasets." But "database" soon came to denote a source to which a user could issue queries to retrieve specific information interactively, or for which a programmer could write query scripts to retrieve reports, subsets of data, etc. Today people use such terms loosely, often interchangeably, and they can mean just about any kind of data source.

    As my example demonstrates, this notion of a dataset is too generic. In the example, the Excel spreadsheet's relation to the entire database is analogous to a journal article's relation to a journal. Given the wide variety of "dataset" and "database" types, more specific information is needed.

    For example, the project that took me to this Excel spreadsheet also took me to a site in which the "dataset" consisted of four files: the data as an ASCII text file, a PDF file describing the data, a data dictionary (which I haven't looked at, but which is either plain text, fixed-format ASCII text, or R code), and a file of R code used to process the data.

    The vague "dataset" item type does not seem to allow such details. But just as including the language of a journal article in a reference helps readers decide if retrieving the article is worthwhile, so too is it helpful to know if a "dataset" is part of a larger collection, requires auxiliary files to be useful, requires specialized software and knowledge, etc.
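As a rough sketch of the kind of detail I mean, here is how the four-file example might be described in a small Python data structure. Every name here (`DataFile`, `DataSource`, `part_of`, `software_needed`, the placeholder title) is hypothetical and made up for illustration; none of it is an existing Zotero or CSL field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataFile:
    role: str    # what the file is for, e.g. "data" or "documentation"
    medium: str  # file format, e.g. "ASCII text" or "PDF"


@dataclass
class DataSource:
    title: str
    part_of: Optional[str] = None  # larger collection, if any
    files: List[DataFile] = field(default_factory=list)
    software_needed: List[str] = field(default_factory=list)

    def needs_auxiliary_files(self) -> bool:
        """True if anything besides the data file itself is part of the source."""
        return any(f.role != "data" for f in self.files)


# Placeholder title: the actual site is not named in this post.
source = DataSource(
    title="Example four-file dataset",
    files=[
        DataFile(role="data", medium="ASCII text"),
        DataFile(role="documentation", medium="PDF"),
        DataFile(role="data dictionary", medium="unknown"),
        DataFile(role="processing code", medium="R code"),
    ],
    software_needed=["R"],
)

print(len(source.files))
print(source.needs_auxiliary_files())
```

The point is only that "part of a larger collection", "needs auxiliary files", and "needs specialized software" are each simple to record once there is somewhere to put them.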

    Also, today it's possible to use a programming language like R to scrape information or retrieve information from web sites. Since such programs may contain errors, documenting such retrieval methods may become important.
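Here is a minimal sketch of what documenting a retrieval could look like, written in Python rather than R. It assumes the content has already been downloaded; `retrieval_record` and its field names are hypothetical, and the URL is a placeholder. It records where the data came from, when, with what tool, and a SHA-256 checksum so a reader can tell whether they later get the same bytes.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


def retrieval_record(url: str, content: bytes, tool: str) -> Dict[str, str]:
    """Describe one retrieval: source, tool, UTC timestamp, and content checksum."""
    return {
        "url": url,
        "tool": tool,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


# Placeholder URL and stand-in content; no network access happens here.
content = b"country,year,life_expectancy\n"
record = retrieval_record("https://example.org/data.csv", content, tool="custom script")

for key, value in record.items():
    print(f"{key}: {value}")
```

A record like this does not prove the script was correct, but it does let someone else check that they are looking at the same data.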

    One or more new item types should anticipate the need for such information in references designating data sources.