DataSets

Hi -- I need to cite datasets. How would I enter them into Zotero so that they can be included in the bibliography?

For example, I am working with the following dataset: https://catalog.data.gov/dataset/national-assessment-of-oil-and-gas-project-devonian-marcellus-shale-of-the-appalachian-basin-pr

https://certmapper.cr.usgs.gov/data/noga00/prov67/spatial/shape/au6704.zip

Thanks in advance for your guidance
  • Zotero will add a Dataset item type in a future version. For now, enter them as another type and enter this at the top of Extra:
    type: dataset

    This will get picked up by citation styles that are written to support dataset items.

    I personally use Document for this type of item and also enter the DOI in Extra:
    DOI: 10.1234/67890
  • @bwiernik thank you for the feedback. Could you let me know whether the Chicago citation style supports it? I am not sure how to check, or how Zotero's styles are written.
  • edited May 5, 2019
    The Chicago Manual doesn't really prescribe how dataset citations should look (it only covers databases). The citation does come out looking reasonable for datasets, though.
  • ok great. So your steps above would work.
  • I noticed that the Dataverse project suggests including a universal numerical fingerprint (UNF):

    Hanmer, Michael J.; Banks, Antoine J., White, Ismail K., 2013, “Replication data for: Experiments to Reduce the Over-reporting of Voting: A Pipeline to the Truth”, Harvard Dataverse, V1, http://dx.doi.org/10.7910/DVN/22893 UNF:5:eJOVAjDU0E0jzSQ2bRCg9g==

    SOURCE: Data Citation | The Dataverse Project - Dataverse.org
    https://dataverse.org/best-practices/data-citation

    If the user has no control over the version number, in the above example 'V1', then the UNF is redundant with versioning as a pointer to a specific version of the data set that cannot be manipulated. For downloads, however, it's an excellent check for data integrity.

    Until the item type 'data set' is available, it seems the workaround that *bwiernik* suggested could also be used for the UNF if needed.
  • Yeah, I disagree with Dataverse on this, and no one else recommends hashes of any kind for data citations (they should of course be part of the metadata). We're not going to add them in Zotero or CSL.

    UNFs are cool, but the idea of including them in citations comes from a very early (2007) article on data citation, and things have moved on since then.

    To provide just one example of how they are problematic: If I update the codebook on a dataset, including to reverse the direction of a scale that had been erroneously described, the UNF (which is only calculated based on tabular data files) is going to remain the same, suggesting the dataset is unchanged.
    Similarly, datasets, including on Harvard Dataverse, increasingly include replication code. Changes in that code aren't reflected in the UNF either.
  • It was not transparent to me that UNF fingerprinting is for the data only. So, I agree with you that it is not useful as proof of integrity if a dataset includes other file types.

    Unfortunately, there are major differences between data repositories in what changes when a dataset is modified. At Dataverse, the DOI points to the landing page, and the version number of the dataset changes with every modification, including adding or replacing code files. The user cannot tamper with this version number. In contrast, Zenodo allows the same version number for more than one entry in the record, each with a different overall hash. Also, Zenodo generates a DOI for all versions as well as for each individual version of a dataset.

    I think we will have to learn how important it is to check the history of a dataset for modifications and then check the individual fingerprints of files in the record of a dataset to identify the files that changed.
  • Dataverse will implement new DOIs for version numbers along the lines of Zenodo, which is also the recommended practice by Force11, RDA, etc.
  • What is the recommendation for Handles? Fill them in the full URL form as URL, use the plain form without the "http://hdl.handle.net/" as Extra:DOI, or both? Since DOIs are handles, the doi.org resolver is actually a handle resolver and it works with our handles without any problems.
  • I'd just use the resolved handle as a URL. Definitely don't add it as a DOI, because the expectation there is that there's also going to be metadata.
  • OK, thanks.

    When the dataset type is added, will it have a generic PID (persistent identifier) field that will accommodate handles, DOIs, URNs, or whatever a given repository uses? If so, is there an explicit mechanism we should use now in Extra so that the PID is transferred later? Or is the plan to support only DOIs explicitly and other PIDs only via URL?
  • There are no short-term plans for persistent identifiers other than the currently supported ones (DOIs, ISBN, ISSN, PM(C)ID and kind of arXiv ID).

    PID support is, in my opinion, least important for things like URNs and handles that mainly serve as permalinks. Those can just be handled as URL without losing much if anything.

    In the biosciences, there's a wealth of PIDs without clear resolvers (though many do resolve via identifiers.org) that may require special handling, but given the numbers here, that requires rethinking the data model a bit more than just "add more identifiers". I think there's a couple of old threads on this if you're interested.
  • edited February 9, 2021
    I agree that URLs work well for Handles. However, practically speaking, there is a problem with citation styles: many of them do not show URLs but do show DOIs. Thus we often end up copying Handles into the DOI field.

    As long as the citation styles supporting dataset types show URLs, it is fine from my perspective.
  • If you have specific citation styles that don't show URLs for dataset item types (i.e. input with type: dataset in Extra currently), just let us know. Dataset support in citation styles is currently very limited, but we're happy to fix problems when you notice them.
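
The workaround discussed in this thread (putting "type: dataset" at the top of Extra, optionally followed by a "DOI:" line, and storing a Handle as a resolved URL rather than as a DOI) could be sketched roughly as follows. This is only an illustration of how the field text might be assembled before pasting it into Zotero. The function names `build_extra` and `handle_to_url` and the sample handle `1234/abc` are hypothetical; nothing here calls a Zotero API.

```python
# Illustrative sketch only: builds the text one would paste into Zotero's
# Extra and URL fields. The helper names are hypothetical, not Zotero API.

HANDLE_RESOLVER = "http://hdl.handle.net/"  # resolver prefix quoted in the thread


def build_extra(doi=None, existing_extra=""):
    """Return Extra text with 'type: dataset' (and optionally 'DOI: ...') on top."""
    fields = {"type": "dataset"}
    if doi:
        fields["DOI"] = doi

    # Drop existing lines that set the same keys, so they are not duplicated.
    overridden = {key.lower() for key in fields}
    kept = []
    for line in existing_extra.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip().lower() in overridden:
            continue
        if line.strip():
            kept.append(line)

    head = [f"{key}: {value}" for key, value in fields.items()]
    return "\n".join(head + kept)


def handle_to_url(handle):
    """Return the resolved URL form of a Handle, for use in the URL field."""
    if handle.startswith(("http://", "https://")):
        return handle  # already in URL form
    return HANDLE_RESOLVER + handle


if __name__ == "__main__":
    print(build_extra(doi="10.1234/67890"))
    print(handle_to_url("1234/abc"))  # hypothetical handle
```

Whether a given citation style actually picks up the "type: dataset" line, or shows the URL, depends on the style, as noted above.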