DataSets
Hi -- I need to cite DataSets. How would I enter these into Zotero so that they can be included in the bibliography?
For example I am working on the following data set: https://catalog.data.gov/dataset/national-assessment-of-oil-and-gas-project-devonian-marcellus-shale-of-the-appalachian-basin-pr
https://certmapper.cr.usgs.gov/data/noga00/prov67/spatial/shape/au6704.zip
Thanks in advance for your guidance
Enter this in the Extra field:
type: dataset
This will get picked up by citation styles that are written to support dataset items.
I personally use Document for this type of item and also enter the DOI in Extra:
DOI: 10.1234/67890
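To make the Extra-field workaround concrete, here is a minimal Python sketch that assembles the text to paste into Extra, one "key: value" pair per line. The helper name `build_extra` is hypothetical and not part of Zotero; it just combines the two suggestions in this thread (the `type: dataset` line and the `DOI:` line), using the placeholder DOI from the post above.

```python
def build_extra(doi=None, item_type="dataset"):
    """Assemble text for Zotero's Extra field, one 'key: value' pair per line.

    Hypothetical helper for illustration; the result is pasted into Extra by hand.
    """
    lines = [f"type: {item_type}"]
    if doi:
        lines.append(f"DOI: {doi}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder DOI taken from the post above, not a real dataset.
    print(build_extra("10.1234/67890"))
```

Whether both lines are needed depends on the item type you pick and on the citation style you use.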
Hanmer, Michael J.; Banks, Antoine J., White, Ismail K., 2013, “Replication data for: Experiments to Reduce the Over-reporting of Voting: A Pipeline to the Truth”, Harvard Dataverse, V1, http://dx.doi.org/10.7910/DVN/22893 UNF:5:eJOVAjDU0E0jzSQ2bRCg9g==
SOURCE: Data Citation | The Dataverse Project - Dataverse.org
https://dataverse.org/best-practices/data-citation
If the user has no control over the version number ('V1' in the example above), then the UNF is redundant with the version number as a pointer to a specific version of the data set that cannot be manipulated. For downloads, however, it's an excellent check for data integrity.
Until the item type 'data set' is available, it seems the workaround that *bwiernik* suggested could also be used for the UNF if needed.
UNFs are cool, but the idea of including them in citations comes from a very early (2007) article on data citation, and things have moved on since then.
To provide just one example of how they are problematic: If I update the codebook on a dataset, for instance to reverse the direction of a scale that had been erroneously described, the UNF (which is only calculated based on tabular data files) is going to remain the same, suggesting the dataset is unchanged.
Similarly, datasets, including on Harvard Dataverse, increasingly include replication code. Changes in that code aren't reflected in the UNF either.
Unfortunately, there are major differences between data repositories in what changes when a dataset is modified. At the Dataverse, the DOI points to the landing page, and the version number of the dataset changes with every modification, including adding or replacing code files. The user cannot tamper with this version number. In contrast, Zenodo allows the same version number for more than one entry in the record with different overall hashes. Also, Zenodo generates a DOI for all versions as well as for each individual version of a dataset.
I think we will have to learn how important it is to check the history of a dataset for modifications and then check the individual fingerprints of files in the record of a dataset to identify the files that changed.
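As a rough illustration of that kind of per-file check, here is a minimal Python sketch that compares local copies against checksums copied from a repository's file listing. It assumes ordinary file checksums (SHA-256 by default), not UNFs, which are computed differently. The function names and the layout of the `published` mapping are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(directory, published):
    """List file names whose local digest differs from the published one.

    `published` maps file name -> (algorithm, hex digest), copied by hand
    from the repository's file listing (assumed structure).
    """
    changed = []
    for name, (algorithm, expected) in sorted(published.items()):
        actual = file_digest(Path(directory) / name, algorithm)
        if actual.lower() != expected.lower():
            changed.append(name)
    return changed
```

Running `changed_files` against the listings of two versions of the same dataset would show which files differ between them, including code files and codebooks that a UNF does not cover.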
When the dataset type is added, will it have a generic PID (persistent identifier) field that will accommodate handles, DOIs, URNs, or whatever a given repository uses? If so, is there an explicit mechanism we should use now in Extra to have the PID transferred later? Or is the plan to only support DOIs explicitly and other PIDs only via URL?
PID support is, in my opinion, least important for things like URNs and handles that mainly serve as permalinks. Those can just be handled as URL without losing much if anything.
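For what "handled as URL" can look like in practice, here is a small Python sketch that turns a prefixed DOI or handle into a resolver URL for Zotero's URL field. The `doi:` and `hdl:` prefixes are an input convention assumed for this example, and `pid_to_url` is a hypothetical helper; the resolver base URLs are the public doi.org and hdl.handle.net proxies. URNs are left out because they have no single common resolver.

```python
# Public resolver proxies; the "doi"/"hdl" keys are an assumed input convention.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def pid_to_url(pid):
    """Return a resolver URL for a 'doi:...' or 'hdl:...' identifier.

    Identifiers that are already URLs are passed through unchanged.
    """
    if pid.lower().startswith(("http://", "https://")):
        return pid
    scheme, sep, value = pid.partition(":")
    base = RESOLVERS.get(scheme.lower())
    if not sep or base is None or not value:
        raise ValueError(f"no known resolver for {pid!r}")
    return base + value
```

For example, `pid_to_url("doi:10.7910/DVN/22893")` gives `https://doi.org/10.7910/DVN/22893`, the current form of the dx.doi.org link in the Dataverse citation earlier in the thread.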
In the biosciences, there's a wealth of PIDs without clear resolvers (though many do resolve via identifiers.org) that may require special handling, but given the numbers here, that requires rethinking the data model a bit more than just "add more identifiers". I think there's a couple of old threads on this if you're interested.
As long as the citation styles supporting dataset types show URLs, it is fine from my perspective.