New citation type: Research data/dataset

philippconzett · December 21, 2016

Are there any plans to include research data/datasets in the list of citation types? For more info on how to cite research data, cf. http://best-practices.dataverse.org/data-citation/.

Best,
Philipp

Kompostkvarn · December 21, 2016

I also wonder about this! Would be great as I use a lot of statistics databases.

adamsmith · December 21, 2016

Field and item type updates are planned for version 5.1. Dataset will definitely be included. If you want to, you can use an item type like journal article (because it has DOIs) and "force" it into being a dataset for citation purposes by including "itemType: dataset" (without the quotation marks) in the "Extra" field. This will migrate automatically to a proper item type once that's introduced. Note that very few citation styles have specified formats for datasets.

dschlaep · March 21, 2017

I have recently switched from EndNote to Zotero, and I really miss a dataset type. I hope v5.1 comes out soon! Anyhow, thanks for the great work on Zotero!

bwiernik · March 21, 2017

As adamsmith mentions, you can currently get the functionality of a "dataset" type by adding "itemType: dataset" (without the quotation marks) to the top of the Extra field. Journal article is probably the type to use for this, as it has a DOI field (but you can also add a DOI to other types by adding DOI: 10.1234/45678 to the Extra field as well).
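The workaround relies on Extra holding one `key: value` pair per line. As a rough illustration of that convention (a hypothetical sketch, not Zotero's actual parser), the Python below splits Extra text into normalized keys and leftover free-text notes. The function name `parse_extra`, the key normalization, and the rule that the colon must be followed by a space (so that bare URLs stay notes) are all assumptions.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical: one "key: value" pair per line. Requiring whitespace after the
# colon keeps lines such as "https://doi.org/..." from being read as fields.
FIELD_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z -]*?)\s*:\s+(.+?)\s*$")


def parse_extra(extra: str) -> Tuple[Dict[str, str], List[str]]:
    """Split Extra text into 'key: value' fields and leftover free-text notes."""
    fields: Dict[str, str] = {}
    notes: List[str] = []
    for line in extra.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            # Normalize keys CSL-style, e.g. "Container title" -> "container-title".
            key = match.group(1).strip().lower().replace(" ", "-")
            # Assumption: if a key repeats, the first occurrence wins.
            fields.setdefault(key, match.group(2))
        elif line.strip():
            notes.append(line.strip())
    return fields, notes
```

For example, `parse_extra("itemType: dataset\nDOI: 10.1234/45678")` yields the fields `itemtype` and `doi` and no notes.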

paschalia.terzi · August 23, 2017

So there's no dataset type in the new Zotero 5? Why? As a librarian, I think this is very important.

bwiernik · August 23, 2017

Updates to item types and fields are coming in Zotero 5.1. Before those changes can be made, syncing for Zotero 4.0 clients needs to be turned off.

paschalia.terzi · August 23, 2017

When will that be available? I'm asking because I am preparing workshops for my students.

LiborA · August 23, 2017

As bwiernik wrote:

Before those changes can be made, syncing needs to be turned off to Zotero 4.0 clients.

There is no ETA yet, but because a lot of people still use Zotero 4.0 clients, it won't happen soon.

bwiernik · August 23, 2017

@paschalia.terzi For your workshops, what you should do is advise your students to use Journal Article and add type: dataset to the Extra field. When the dataset type is added, these items will likely be automatically migrated. I've taught quite a few Zotero workshops and given this advice; students don't generally find it very difficult.

desmedt · March 26, 2019

I support Philipp's suggestion to consider good practice when citing datasets. A persistent identifier (Handle or DOI) pointing to full metadata is definitely recommended. Adding information about provenance and distribution in the textual citation would be very helpful.

philippconzett · May 10, 2019

Together with @desmedt I'm right now co-authoring a chapter on data citation, where we'd like to include a section on reference managers. Is there any chance for "dataset" to be added to the list of citation types in the near future? It would be nice to refer to this in our chapter. Thanks!

adamsmith · May 10, 2019

The status is the same as above. It's possible to enter datasets into Zotero as above and they will produce correctly formatted citations in styles that support datasets (such as APA), but a visible item type will have to wait until Zotero 5.1 (or 6.0, whatever the next major version will be called), with an unknown ETA. I believe the intention is for this year, but past experience has been that initial estimates may be widely off.

Datasets entered as described above will automatically migrate to the proper item type once it exists.

philippconzett · May 12, 2019

Thanks for this heads-up!

Could you please clarify some issues?

1. Should we add "itemType: dataset" (as suggested above), or "type: dataset" (as suggested here: https://www.zotero.org/support/dev/translators/datasets)? I have tested both. They render the same result, but will both be automatically migrated to the proper item type once it exists?

2. Should we use the item type Journal Article (as suggested above), or Document (as suggested here: https://www.zotero.org/support/dev/translators/datasets)? I have tested both. They render the same result, but will both be automatically migrated to the proper item type once it exists?

3. None of the possible combinations of the solutions above (1, 2) produce citations in line with best-practice recommendations (see e.g. the Joint Declaration of Data Citation Principles, DataCite), depending somewhat on the citation style used. For instance, references created with the Chicago Manual of Style 17th ed. are missing information about version, resource type (dataset), and fixity (e.g. UNF). APA (6th ed.) is missing information about fixity (e.g. UNF). I have tested this with this dataset: https://doi.org/10.18710/QAJKZW. Is there a way to get these elements included other than adding them manually in the manuscript?

4. What approach should we choose when we need to add references to parts of datasets? For instance, what should a Zotero reference record look like for this dataset file: https://doi.org/10.18710/QAJKZW/M1I7AP?

bwiernik · May 12, 2019

1) type: dataset
2) Document is generally better. It doesn’t really matter because the `type` value in Extra overrides the chosen type. Document is chosen for the user experience—Document is rarely used and has a different icon. Both will eventually migrate to a proper dataset item because of the `type` value.
3) To add those other pieces of information, add additional fields to Extra, such as `version` or `medium` (for resource type). I don’t know what “fixity” means (publisher?). To be frank with these, though, I’d say that none of them are particularly important so long as you include the DOI (to the specific version if needed).
4) Treat this the same as an item for the whole dataset, being sure to include the DOI for the specific part.
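Taken together, answers 1) to 4) amount to a handful of Extra lines on a Document item. The hypothetical helper below (`build_extra` is not part of Zotero) simply assembles them as text to paste into Extra. The key spellings are the ones used in this thread, the optional `Container title` line follows the suggestion for parts of datasets made further down, and whether a given style actually prints version, medium or container title depends on the CSL style.

```python
from typing import Optional


def build_extra(doi: str,
                version: Optional[str] = None,
                medium: Optional[str] = None,
                container_title: Optional[str] = None) -> str:
    """Assemble Extra-field lines for a dataset entered as a Document item.

    Hypothetical helper: it only builds text; Zotero itself does the parsing.
    """
    # "type: dataset" overrides the chosen item type for citation purposes.
    lines = ["type: dataset", f"DOI: {doi}"]
    if version:
        lines.append(f"version: {version}")
    if medium:
        lines.append(f"medium: {medium}")  # resource type
    if container_title:
        # For a part of a dataset: name the larger dataset it belongs to.
        lines.append(f"Container title: {container_title}")
    return "\n".join(lines)
```

For example, `build_extra("10.1234/45678", version="3.1")` returns the three lines `type: dataset`, `DOI: 10.1234/45678` and `version: 3.1`.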

adamsmith · May 12, 2019

As for 3), I worked on the Force 11 Data Citation Implementation Pilot, and fixity shouldn't necessarily be part of the citation (and commonly isn't; Dataverse is the only exception, and that is mostly for historical reasons, because of the 2007 Altman and King piece). Principle 7 refers to fixity being available in the citation _or_ the metadata. UNF is only available for tabular quantitative data. Other fixity checks are at the file level and don't apply to the whole dataset, so the metadata is really the only viable option.

4) No one really knows how to cite parts of datasets as a matter of style, but the RDA data citation WG indeed suggests just treating them as datasets with unique DOIs.

philippconzett · May 13, 2019

3) I have added version to the Extra field, but it doesn't show up in certain citation styles. If available, data fixity information is useful to have as part of the reference. Imagine a situation where you have some data and a reference to it, but the metadata record is for some reason no longer available. Then fixity information indicates whether your data is the same as the data mentioned in the reference.

4) One of our principles is not to treat data citation unnecessarily differently from the citation of scholarly publications. Citing a part of a dataset would then correspond to, e.g., citing a chapter of a monograph. This means that the whole dataset should be mentioned correctly in the reference, including its PID, but in my attempts with Zotero as described above, it isn't. I have tried several item types, e.g. Book Section.

bwiernik · May 13, 2019

4) if you want to mention that it is part of a larger dataset, add `Container title: Larger dataset` to Extra.

As you are encountering, most citation styles are not written to expect any field to potentially be present for any item type, so variables like version or container title won’t necessarily show up for all styles (APA might be the only one where I would expect all of these to work). Supporting additional variables for data citations in a style that doesn’t show them would require editing the CSL style.

adamsmith · May 13, 2019

3 a) We'll have to agree to disagree on checksums. I don't see those happening in either Zotero or CSL.
3 b) Part of the issue is what to do with citation styles that don't prescribe a dataset format (i.e. most of them). Chicago, e.g., has a terrible section on "citing data from a scientific database" that shows no awareness whatsoever of the conversation about data citation. At the same time, I'm not sure how comfortable I am with just making something up. I'd be open to adding version to Chicago dataset citations, though. That seems reasonable (though ideally not necessary; the version should really be captured by the PID, although admittedly that's not the case in many data repositories, including the Dataverse family).

danielborek · May 20, 2022

Is this implemented now? I don't see any "dataset" type, and it would be really useful.

adamsmith · May 20, 2022

Nothing new. dataset is in CSL and can be used using type: dataset in Extra in Zotero. Zotero has started with updates of item types/fields, so we're hopefully pretty close.

hanna-brooks · February 10, 2023

A question on this: is there a way to add the version as an Extra field entry too? Thanks!

adamsmith · February 10, 2023

Yes, add a line like this to Extra:
Version: 3.1
See https://www.zotero.org/support/dev/translators/datasets for details. I'm giving a workshop on adding dataset metadata to Zotero in April, and they've promised me that the actual item type will be live by then, so we're hopefully quite close.

philippconzett · April 22, 2023

Thanks, @adamsmith. In another channel, you mention that Zotero now has full dataset support. Could you point to a page with an announcement / more information about this? Will the workshop you mention in your reply above be online and open for everyone? Thanks!

bwiernik · April 23, 2023

It came in 6.0.24: https://www.zotero.org/support/changelog

swifters · July 4, 2023

Thanks for introducing this type. I'm wondering if there's a more comprehensive field mapping table available. The table on the Zotero Dev page is incomplete (e.g. no identifier, repo. location or format), and the authoritative Zotero-CSL table (https://aurimasv.github.io/z2csl/typeMap.xml) hasn't included "dataset" yet. Thanks for your help.

adamsmith · July 4, 2023

I'll try to get the CSL table updated, but for now:

| Zotero field | CSL variable |
|---|---|
| Identifier | `number` |
| Repository | `publisher` |
| Repo. Location | `publisher-place` |
| Format | `medium` |
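For anyone scripting against CSL-JSON, the mapping above can be written as a lookup table. This is a minimal sketch: the Zotero-side keys (`identifier`, `repository`, `repositoryLocation`, `format`) are assumed internal field names, and `dataset_to_csl` is a hypothetical helper that only handles the title plus the four fields listed.

```python
from typing import Dict

# Zotero dataset fields (assumed internal names) -> CSL variables from the post.
ZOTERO_TO_CSL: Dict[str, str] = {
    "identifier": "number",
    "repository": "publisher",
    "repositoryLocation": "publisher-place",
    "format": "medium",
}


def dataset_to_csl(item: Dict[str, str]) -> Dict[str, str]:
    """Build a minimal CSL-JSON-style dict for a dataset item (hypothetical)."""
    csl: Dict[str, str] = {"type": "dataset"}
    if "title" in item:
        csl["title"] = item["title"]
    # Copy only the mapped fields; anything else is ignored in this sketch.
    for zotero_key, csl_key in ZOTERO_TO_CSL.items():
        if zotero_key in item:
            csl[csl_key] = item[zotero_key]
    return csl
```

Unmapped fields are dropped rather than passed through, because a dataset's own "Type" field would otherwise clash with the CSL `type` variable.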