Zenodo translator turns software into report

katrinleinweber · February 12, 2018

Hello!

When I browse to https://zenodo.org/record/580337 my browser switches the Zotero icon to software/code, but upon importing, the item becomes a report.

The page source contains "Software", while neither "article", nor "preprint", nor "report" appears in it. According to https://github.com/zotero/translators/blob/master/Zenodo.js, any of those might cause this behaviour.

Zotero 5.0.34.5
Firefox 52.6.0
Chrome 63.0.3239.132

Is this happening to other people as well? What could be the reason? Thanks for any hints!

adamsmith · February 12, 2018

Zenodo puts "article" into the CSL JSON, which we use for import. That's not a mistake on their part (I suggested this to them), but due to the fact that CSL doesn't have a software item type (yet).

I think the easiest way to fix this would be to re-set the item type using the detectWeb function, i.e. something like `item.itemType = detectWeb(doc, url)` (untested)
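
A minimal sketch of that suggestion, in plain JavaScript rather than the real translator framework. Everything named here is a hypothetical stand-in: `detectType` plays the role of `detectWeb(doc, url)`, `cslTypeToZotero` mimics the import mapping that produced a report in this thread, and the label table is an assumption, not copied from Zenodo.js. Like the one-liner above, it has not been tested against Zotero itself.

```javascript
// Assumed mapping from the page's resource-type label to a Zotero item type.
// Illustrative only; the real Zenodo.js may use different labels.
const TYPE_BY_LABEL = {
  software: "computerProgram",
  report: "report",
};

// Hypothetical stand-in for detectWeb(doc, url): it inspects the page,
// not the CSL JSON, so it can still see that the record is software.
function detectType(pageLabel) {
  return TYPE_BY_LABEL[pageLabel.toLowerCase()] || "journalArticle";
}

// Hypothetical stand-in for the CSL JSON import step. CSL has no software
// type yet, so Zenodo exports "article", which ended up as a report here.
function cslTypeToZotero(cslType) {
  return cslType === "article" ? "report" : "document";
}

// Import a record, then let the detected type override the CSL-derived one.
function importItem(cslRecord, pageLabel) {
  const item = {
    title: cslRecord.title,
    itemType: cslTypeToZotero(cslRecord.type),
  };
  item.itemType = detectType(pageLabel); // the suggested fix
  return item;
}

// Made-up record for illustration.
const example = importItem({ title: "Example tool", type: "article" }, "Software");
console.log(example.itemType); // "computerProgram"
```

The point of the override is that the page itself still says "Software" even though the CSL JSON cannot, so the detected type is the more reliable of the two signals.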

katrinleinweber · February 13, 2018

Thanks for explaining and the suggestion :-)

Do you mean that the time is ripe for a PR to change this behaviour? Or that I should apply it to my fork? I would be interested in the latter, but would rather inquire about the reasons for the original suggestion?

I have an incomplete understanding of this situation, but I can't help thinking that no longer using work-arounds (even while downstream support (CSL) is lacking!) might help to resolve such chicken-or-egg-type conundrums?

adamsmith · February 13, 2018

I mean a PR. There's absolutely no reason for this other than that either I overlooked it when writing the translator or that there was something different about Zenodo's CSL JSON back then.

Not quite sure I understand the workaround part -- are you just saying we should fix this in the Zotero translator (agreed) or something else?

katrinleinweber · February 13, 2018

"Work-around" in the sense that Zenodo applied it back then to avoid any risk of issues with the CSL interpretation of the items downstream.

I'll try to update the translator. Maybe only in March, because the first free slot in my schedule occurs then.

adamsmith · February 13, 2018

They certainly shouldn't be inventing CSL item types -- that could cause all sorts of problems; I don't think staying within a given standard is a workaround. They also use misc as a bibtex item type for the same reason.

katrinleinweber · February 13, 2018

The "misc" vs. "software" as BibTeX types is a good example, about which one might learn more at https://www.wias-berlin.de/WCMS/program.jsp?MMSDays2018. Did using the "software" item type cause any problems in your tests? I added some "software" items to a .bib, let pandoc build a document from it and found no issues, except the silent downgrade to "misc".

adamsmith · February 13, 2018

I don't know about software in particular, but the fact that everyone just keeps inventing things in bibtex and there's no stable standard has been a huge nightmare. See e.g. the mess that's citing a webpage: https://tex.stackexchange.com/questions/3587/how-can-i-use-bibtex-to-cite-a-web-page
Having dealt with having to make compromises around "most common" practices based in BibTeX for many years now, I'll fight tooth and nail to keep CSL and its input standardized to avoid that.

katrinleinweber · March 1, 2018

OK, granted, but why switch the software/ComputerProgram to report or misc already when importing into Zotero, which can support that item type? Shouldn't the downstream issues be left to downstream?

I've asked the Zenodo colleagues in https://github.com/zenodo/zenodo/issues/1428 about this as well.

adamsmith · March 1, 2018

I agree that we should import that correctly into Zotero, as I say above.

What I disagree with is the idea of doing that by taking a standardized metadata format (and while CSL JSON isn't formally specified, it follows the CSL specifications, which are) and basically inventing new types in it.

I think that's bad practice with potentially very bad outcomes. For example, we haven't made the final call on how the software item type will be called in CSL. Say Zenodo now goes ahead and calls this software and then we end up calling it computer_program -- we then have a massive amount of incorrect metadata out in the wild that we have to deal with. That's how standards fall apart. I know it can be frustrating that standards take time to adjust to new requirements. They're still preferable to metadata anarchy.

Both bibtex and CSL are used by many different tools and styles. The expectation that they all gracefully handle unexpected input isn't warranted -- we don't require it for CSL and it's definitely not the case for bibtex.

I care somewhat less about what ends up happening with bibtex because it's already broken, but please don't suggest that people start customizing CSL JSON.

katrinleinweber · March 1, 2018

No, I'm not suggesting that for CSL. Thanks for explaining the reasoning :-)

Regarding Bib(La)TeX & biber, my impression until now (own tests and asking around) was that they _do_ gracefully fall back to `@misc`. This "definitely not the case" is the first time I hear of potential problems.
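
The two behaviours being contrasted can be sketched as follows. This is an illustration only: the list of known types is an assumption, and neither function reproduces what BibTeX, biber, or pandoc actually do.

```javascript
// Assumed set of entry types a consumer understands; not any tool's real list.
const KNOWN_TYPES = new Set(["article", "book", "misc", "techreport"]);

// Lenient consumer: silently downgrades unknown entry types to "misc".
function lenientType(entryType) {
  const t = entryType.toLowerCase();
  return KNOWN_TYPES.has(t) ? t : "misc";
}

// Strict consumer: rejects anything outside the known set.
function strictType(entryType) {
  const t = entryType.toLowerCase();
  if (!KNOWN_TYPES.has(t)) {
    throw new Error(`Unknown entry type: ${entryType}`);
  }
  return t;
}

console.log(lenientType("software")); // "misc"
```

A producer that emits a non-standard type works only as long as every consumer behaves like `lenientType`; a single consumer written like `strictType` turns the same input into an error.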

katrinleinweber · March 7, 2018

Attempting a fix as suggested :-)
https://github.com/zotero/translators/pull/1578