PubMed translator misinterpreting publication date
By chance, I came upon a bizarre situation where saving an item from a PubMed page in Firefox gives a date that Zotero cannot interpret.
For example, on https://pubmed.ncbi.nlm.nih.gov/32672535/
the translator saves Date as "07 16, 2020 d y". Hovering over the date field with the mouse reports 2020-00-07.
Funnily enough, I had saved the same item a few months ago, and that copy has proper date data. I assume something has changed in the way PubMed offers data now.
This appears to be the case for many, but not all, newer PubMed entries. The translator has no issues with the older PubMed articles that I tested, and saves them with good date fields.
Here are a couple more odd ones:
https://pubmed.ncbi.nlm.nih.gov/32633716/
https://pubmed.ncbi.nlm.nih.gov/32503885/
Not sure how we'll fix this yet, but we will.
If that's not the case, is there another translator that interprets numeric months properly whose parts can be used to replace the date parsing in the PubMed translator? (I have zero experience with JavaScript and looking at that would be useful.)
All or most new entries in PubMed appear to be affected by this issue.
The fix is live for all versions; you just might have to ensure that translators are updated.
- The Zotero app (via Add Item by Identifier) should already be fixed.
- Your Zotero Connector should auto-update within 24 hours of when this was fixed (about 17 hours ago), or you can update manually by clicking Update Translators in the Advanced pane of the Zotero Connector preferences.
- If by "online version" you're referring to Add by Identifier in the web library, that's fixed now.
Zotero interprets this as "11 2018 d y", which makes little sense.
Month + Year constructions in PubMed are quite common, so this is not a minor issue. Another common one is Year alone, but that does alright.
I read the conditionals in the new PubMed XML.js, but that did not make much sense to me, which probably means I am guessing incorrectly what ZU.xpathText does. Regardless, Zotero should interpret 11 2018 as a month + year, not day + year.
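To make the expected behavior concrete, here is a minimal sketch of how an all-numeric date string could be read by the number of tokens it contains. It is not the code in PubMed XML.js and does not use any Zotero utility; `numericDateToISO` is a made-up name, and the month-first ordering (month, optional day, year) is an assumption based on the examples in this thread.

```javascript
// Hypothetical helper, NOT part of the Zotero PubMed translator.
// Assumption: all-numeric PubMed dates are ordered month, [day,] year.
function numericDateToISO(text) {
  const tokens = text.match(/\d+/g);
  if (!tokens) return null;

  const nums = tokens.map(Number);
  const pad = (n) => String(n).padStart(2, "0");

  if (nums.length === 1) {
    // Year alone, e.g. "2018"
    return String(nums[0]);
  }
  if (nums.length === 2) {
    // Month + year, e.g. "11 2018"
    const [month, year] = nums;
    if (month < 1 || month > 12) return null;
    return `${year}-${pad(month)}`;
  }
  if (nums.length === 3) {
    // Month + day + year, e.g. "07 16, 2020"
    const [month, day, year] = nums;
    if (month < 1 || month > 12 || day < 1 || day > 31) return null;
    return `${year}-${pad(month)}-${pad(day)}`;
  }
  return null;
}

console.log(numericDateToISO("11 2018"));
console.log(numericDateToISO("07 16, 2020"));
```

With two tokens the first is treated as a month, never as a day, which is the behavior asked for above.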
m y
that I didn't even test. We'll fix that, too. I've reopened the issue.
My earlier comment was about PubMed's acceptance of grossly ambiguous date metadata from publishers _and_ its later editing (perhaps after a few hours, days, or weeks) of that ambiguous date format into another ambiguous date format. The result is only unambiguous if you know that, for all-numeric dates, the PubMed standard format _seems_ to be mm dd yyyy or yyyy mm dd, in contrast to the original publishers' dd mm yyyy.
I was trying to say that part of the problem has nothing whatsoever to do with Zotero or the way that Zotero captures the PubMed metadata. The only issue that is even _peripherally_ Zotero's is the timing of the import: whether the record is imported within a brief window after its PubMed entry date, or after PubMed has revised the format of the publisher-supplied metadata.
This isn't new. 12 or 15 years ago a large group of us indexers and catalogers wrote to the Council of Science Editors to request that it influence publishers to use a standard numerical date format. The folks at CSE replied that the idea had been attempted but that there had been vigorous nationalism-based objections to changing from what was obviously the best convention to what might be preferred elsewhere. There also had been disagreements within the CBE/CSE itself (yyyy-mm-dd vs dd-mm-yyyy vs yyyy-dd-mm, if I recall correctly).
<pubDate>
<Year>YYYY</Year>
<Month>MM or MMM</Month>
<Day>DD</Day>
</pubDate>
and all problems on import stem from different ways of displaying months (as MM numbers or MMM abbreviations) and are easily and unambiguously fixable.
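As an illustration of why separate elements are easy to handle, here is a minimal sketch that turns already-extracted Year, Month, and Day values into an ISO-style date, accepting the month either as a number or as an English abbreviation. The function names are hypothetical, and the sketch assumes the values have already been pulled out of the XML as strings; it is not how the Zotero translator does it.

```javascript
// Hypothetical helpers, not taken from any Zotero translator.
const MONTH_ABBREVIATIONS = [
  "jan", "feb", "mar", "apr", "may", "jun",
  "jul", "aug", "sep", "oct", "nov", "dec",
];

// Accepts "07", "7", "Jul" or "July"; returns 1..12, or null if unrecognized.
function monthToNumber(month) {
  const trimmed = String(month).trim();
  if (/^\d{1,2}$/.test(trimmed)) {
    const n = Number(trimmed);
    return n >= 1 && n <= 12 ? n : null;
  }
  const index = MONTH_ABBREVIATIONS.indexOf(trimmed.slice(0, 3).toLowerCase());
  return index === -1 ? null : index + 1;
}

// Builds YYYY, YYYY-MM or YYYY-MM-DD from whichever parts are usable.
function pubDateToISO(year, month, day) {
  const pad = (n) => String(n).padStart(2, "0");
  let iso = String(year).trim();

  const m = month ? monthToNumber(month) : null;
  if (m === null) return iso;
  iso += "-" + pad(m);

  const d = Number(day);
  if (Number.isInteger(d) && d >= 1 && d <= 31) {
    iso += "-" + pad(d);
  }
  return iso;
}

console.log(pubDateToISO("2020", "Jul", "16"));
console.log(pubDateToISO("2018", "11"));
```

Because each value arrives in its own element, nothing has to be guessed about which number is the month and which is the day.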
edit: This same problem exists with all other literature databases. In 2002 or 2003 I regularly fetched XML by FTP from Elsevier and Taylor and Francis Group journals with the month value greater than 12 and the corresponding day value always less than 12. Our parser had a system to flag those errors. I've never understood why the publishers couldn't do that with the original metadata. My parser error check obviously couldn't recognize problem records when the day and month values were both within a realistic range.
edit2: This almost drove me crazy until (in my system) I discarded everything but the year value.
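A check of the kind described above might look roughly like the following. This is a reconstruction for illustration only, not the original parser; the function name and the four labels are made up.

```javascript
// Hypothetical sanity check on a numeric month/day pair.
// Returns one of: "ok", "ambiguous", "likely-swapped", "invalid".
function checkMonthDay(month, day) {
  const inRange = (n, max) => Number.isInteger(n) && n >= 1 && n <= max;

  // A "month" above 12 paired with a "day" of 12 or less suggests
  // the two values were supplied in the other order.
  if (!inRange(month, 12) && inRange(month, 31) && inRange(day, 12)) {
    return "likely-swapped";
  }
  if (!inRange(month, 12) || !inRange(day, 31)) {
    return "invalid";
  }
  // Both values could be a month, so a swap cannot be detected.
  if (day <= 12) {
    return "ambiguous";
  }
  return "ok";
}

console.log(checkMonthDay(16, 7));
console.log(checkMonthDay(3, 4));
```

The "ambiguous" label marks exactly the records that such a check cannot catch: when both numbers are 12 or less, either one could be the month.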
Thank you!