PubMed translator misinterpreting publication date

By chance, I came upon a bizarre situation where saving from a PubMed page in Firefox gives a date that Zotero cannot interpret.

For example, on https://pubmed.ncbi.nlm.nih.gov/32672535/

the translator saves Date as "07 16, 2020 d y". Hovering over the date field with the mouse reports 2020-00-07.

Funnily enough, I saved the same item a few months ago, and that copy has a proper date. I assume something has changed in the way PubMed serves its data.

This appears to affect many, but not all, of the newer PubMed entries. The translator has no issues with the older PubMed articles I tested and saves them with good date fields.

Here are a couple more odd ones:

https://pubmed.ncbi.nlm.nih.gov/32633716/
https://pubmed.ncbi.nlm.nih.gov/32503885/
  • Thanks -- that looks fixable. The problem is that Zotero expects the month to be presented as a name or abbreviation (July or Jul.), not a number (07). July 16, 2020 is, of course, a perfectly normal date display that Zotero could parse.
    Not sure how we'll fix this yet, but we will.
  • Separate from fixing the translator to handle numeric months, do we want Zotero to parse "07 16 2020" or "07 16, 2020" as dates? The former might be reasonable. The latter is a bit weird. And if we're fixing it to store these differently, do we care? (Obviously without this the dates would be broken for anyone who's saved any of these recent entries in the meantime.)
  • edited November 24, 2020
    I think parsing these would be nice, ideally treating both options just like 07/16/2020. I don't really see a downside
  • Am I getting this right that Dan is proposing to make Zotero interpret all numeric months so any fixes to the translator would become moot?

    If that's not the case, is there another translator that interprets numeric months properly whose parts can be used to replace the date parsing in the PubMed translator? (I have zero experience with Javascript and looking at that would be useful.)

    All or most new entries into PubMed appear to be affected by this issue.
  • Definitely worth fixing in the translator -- I think the current format is fairly unusual, so requires custom parsing. I was hoping to get to it this weekend, but absolutely no promises.
  • I understand. I was wondering if I should attempt, but I probably shouldn't. Thank you very much.
  • edited December 13, 2020
    The problem I see is that when metadata is first received at PubMed, it isn't always clear which value is the month and which is the day of the month. I see (for example) 2020 02 07 or 07 02 2020 when _both_ are the same date, July second 2020 (but could this not also be February seventh?). Somehow this ends up disambiguated in the PubMed record after a few days or weeks. So not only are some dates ambiguous when the record first appears, but the same record's date will get modified. The questions are, "How do we know the proper order of day/month, and how do we know whether PubMed has standardized the date at the moment the record is imported into Zotero?" In another thread I said that I discard everything but the publication _year_. This is one of the reasons why. How many citation styles require the month and day of journal article publication?
  • @DWL-SDCA: I'm not sure what you're referring to, but we don't use scraping on PubMed. The date parts are clear in the XML data we use. This was just a case of their starting to use numeric months for some entries.
  • @enozkan -- that's now fixed. All full dates on Pubmed are imported as ISO (YYYY-MM-DD). Thanks for reporting.
  • When might this fix be present in the mac desktop version or the online version?
  • Same answer as here: https://forums.zotero.org/discussion/comment/370668/#Comment_370668
    The fix is live for all versions; you just might have to ensure that translators are updated.
  • edited December 15, 2020
    @AlyceChen:

    - The Zotero app (via Add Item by Identifier) should already be fixed.

    - Your Zotero Connector should auto-update within 24 hours of when this was fixed (about 17 hours ago), or you can update manually by clicking Update Translators in the Advanced pane of the Zotero Connector preferences.

    - If by "online version" you're referring to Add by Identifier in the web library, that's fixed now.
  • @adamsmith Thank you. I did come across one construction that is still problematic. Check this entry, which has the date November 2018, probably served as "11 2018": https://pubmed.ncbi.nlm.nih.gov/30133126/

    Zotero interprets this as "11 2018 d y", which makes little sense.

    Month + Year constructions in PubMed are quite common, so this is not a minor issue. Another common one is Year alone, but that does alright.

    I read the conditionals in the new PubMed XML.js, but that did not make much sense to me, which probably means I am guessing incorrectly what ZU.xpathText does. Regardless, Zotero should interpret 11 2018 as a month + year, not day + year.
  • oh, I was so sure Zotero would read that as m y that I didn't even test. We'll fix that, too. I've reopened the issue
  • edited December 15, 2020
    As I failed to make clear above there are problems that are out of Zotero's control.

    My earlier comment was about PubMed's acceptance of grossly ambiguous date metadata from publishers _and_ then after a few hours/days/weeks (perhaps) editing that ambiguous date format into another ambiguous date format (unless you know that with all-numbers dates the PubMed standard format _seems_ to be mm dd yyyy or yyyy mm dd in contrast to the original publishers' dd mm yyyy).

    I was trying to say that part of the problem has nothing whatsoever to do with Zotero or the way that Zotero captures the PubMed metadata. The only _peripherally_ Zotero issue is the timing of when the import occurs -- if imported within a brief time window from the record's PubMed entry date versus imported after a PubMed revision to the format of the publisher-supplied metadata.

    This isn't new. 12 or 15 years ago a large group of us indexers and catalogers wrote to the Council of Science Editors to request that it influence publishers to use a standard numerical date format. The folks at CSE replied that idea had been attempted but that there had been vigorous nationalism-based objections to changing from what was obviously the best convention to what might be preferred elsewhere. There also had been disagreements within the CBE/CSE itself (yyyy-mm-dd vs dd-mm-yyyy vs yyyy-dd-mm if I recall correctly).
  • None of the issues we're solving in this thread are due to an ambiguous date format in Pubmed. While these may exist (I take your word for it) every item reported here has the date in the Pubmed XML as
    <pubDate>
    <Year>YYYY</Year>
    <Month>MM or MMM</Month>
    <Day>DD</Day>
    </pubDate>


    and all problems on import stem from different ways of displaying months (as MM numbers or MMM abbreviations) and are easily and unambiguously fixable.
  • edited December 15, 2020
    @adamsmith All I'm trying to say is that in the PubMed XML the values for MM and DD can become inverted after a few days from the record creation date when the PubMed record is edited / revised. The publishers' metadata as provided to PubMed is formatted incorrectly. PubMed initially accepts the badly labeled pubDate into the database record and may later revise the record to correct the date.

    edit: This same problem exists with all other literature databases. In 2002 or 2003 I regularly fetched xml by ftp from Elsevier and Taylor and Francis Group journals with the month value greater than 12 and the corresponding day value always less than 12. Our parser had a system to flag those errors. I've never understood why the publishers couldn't do that with the original metadata. My parser error check obviously couldn't recognize problem records when the day or month values were within a realistic range. edit2 This almost drove me crazy until (in my system) I discarded everything but the year value.
  • @enozkan -- OK, month-year entries are now fixed, too.
  • I am now experiencing a related, but slightly different issue. An article has a correction, but the import by PMID is reflecting the date for the correction (which has a completely different PMID). I am importing PMID: 31161239 (original paper) and getting the date for PMID: 31432207 (the correction) in the metadata.
  • Cool. You concatenated them with a "/", converted to ISO. Good to know.

    Thank you!
  • @AlyceChen -- I don't know if that's generally the case for corrections on Pubmed, but it gives April 20 as the date in all metadata: in the metatags on the page, in the .nbib, and in the Pubmed XML. You can even see it in the citation they provide. Not much we can do there.
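
For readers wondering what the fix discussed above amounts to, here is a minimal sketch of turning separate year, month, and day parts into an ISO date when the month may arrive either as a number (07) or as a name or abbreviation (Jul). This is not the code in the PubMed XML.js translator; per the thread, the actual fix joins the parts with "/" and lets Zotero convert the result to ISO. The names `monthToNumber` and `pubDateToISO` are hypothetical, and the sketch assumes the parts have already been read out of the XML as strings.

```javascript
// Hypothetical helpers for illustration only; not Zotero's translator code.
const MONTH_NAMES = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// True when a date part is absent or empty.
function isBlank(value) {
  return value === undefined || value === null || String(value).trim() === "";
}

// Accepts "07", "7", "Jul", "Jul." or "July"; returns 1-12, or null if unrecognized.
function monthToNumber(month) {
  const text = String(month).trim().toLowerCase().replace(/\.$/, "");
  if (/^\d{1,2}$/.test(text)) {
    const n = Number(text);
    return n >= 1 && n <= 12 ? n : null;
  }
  if (text.length < 3) return null;
  const index = MONTH_NAMES.findIndex((name) => name.startsWith(text));
  return index === -1 ? null : index + 1;
}

// Builds YYYY, YYYY-MM, or YYYY-MM-DD from the parts that are present.
// Returns null when a part is present but invalid (for example month "16"),
// so the caller can flag the record instead of storing a guess.
function pubDateToISO({ year, month, day }) {
  if (isBlank(year) || !/^\d{4}$/.test(String(year).trim())) return null;
  const y = String(year).trim();
  if (isBlank(month)) return y;

  const m = monthToNumber(month);
  if (m === null) return null;
  const mm = String(m).padStart(2, "0");
  if (isBlank(day)) return `${y}-${mm}`;

  const dayText = String(day).trim();
  if (!/^\d{1,2}$/.test(dayText)) return null;
  const d = Number(dayText);
  // Day 0 of the following month is the last day of month m.
  const daysInMonth = new Date(Date.UTC(Number(y), m, 0)).getUTCDate();
  if (d < 1 || d > daysInMonth) return null;
  return `${y}-${mm}-${String(d).padStart(2, "0")}`;
}
```

With this sketch, the parts 2020 / 07 / 16 and 2020 / Jul / 16 both come out as 2020-07-16, and a month-year record such as 2018 / 11 comes out as 2018-11 rather than being read as day + year. A month value above 12 is rejected, which is the kind of check DWL-SDCA describes; as noted in that post, no check of this sort can catch a swapped day and month when both values are 12 or less.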