Improving PDF Quality for Zotero's new PDF recognizer
When importing the PDF from https://www.oneducation.net/wp-content/uploads/2018/03/10.17899_on_ed.2018.1.1.pdf , Zotero's new PDF recognizer recognises author and title but not year or any other information. What can be done to improve this?
In the absence of canonical metadata, we could probably do a bit better at extracting what's in the PDF, but that's obviously a worse option. The embedded metadata available on the page itself isn't great either.
You can find out the registration agency (RA) using the DataCite prefix API:
curl "https://api.datacite.org/prefixes/10.17899"
and this is a DataCite DOI, so it has metadata, and
curl -LH "Accept: application/vnd.citationstyles.csl+json" https://doi.org/10.17899/on_ed.2018.1.1
looks good. I'll have to take a look at what goes wrong. Probably a bug in the search translator.
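For anyone who wants to script the content-negotiation request instead of using curl, here is a minimal Python sketch that uses only the standard library. The helper names `build_csl_request` and `fetch_csl` are made up for illustration; they are not part of Zotero or DataCite.

```python
import json
import urllib.request

# Media type for CSL JSON, the same value passed to curl's -H "Accept: ..."
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_csl_request(doi):
    """Build a doi.org request that asks for CSL JSON via the Accept header."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": CSL_JSON},
    )


def fetch_csl(doi):
    """Send the request (urlopen follows redirects, like curl -L) and parse the JSON body."""
    with urllib.request.urlopen(build_csl_request(doi), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_csl("10.17899/on_ed.2018.1.1")` should return the same record as the curl command, assuming the resolver still serves that media type for this DOI.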
https://data.datacite.org/application/citeproc+json/10.17899/on_ed.2018.1.1
1) The "type" should not be "report" but "journal article" instead.
2) The title of the journal ended up in the "abstract" field.
3) The date should be month and year (03/2018), but it is only 2018.
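The three problems can be checked mechanically on any CSL JSON record. The sketch below is only an illustration: `find_csl_problems` is a hypothetical helper, and the sample record is made up to mirror the issues described here rather than copied from the real response. It assumes the usual CSL JSON field names ("type", "container-title", "issued" with "date-parts").

```python
def find_csl_problems(record):
    """Return a list of notes for the three issues described in the list above."""
    problems = []

    # 1) In CSL JSON, a journal article has the type "article-journal".
    if record.get("type") != "article-journal":
        problems.append("type is %r, expected 'article-journal'" % record.get("type"))

    # 2) The journal title belongs in "container-title", not in "abstract".
    if not record.get("container-title"):
        problems.append("container-title is missing")

    # 3) CSL dates look like {"date-parts": [[year, month, day]]};
    #    fewer than two parts means only the year is known.
    date_parts = (record.get("issued") or {}).get("date-parts") or [[]]
    if len(date_parts[0]) < 2:
        problems.append("issued date has no month")

    return problems


# Hypothetical record with placeholder values, shaped like the reported problems.
sample = {
    "type": "report",
    "abstract": "Example Journal Title",
    "issued": {"date-parts": [[2018]]},
}

for note in find_csl_problems(sample):
    print(note)
```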
Also, the XML data for the articles of the first issue seem incomplete. Only Merry (2018) shows the publisher information and the CC BY-NC license – the others do not.
The fix would be a) for DataCite to correctly parse the relatedIdentifier "isPartOf" with the ISSN, b) for DataCite to come up with a controlled vocabulary that includes JournalArticle (unlikely in the short term), and c) for the "date issued" metadata to be used for a more specific publication date.
Martin (DataCite Technical Director)
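To make points a) and c) concrete, here is a rough Python sketch of how a consumer of the metadata might read the ISSN and a more specific date. It is not DataCite's actual implementation. It assumes the record is a dict shaped like DataCite JSON, with `relatedIdentifiers`, `dates` and `publicationYear` keys; the function names and the sample values (including the placeholder ISSN) are invented for illustration.

```python
import re


def container_issn(attrs):
    """Return the ISSN of the journal the item is part of, if one is declared."""
    for rel in attrs.get("relatedIdentifiers") or []:
        if (rel.get("relationType") == "IsPartOf"
                and rel.get("relatedIdentifierType") == "ISSN"):
            return rel.get("relatedIdentifier")
    return None


def issued_date_parts(attrs):
    """Prefer the 'Issued' date (year, month, day as available) over the bare year."""
    for entry in attrs.get("dates") or []:
        if entry.get("dateType") == "Issued" and entry.get("date"):
            match = re.match(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", entry["date"])
            if match:
                return [int(part) for part in match.groups() if part is not None]
    # Fall back to publicationYear when no usable 'Issued' date is present.
    year = attrs.get("publicationYear")
    return [int(year)] if year else []


# Hypothetical record; the ISSN is a placeholder, not the journal's real one.
example = {
    "publicationYear": 2018,
    "dates": [{"date": "2018-03", "dateType": "Issued"}],
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "0000-0000",
            "relatedIdentifierType": "ISSN",
            "relationType": "IsPartOf",
        }
    ],
}

print(container_issn(example))
print(issued_date_parts(example))
```

With this approach the month survives whenever an "Issued" date is supplied, and the year-only value is used only as a fallback.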