Zotero Connector with a chapter in a Book series with a series volume number

hughp3 · September 8, 2020

Greetings,

I'm wondering if there is a metadata tag or series of tags which Zotero (web connector) picks up on for the series a book is in and the volume of that series. I'm trying to make the following readable, without invoking the Crossref or DataCite APIs. Note that when those APIs are used with this DOI (likely the publisher's responsibility) the series volume is imported as the book volume.

Paterson, Rebecca. 2015. Narrative uses of the U̱t-Maꞌin (Kainji) Bare Verb form. In Doris L. Payne & Shahar Shirtz (eds.), Beyond aspect: the expression of discourse functions in African languages (Typological Studies in Language 109), 219–248. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/tsl.109.08pat. https://benjamins.com/catalog/tsl.109.08pat (30 June, 2020).

adamsmith · September 9, 2020

I'd have to dig through the code, but almost certainly not, no. If you need that level of precision, you'll want to write a custom translator or use unAPI.

(As I think I mentioned before, some context -- i.e. why do you need to know this -- is helpful for these types of questions; both to get you a better answer and, frankly, for me (and anyone else) to be able to judge whether it'd be worth spending some extra time to look through options in the code or not.)

hughp3 · September 9, 2020

Context: Hugo is a static site generator. I am modifying a Hugo theme with the intent of re-releasing it, or merging my changes into the theme with proper review. The theme is https://wowchemy.com/, and it targets academic CVs and academic labs. I am writing code to enable it to generate the metadata necessary so that viewers of websites which use this theme can properly import the citations from people's CVs via Zotero, and so that the content is listed in Google Scholar. There are over 1,000 people using the theme currently, and it seems to be gaining momentum. I am also writing a Zotero review in the context of Linguistics and Language documentation. Part of that review is a section about the Linguistic publishing field and how it is (or isn't) making metadata available in useful ways. I mean, there is no sense in saying the tool can't do something if in fact the tool can do it and it is just the publishers who are not using good practice in their metadata dissemination strategies. I can move my questions over to Zotero-Dev if that is a better place, but since I am not actually dev'ing on Zotero I haven't put any chatter in that channel.

adamsmith · September 9, 2020

My view (and a fairly common view among the academic web community I think) is that all existing standard metatags are quite limited and probably on their way out in the medium term as the default, so things like series information not importing properly is really no one's "fault". There just isn't an existing standard that publishers or Zotero could follow.
In some cases, Zotero makes educated guesses about quasi-custom metatags (i.e. things that aren't in any standard but are used by some sites), but that's really not a sustainable solution.

I'd say for a static site generator, the way to go is to implement basic DC and highwire metatags and focus on implementing JSON-LD mainly following schema.org -- I'd hope Zotero will catch up on this shortly (there are several open&active tickets).
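
As a rough illustration of that layering, a minimal sketch for the chapter cited in the opening post might look like the following. The Highwire-style and DC tag names are ones in common use; the JSON-LD nesting (Chapter inside Book inside BookSeries, with `position` standing in for the series volume) is an assumption about how schema.org could express this, not something Zotero is confirmed to read.

```html
<!-- Highwire Press-style tags -->
<meta name="citation_title" content="Narrative uses of the U̱t-Maꞌin (Kainji) Bare Verb form">
<meta name="citation_author" content="Paterson, Rebecca">
<meta name="citation_publication_date" content="2015">
<meta name="citation_firstpage" content="219">
<meta name="citation_lastpage" content="248">
<meta name="citation_doi" content="10.1075/tsl.109.08pat">
<meta name="citation_publisher" content="John Benjamins Publishing Company">

<!-- Basic Dublin Core -->
<meta name="DC.title" content="Narrative uses of the U̱t-Maꞌin (Kainji) Bare Verb form">
<meta name="DC.creator" content="Paterson, Rebecca">
<meta name="DC.date" content="2015">
<meta name="DC.identifier" content="https://doi.org/10.1075/tsl.109.08pat">

<!-- JSON-LD following schema.org. ASSUMPTION: the BookSeries nesting and the
     use of "position" for the series volume are illustrative guesses. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Chapter",
  "name": "Narrative uses of the U̱t-Maꞌin (Kainji) Bare Verb form",
  "author": { "@type": "Person", "name": "Rebecca Paterson" },
  "pageStart": "219",
  "pageEnd": "248",
  "sameAs": "https://doi.org/10.1075/tsl.109.08pat",
  "isPartOf": {
    "@type": "Book",
    "name": "Beyond aspect: the expression of discourse functions in African languages",
    "editor": [
      { "@type": "Person", "name": "Doris L. Payne" },
      { "@type": "Person", "name": "Shahar Shirtz" }
    ],
    "publisher": { "@type": "Organization", "name": "John Benjamins Publishing Company" },
    "datePublished": "2015",
    "position": 109,
    "isPartOf": { "@type": "BookSeries", "name": "Typological Studies in Language" }
  }
}
</script>
```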

hughp3 · September 9, 2020

Yes. I am in the process of implementing highwire, dc, dcterms. eprints has some interesting tags too.

You might see me on the schema mailing list, as I am on that semi-regularly too.

adamsmith · September 9, 2020

In case you weren't aware, you can also look at https://github.com/zotero/translators/blob/master/RDF.js and https://github.com/zotero/translators/blob/master/Embedded Metadata.js
which is the Zotero code that handles embedded metadata.

We're always open to add/tweak things there, but generally want some sort of visible documentation or visibly established practice (needn't be official) that we can base those changes on.

hughp3 · September 11, 2020

Those look promising. I did not know about them.

Can you help me understand the linked lines?
https://github.com/zotero/translators/blob/cc7b0538d179c1c79e86392ca53f0769c91d8a28/RDF.js#L1336

My understanding is that they should import dc.subject terms in meta tags as zotero keywords. This is failing for me. (I have the subject terms in the meta tags). It looks like this: https://gist.github.com/HughP/45df574e7ebc417f48d4b763839ae2de

adamsmith · September 11, 2020

There's too much garbage in dc.subject in the wild, so we're discarding it for EM:
https://github.com/zotero/translators/blob/master/Embedded Metadata.js#L831

hughp3 · September 12, 2020

So, I press on (from the keywords/tags issue). The above gist's metadata triggers the zotero connector to recognize the item (which is cited in the OP) as a book. So it seems to me that it is not detecting that there is a container type relationship. I've been looking around the internet for pages that zotero does recognize as a book section. Here are a few that I have found:

https://www.sciencedirect.com/science/article/pii/B9780123603807500119
https://link.springer.com/chapter/10.1007/1-4020-3060-6_3#citeas
https://www.cambridge.org/core/books/language-typology-and-syntactic-description/typological-distinctions-in-wordformation/48FEE1DAECCEBD425041002CE15E4CE6

These three publishing platforms use a tag which I have not seen before, which is

<meta name="citation_inbook_title" content="Some title" />

However, when I add this element to my code, Zotero Connector doesn't change its type recognition. So it seems that this tag isn't solely responsible for Zotero Connector's recognition of an item as a book section. Conversely, when I do add:
<meta name="DC.type" content="Book chapter" /> Zotero Connector changes immediately.
However, I am loath to add the DC.type value 'Book chapter' as that is not a valid DCMIType value. (It is bad enough that there are no <link rel=....> values for highwire tags in HTML5 and that Google Scholar has created non-standard meta values in the DC. namespaced tags.)

QUESTION: Does the current algorithm in the files you linked to detect a book part (like a chapter) using only brute-force NLP methods that search for textual values like "bookSection" or "book chapter", rather than processing complete tags or patterns of tags? By patterns of tags I mean the presence of an ISBN together with a page range value. At the moment I can't think of anything else this pattern could indicate other than a referenced portion of a book. I imagine that pages could be indicated by:

<meta name="citation_firstpage" content="25">
<meta name="citation_lastpage" content="50">

or the custom DC. tags that GoogleScholar advocates or the eprints tag
<meta name="eprints.pagerange" ...>.

PS. I'm sorry; I do do some algorithm design work, but I am "JavaScript challenged". I find the notes the most helpful part of the documents linked, and I do read the values for "keywords".

adamsmith · September 12, 2020

The three examples you link to all have custom translators. None of them are recognized as chapters using the Embedded Metadata translator (which you can run using right-click on the Save to Zotero icon).

We don't use any NLP or pattern matching in RDF/EM: the translators match only on specific tags and only exact matches (minus capitalization and in some cases spaces).

We absolutely use tag patterns to determine item types and fields when it makes sense, yes -- not sure there's an example in the EM translator, but there definitely are in others. The other use case for ISBN and page range would be a conference paper in a proceedings volume, but that doesn't mean we couldn't do this.

hughp3 · September 14, 2020

The reason I asked about NLP is because I was thinking that maybe the Zotero Connector was pulling from the schema.org JSON, but then I realized that the same information was in the og:description and the html5 'description' meta tag.

So, my understanding of what I am seeing is this: I have applied an explicit eprints abstract tag, an explicit dcterms.abstract meta tag and an explicit DC.Description meta tag, yet the algorithm in the Zotero Connector seems to prefer the contents of the html5 'description' to the explicit abstract tags. I want to control the taglines that search engines produce with one description and provide Zotero users with another, proper abstract. How is a web developer to best approach this?

If you want to see the content I am testing live:
https://hughandbecky.us/bcv-test/publication/2015-narrative-uses-of-utmatin-bare-verb/

or if you want to see the same html code but tidy (and in case my dev site evolves) https://gist.github.com/HughP/799c19e33e7b267829a0af75838c6514

adamsmith · September 14, 2020

I think preferring description and og:description over dc.description and certainly eprints.abstract is a bug. citation_abstract should work best.
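
A minimal sketch of that split, assuming the Embedded Metadata translator picks up citation_abstract as suggested above. The content strings are placeholders, and while the preference for description remains in place the Connector may still import the tagline instead:

```html
<!-- Short tagline intended for search-engine snippets (placeholder text) -->
<meta name="description" content="Short tagline for search results.">
<meta property="og:description" content="Short tagline for search results.">

<!-- Full abstract intended for reference managers (placeholder text) -->
<meta name="citation_abstract" content="Full abstract of the chapter goes here.">
<meta name="dcterms.abstract" content="Full abstract of the chapter goes here.">
```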