Zotero Connector and itemType manuscript

hughp3 · September 8, 2020

Greetings,

Is there a way to force Zotero to select the itemType 'manuscript' when scraping with the web connector? Is there a meta tag or combination of tags which will trigger the manuscript type?

I asked a similar question about presentations, and that answer was most helpful. https://forums.zotero.org/discussion/84772/zotero-connecter-with-itemtype-presentation#latest

hughp3 · September 12, 2020

@adamsmith

We have been discussing similar issues in other threads (such as https://forums.zotero.org/discussion/85059/zotero-connector-with-a-chapter-in-a-book-series-with-a-series-volume-number#latest), but this thread had already been started about the itemType 'manuscript'.

Can you help me understand why items from a pre-print server are imported into Zotero via the Zotero Connector as journal articles instead of manuscripts? Given the definition of manuscript in the Zotero documentation (https://www.zotero.org/support/kb/item_types_and_fields), it seems to me that these manuscripts are circulated but not published.

One random example for demonstration from arXiv: https://arxiv.org/abs/1501.00356v1

I'm trying to force the manuscript type in the Zotero Connector, and it seems (given my very poor ability to read JavaScript) that the default assumption in the itemType assignment algorithm is journal article. I'm wondering whether it should be manuscript instead, or alternatively whether there is a way to force manuscript. DC.terms doesn't seem to have anything relevant. Having evaluated Highwire, DC, and eprints, the only explicit tag (rather than a lack of explicit description) that seems to fit the concept of "unpublished material" is <meta name="eprints.ispublished" content="pub" />, where there is a four-way distinction on the content value: pub, inpress, submitted, and unpub (according to this paper: https://www.researchgate.net/publication/274076522_Scheme_for_mapping_scientific_research_data_from_EPrints_to_CERIF_format). Of course, finding official eprints documentation remains a challenge, but the linked item https://eprints.soton.ac.uk/267320/ does have the value pub in the quoted tag (see line 24).
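For readers following along, here is a minimal sketch of reading that tag from a page's HTML source and checking it against the four values mentioned in this thread. The function names are hypothetical, and this is not how Zotero's Embedded Metadata translator works; it only illustrates the tag and its values, assuming the `name` attribute comes before `content` as in the tag quoted above.

```javascript
// Hypothetical helper: pull the content of <meta name="eprints.ispublished"> out of raw HTML.
// Assumes the name attribute precedes content, as in the tag quoted in the post.
function readIsPublished(html) {
  const match = html.match(
    /<meta\s+name=["']eprints\.ispublished["']\s+content=["']([^"']*)["']\s*\/?>/i
  );
  return match ? match[1].trim().toLowerCase() : null;
}

// The four values reported for this tag in the thread.
const KNOWN_STATUSES = ["pub", "inpress", "submitted", "unpub"];

// Return the status if it is one of the four known values, otherwise a short note.
function describeStatus(html) {
  const value = readIsPublished(html);
  if (value === null) return "no eprints.ispublished tag";
  return KNOWN_STATUSES.includes(value) ? value : "unrecognized: " + value;
}

console.log(describeStatus('<meta name="eprints.ispublished" content="pub" />'));
```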

bwiernik · September 12, 2020

The most correct item type for papers on a preprint server would be Document (CSL ‘article’). These aren’t manuscripts, but rather published through the preprint process, and these typically have different citation needs than manuscripts.

hughp3 · September 13, 2020

@bwiernik

If I understand you (please correct me if I don't articulate your position well), you hold that pre-print is really a printing process: when something leaves a pre-print server, or a digital manifestation leaves a digital repository, it is "published". You likely would also hold that public availability **is** publishing. This contrasts with the definition under which "published" means that there is a formal formatting process, usually with an editor, and an agent (person/organization) acting as publisher.

I'm trying to wrap my head around the concept of "document". In one sense everything is a document: video, audio, print media, articles, books, etc. (MS Windows has a folder called 'My Documents'). The physical copies of the US Constitution and the Declaration of Independence are documents.

My understanding of the use of "document" in Zotero is that the itemType is generic: unspecified or underspecified by other criteria. It is a placeholder term for 'thing'. Some of the things I have put into Zotero as 'documents' include: intellectual property policies by universities (these are not really standards, nor are they really reports), long abstracts (maybe these should be supporting material to a cited presentation), technical proposals to Unicode (and other standards), and one-page visual layouts for keyboards in various languages.

Maybe you can help me see how you see the difference between a 'document' and a 'manuscript'. From what I perceive in what you present, you are suggesting that if I write something on my laptop, print it out, and give it to a few colleagues, then that would be a 'manuscript' in your view. If I took that same content, bundled it as a PDF, and pushed it to a pre-print server, would it then become a 'document'?

I know there are multiple approaches to categorization, so please don't take me as trying to be divisive. I'm trying to better understand what you mean and the relevance of your comment.

bwiernik · September 13, 2020

Please don’t overthink this. Preprints are different from an unpublished manuscript—they are posted, publicly available, and generally persistent (e.g., a preprint on OSF has a DOI, a paper published in an economics working paper series is generally archived perpetually). Many citation styles have different rules for preprints/working papers than for unpublished hard-to-retrieve manuscripts.

In CSL, the type ‘article’ is used for preprints/working papers.

Don’t overthink the “Document” label in Zotero. I suggest it just because that is the item type mapped to CSL article. There will likely be a dedicated Preprint type in a future version of Zotero. You could alternately save these as Journal Article and put Type: article at the top of Extra.
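As a rough sketch of that last suggestion, the helper below makes sure a block of Extra text starts with a `Type: article` line. The function name and the sample Extra content are hypothetical, and the code works on a plain string only; it does not touch Zotero's own item API.

```javascript
// Hypothetical helper: make sure an Extra field string begins with "Type: article".
// Works on a plain string only; it does not call any Zotero API.
function withArticleType(extra) {
  const lines = (extra || "").split("\n").filter((line) => line.trim() !== "");
  // Drop any existing "Type:" line so the new one is not duplicated.
  const rest = lines.filter((line) => !/^type\s*:/i.test(line.trim()));
  return ["Type: article", ...rest].join("\n");
}

// Sample Extra content, made up for illustration.
console.log(withArticleType("arXiv: 1501.00356"));
```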

adamsmith · September 13, 2020

> You could alternately save these as Journal Article and put Type: article at the top of Extra

I'd actually advocate this over using Document directly: We should be getting a proper preprint item type in Zotero in the not-too-distant future, and this will make auto-moving items to that easier.

The current status is a bit of a mess:
- We used to use Report for working papers and then preprints because they were often published similarly (e.g. on SSRN), so there's a bunch of that
- arXiv users specifically requested importing as journal articles because that works best with standard bibtex files for citing arXiv
- journalArticle is the default fallback category for things we can't map successfully, because it's least likely to lose key data (like the DOI) but based on some recent changes in Zotero, that rationale is no longer relevant

In short, the current situation is in flux quite a bit and will probably not settle until preprints exist in Zotero.
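To make the fallback idea in the last bullet concrete, here is a minimal sketch of "map what you can, otherwise default to journalArticle". The hint table and function name are hypothetical and chosen only for illustration; they are not the translator's real mapping.

```javascript
// Hypothetical hint-to-type table, for illustration only; not the translator's real mapping.
const TYPE_BY_HINT = {
  book: "book",
  thesis: "thesis",
  report: "report",
};

// Return the mapped item type, or fall back to journalArticle when nothing matches.
function pickItemType(hint) {
  const key = String(hint || "").trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(TYPE_BY_HINT, key)
    ? TYPE_BY_HINT[key]
    : "journalArticle";
}

console.log(pickItemType("thesis"));
console.log(pickItemType("unpub"));
```

In this sketch, any hint that is not in the table, including a missing one, ends up as journalArticle.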

hughp3 · September 13, 2020

Back to the original question of the thread: Is there a way to force the Zotero Connector to identify something as a manuscript, rather than an article, which is the default?

As a follow-up: if the approach of the algorithm in the Zotero Connector is to support positive identification of content via embedded description, is it then reasonable, when an item's discovery webpage offers the meta tag <meta name="eprints.ispublished" content="unpub" />, for the web connector to select the Zotero type 'manuscript'? FYI, there are four values in eprints' taxonomy for this meta tag: pub, unpub, inpress, and submitted. inpress and submitted seem to clearly fall into the category of "pre-print". Otherwise, what methods exist for forcing the connector to identify something as a manuscript?

given:
> - journalArticle is the default fallback category for things we can't map successfully, because it's least likely to lose key data (like the DOI) but based on some recent changes in Zotero, that rationale is no longer relevant

@bwiernik FYI I am not trying to organize my Zotero Library. I am trying to organize someone else's. As a website designer, I am trying to determine which set of tags will produce the desired result in Zotero.

adamsmith · September 13, 2020

No, EM will never import something as a manuscript.

hughp3 · September 14, 2020

Just so I am clear, EM is used to extract embedded metadata from webpages, not embedded metadata from PDF resources. (From reading the file, this is certainly the impression I get, as Highwire Press tags are used on HTML pages, not in the embedded metadata of the PDF, png, jpg, etc. files themselves.) For the reference and benefit of other readers, EM is here: https://github.com/zotero/translators/blob/cc7b0538d179c1c79e86392ca53f0769c91d8a28/Embedded Metadata.js

adamsmith · September 14, 2020

Correct. Zotero doesn't look at file metadata at all. The "Retrieve Metadata for PDFs" feature searches the text of a PDF for basic information (DOI, ISBN, title) and then queries the web for the metadata.
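As an illustration of the "search the text for a DOI" step, here is a minimal sketch using a common DOI-matching heuristic. The function name and sample text are hypothetical, and the pattern is an assumption for illustration, not Zotero's actual matching logic.

```javascript
// Hypothetical helper: find the first DOI-like string in text extracted from a PDF.
// The pattern is a common heuristic, not Zotero's actual matching logic.
function findDoi(text) {
  const match = text.match(/\b10\.\d{4,9}\/[^\s"<>]+/);
  if (!match) return null;
  // Trim trailing punctuation that often follows a DOI in running text.
  return match[0].replace(/[.,;)\]]+$/, "");
}

// Sample text with a made-up DOI.
console.log(findDoi("Available at https://doi.org/10.1000/example.123."));
```

A lookup against a web service would then use the returned identifier; that step is left out here.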