A visual web translator editor?

diegodlh · March 5, 2021

Dear Zoteroes. I would appreciate your thoughts and feedback about an idea to develop a visual editor for Zotero web translators.

Zotero web translators are not only used by Zotero, but also by services such as Wikimedia's Citoid [1], which lets Wikipedia editors easily add citations by simply providing the URL of the source they want to cite.

In an ideal world, websites would expose metadata appropriately [2], and generic translators would suffice. However, this is often not the case and site-specific translators are still needed.

Zotero and Citoid are used worldwide. However, most translators are for English sources [3][4]. Although contributions to Zotero's translators repository [5] are open, they require some technical skills. Zotero developers have shown willingness to help with new translator requests, but the demand may be too high (currently there are ~40 open issues with the "New translator" tag [6]), and sometimes translators become broken, or are created by third parties with wrong cultural assumptions.

Therefore, based on a recent proposal [7] by another wikipedian, I am thinking of developing a web browser extension, a visual scraper editor, that would enable non-technical users to create and edit web translators, define test cases, and post them to Zotero's translator repo (and Wikimedia's fork [8]) for the benefit of both the Zotero and Wikimedia communities.

Of course the translators created by such a tool will not be as good and powerful as those created manually, but they may help fill the gap until someone with JavaScript knowledge can improve them.

I plan to write a proposal including the specific implementation I'm thinking of, but I would really appreciate your thoughts and comments, either here or in the Wikimedia thread [7] where this is being discussed already.

Thank you all very much!!

[1] https://www.mediawiki.org/wiki/Citoid
[2] https://www.zotero.org/support/dev/exposing_metadata
[3] https://www.mediawiki.org/wiki/Citoid/Creating_Zotero_translators
[4] https://meta.wikimedia.org/wiki/Research:Citoid_support_for_Wikimedia_references
[5] https://github.com/zotero/translators
[6] https://github.com/zotero/translators/issues?q=is:open+label:"New+Translator"+is:issue
[7] https://meta.wikimedia.org/wiki/Talk:WikiCite/Shared_Citations#Zotero_translators_integration
[8] https://gerrit.wikimedia.org/r/admin/repos/mediawiki/services/zotero/translators

adamsmith · March 5, 2021

Thanks for thinking about this. I'm very sympathetic to the problem, but I'm afraid I'm quite skeptical of this solution.

As you yourself note, translator review is a major bottleneck (although Zotero is, I believe, hiring a full-time dev for this, so hopefully this will significantly improve), so adding the ability to easily create lower-quality translators by people who are also likely to require more hand-holding during submission has a good chance of doing more harm than good. You're going to create frustrated reviewers and frustrated first-time contributors.

Nor can we, given that the code is shipped to hundreds of thousands of Zotero users, skimp on the review process.

dstillman · March 5, 2021

Yeah, the two main bottlenecks here have been 1) lack of developer time to devote to this on our end and 2) lack of community contributions due to Scaffold (Tools → Developer → Translator Editor) being harder to use than it needs to be, which is largely a function of (1).

We have a full-time dev starting in a couple months who will be focusing almost entirely on translators, translation infrastructure, and developer tooling, so the situation should improve dramatically later this year. We don't have firm plans yet for improvements to Scaffold (or a successor to Scaffold, which could conceivably run in the browser), but we definitely want to make it much easier for people to get started building translators. All in all, our goal is to greatly improve the reliability of translation, and to do that in part by expanding the number of people submitting translators and the pace at which they're accepted.

diegodlh · March 5, 2021

Thank you both for your comments!

I see and understand Sebastian's concerns. Anyway, although having translators (developed with such a visual tool) accepted by Zotero would be ideal, having them accepted by Wikimedia's fork (used by Citoid) would be OK as well. In fact, using them client-side only (as mentioned at the end of my last comment here https://meta.wikimedia.org/w/index.php?title=Talk:WikiCite/Shared_Citations#Zotero_translators_integration) might work too.

Therefore, besides the concern of whether to accept these translators in the official repo or not, do you see any technical (or non-technical) reason why this might be a bad idea?

Thank you!

adamsmith · March 5, 2021

I wouldn't say bad idea and it depends on the implementation, but I'd generally worry about encouraging quantity over quality in the medium run as you're going to see a growing number of broken and hard-to-maintain translators.

If you're thinking about the Citoid fork, you're also creating a lot more work there that you'll want to take into account and coordinate beforehand (I don't think there are currently any Citoid-only translators). You're also putting maintainers there in the situation where they need to reconcile/moderate between Zotero and Citoid-only translators.

I'd also at least consider opportunity costs: what could be done with the same amount of development and moderation effort _instead_. E.g. I think you'd do a lot to improve site support and quality by working on Zotero/Citoid's ability to understand JSON-LD and other newer metadata features. Another project might be to work on converters for existing scrapers such as those for Romanian newspapers mentioned over on Wikimedia.

dstillman · March 5, 2021

First, just to echo adamsmith, I really appreciate your thinking about this.

To be clear, I think there's a huge amount that can be done to lower the bar to translator development (and that's part of why we hired a developer to focus on it).

But the problem with a visual tool aimed at non-technical users, while an admirable goal, is that it's likely to produce fragile, poorly performing translators, resulting in errors or bad/missing data and requiring frequent revisions. It would likely be limited to scraping, which is the least-desirable way of writing a translator, and often wouldn't even benefit from basic transformations to clean up the available data that could be done in code by someone with basic JavaScript knowledge.

I think the effort here is better spent on making it easier for people with some basic web skills to write translators and get them reviewed and merged, and that will be done by reimagining and rebuilding Scaffold. I think there very well may be a visual component to that for when scraping is appropriate, and I'd be happy to talk through your ideas, but even something like a CSS query selector that a visual tool might produce really needs to be fine-tuned by someone semi-technical to not be incredibly likely to break or produce bad data. Even people who are comfortable writing a basic translator are unfortunately sometimes not great at thinking about how to write a good selector that will work across multiple pages of a site and survive the most trivial site changes.

I acknowledge that there's perhaps a perfect-is-the-enemy-of-the-good argument to be made here, but metadata quality and saving reliability does matter when generating citations, so bad translators can sometimes be worse than non-existent ones, in particular since lower-priority translators can sometimes do a decent job.

So I'd still say that the problem here really is just developer time. We have a huge backlog of open translator pull requests, and we need to clear the queue, which I suspect would alone go a long way towards encouraging more submissions. That will be the first order of business for our new developer starting in May.

(Honestly, the top priority here is probably to improve JSON-LD support — we have some complicated open issues for that — so that we get better-quality data for free across more sites.)

dstillman · March 5, 2021

(Also, while improving Scaffold is a more major undertaking, and one that we're planning to work on later this year, in the shorter term we might be able to fund some work on improving JSON-LD and the like. If that's of any interest (to you or anyone else from the Wikimedia community), send us an email at support@zotero.org.)

diegodlh · March 8, 2021

Thank you both again for your follow-ups. I took the weekend off and just saw your replies in detail.
I think the concerns raised may be summarized into two main areas: (1) translators created with a visual creator may be of very bad quality, and (2) it may be better to focus on improving embedded metadata (e.g., JSON-LD) support instead.

Regarding (2) JSON-LD support
> a visual tool aimed at non-technical users (...) would likely be limited to scraping, which is the least-desirable way of writing a translator
I understand scraping is the only alternative we have when metadata is not appropriately embedded in the website, correct? In fact, Zotero's "Writing translator code" guidelines [1] point to Wikimedia's "Creating Zotero translators" [2] which says the scrape function is "the most interesting function to code in a translator".
I understand, though, that you may have meant that it's preferable to have Zotero translators handle as much of these embedded metadata as possible (hence, your suggestion to focus on improving JSON-LD support instead).
Although I definitely agree that improving JSON-LD support is important to increase translator coverage, this plugin idea is to help fill the gap left by websites not embedding metadata or embedding wrong metadata, for which better JSON-LD support would be of no help anyway.

Regarding (1) translator quality
> fragile, poorly performing translators, resulting in errors or bad/missing data and requiring frequent revisions (...) and often wouldn't even benefit from basic transformations to clean up the available data
> even something like a CSS query selector that a visual tool might produce really needs to be fine-tuned by someone semi-technical
Regarding translator performance, there are visual scraper tools out there that I was planning to base this tool on. Some are commercial, such as AnyPicker [3]; others are open source, such as Portia [4]. However, TBH I haven't used them so I don't know how reliable they are.
Regarding basic transformations to clean up the data, I was thinking of a two-step process: first selection (to pull relevant data from the HTML), then post-processing (with basic functions such as splitting, merging, trimming, etc).
Finally, going back to scraping being the least-desirable way of getting data, the idea is that the plugin offers to use embedded metadata as default, and only for fields where metadata is unavailable (or wrong) it offers to scrape (or hardcode if applicable) values instead.
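
A minimal sketch of the select-then-post-process idea with embedded metadata as the default, assuming a hypothetical, well-formed page and hypothetical helper names (`embedded`, `select`, `clean_title`, `split_authors`). It is not code from Zotero, Citoid, or the proposed plugin, and real pages would need a proper HTML parser rather than an XML one:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """<html>
  <head>
    <meta name="citation_title" content="  An Example Headline | Example Daily " />
  </head>
  <body>
    <span class="byline">By Ana Gomez and Luis Perez</span>
  </body>
</html>"""


def embedded(root, name):
    """Default: use embedded metadata (a <meta name=...> tag) when present."""
    node = root.find(f".//meta[@name='{name}']")
    return node.get("content") if node is not None else None


def select(root, path):
    """Step 1 (selection): pull raw text out of the page."""
    node = root.find(path)
    return node.text if node is not None else None


def clean_title(raw):
    """Step 2 (post-processing): drop the site suffix and trim whitespace."""
    return raw.split("|")[0].strip()


def split_authors(raw):
    """Step 2 (post-processing): strip a 'By ' prefix, split on commas or 'and'."""
    names = re.sub(r"^\s*By\s+", "", raw)
    return [n.strip() for n in re.split(r",|\band\b", names) if n.strip()]


def translate(page):
    root = ET.fromstring(page)
    title = embedded(root, "citation_title") or select(root, ".//h1")
    # No author metadata is embedded in this page, so fall back to the byline.
    authors = embedded(root, "citation_author") or select(root, ".//span[@class='byline']")
    return {"title": clean_title(title), "authors": split_authors(authors)}
```

Calling `translate(PAGE)` takes the title from the embedded tag and the authors from the scraped byline, then applies the clean-up functions to both.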

> metadata quality and saving reliability does matter when generating citations, so bad translators can sometimes be worse than non-existent ones
In spite of what I argued above, I do agree with Dan here that having a translator that saves the wrong metadata may be worse than not having a translator at all (at least the problem is obvious in the second case, whereas it may go unnoticed in the first one).
As discussed before, I wonder whether having a separate repository of community translators, used by Citoid or by the proposed plugin alone, would be of any help. Citoid's developer Marielle Volz hasn't commented about this yet.

Once again, thank you both for engaging in this conversation. I really appreciate your feedback. I have briefly summarized at the end of the corresponding Wikimedia Meta thread [5].

[1] https://www.zotero.org/support/dev/translators/coding#saving_single_items
[2] https://www.mediawiki.org/wiki/Citoid/Creating_Zotero_translators
[3] https://anypicker.ryang-studio.com/
[4] https://github.com/scrapinghub/portia
[5] https://meta.wikimedia.org/wiki/Talk:WikiCite/Shared_Citations#Zotero_translators_integration

adamsmith · March 8, 2021

Re: scrape -- the "scraping" function in Zotero translators isn't synonymous with screen scraping.
E.g., the templates provided by Scaffold for scrape all use some sort of structured metadata, often supplemented by scraping, so that'd be closer to what you describe here:

> the idea is that the plugin offers to use embedded metadata as default, and only for fields where metadata is unavailable (or wrong) it offers to scrape (or hardcode if applicable) values instead.

Except that translators don't just rely on embedded metadata but can draw on other formats (e.g. RIS or BibTeX for scholarly sources) where available.

For JSON-LD -- I think the point here is that Zotero currently doesn't use that at all, so if you're able to add and improve that, you'll a) improve support for a significant and growing number of pages without a translator and b) improve performance and ease of coding for any translator built on top of EM (via a tool or not), so that'd really seem to be much higher impact work.
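
For readers unfamiliar with JSON-LD, here is a rough sketch of what reading it from a page involves, using only the Python standard library. The page, the `JsonLdExtractor` class, and the `article_fields` function are hypothetical illustrations, not Zotero's or Citoid's implementation; the schema.org property names (`headline`, `author`, `datePublished`) are the standard ones:

```python
import json
from html.parser import HTMLParser

# Hypothetical page with a schema.org NewsArticle description.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "NewsArticle",
 "headline": "An Example Headline",
 "author": {"@type": "Person", "name": "Ana Gomez"},
 "datePublished": "2021-03-05"}
</script>
</head><body></body></html>"""


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip blocks that are not valid JSON.


def article_fields(html):
    """Return title, author, and date from the first article-like JSON-LD item."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for item in parser.items:
        if isinstance(item, dict) and item.get("@type") in ("Article", "NewsArticle"):
            author = item.get("author") or {}
            return {
                "title": item.get("headline"),
                "author": author.get("name") if isinstance(author, dict) else None,
                "date": item.get("datePublished"),
            }
    return None
```

Real-world JSON-LD is messier than this (arrays of items, `@graph` wrappers, multiple authors), which is part of why the thread describes this as non-trivial work.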

diegodlh · March 16, 2021

Dear @danstillman and @adamsmith, I continued thinking about this idea and finally presented a proposal to the Wikimedia community, taking into account the feedback I received here and in other fora.

https://meta.wikimedia.org/wiki/Grants:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators

In short, I dropped the idea of posting Zotero translators created with a visual editor to the translators repository. Instead, the proposal comprises a separate translation server, that uses community translators to translate a given URL to (1) provide additional results to the Citoid service, and (2) to generate a proxied version of the website with embedded citation metadata (that can be read by Zotero's embedded metadata translator).
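
As a rough illustration of what "embedded citation metadata" could look like, the sketch below renders a citation as Highwire Press-style `<meta>` tags (`citation_title`, `citation_author`, `citation_publication_date`). The choice of that tag vocabulary, the `citation_meta_tags` function, and the item layout are assumptions made for this example; the proposal does not specify them:

```python
from html import escape


def citation_meta_tags(item):
    """Render a citation dict as <meta> tags, escaping values for HTML attributes."""
    tags = [("citation_title", item["title"])]
    tags += [("citation_author", author) for author in item.get("authors", [])]
    if item.get("date"):
        tags.append(("citation_publication_date", item["date"]))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )
```

For example, `citation_meta_tags({"title": "An Example Headline", "authors": ["Ana Gomez"], "date": "2021-03-16"})` yields three tags, one per field.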

Feel free to provide your comments and thoughts in the discussion page!

diegodlh · July 22, 2021

Hi, all! The proposal has been approved by the Wikimedia Foundation!! You can follow the news here: https://meta.wikimedia.org/wiki/Web2Cit

We have also opened a call for members for an Advisory Board to help us think and tackle critical aspects of the project. We are looking for technical, community-oriented, or software-sustainability profiles. We would be thrilled to have members of the Zotero community onboard! Applications will be open until August 6th, 2021: https://meta.wikimedia.org/wiki/Web2Cit/Advisory_Board/Call_for_members

dstillman · July 23, 2021

So again, while we appreciate your efforts here, we strongly disagree with this approach and won't be supporting this project. We think it will produce low-quality translators with incomplete metadata that break frequently and frustrate users, and it will divide the potential development community for no reason.

We've had a developer working full time on translators for the last couple months, working through the backlog of PRs and working on support for many more sites. There's already been a huge improvement in metadata quality across many sites. This is the first time in probably a decade we've had someone on staff dedicated to this work, and it's something we're committed to continuing, so it's a truly odd time to fork the project.

We explained the problem with attempts at automatic selector generation above, so I won't rehash that, other than to reiterate that we think it's a fundamentally flawed concept. We're working on a redesign of Scaffold to make it easier for new people to get involved with translator development, and it will include some devtools-like features, but there's no getting around the fact that writing reliable translators requires a modicum of technical understanding.

The proxy idea is bizarre and counterproductive. If someone creates a translator and wants to share it with others outside of the Wikipedia community, they should contribute it to us. If it isn't all :nth-child(56), we'll fix whatever has to be fixed and merge it in so that everybody using Zotero translators can benefit. Translating proxied URLs makes no semantic sense and would violate the privacy protections we offer our users, so please remove the claim that that's something you're doing for the benefit of the Zotero community.

We really appreciate Wiki-universe people's contributions to Zotero translators, and we'd love for them to stay involved, working alongside our new developer. We'd love for people to contribute new translators, help improve documentation, help improve Scaffold, etc. But this is just going to produce worse translators for Wiki users with no benefit to Zotero users, and we think that's a shame for both projects.

AbeJellinek · July 23, 2021

As the aforementioned new developer working on translators, I agree with everything Dan said. I want to emphasize that I don't think this project is going to increase the pace of Citoid translator updates in the slightest, contrary to its stated goals. My main concern stems from something you mentioned in the grant proposal:

> in principle, only the original author would be able to edit a translator, whereas other users would be able to fork it

Anyone can contribute to any translator in the zotero/translators repo. Myself and a handful of other developers with commit privileges are just there to make sure that contributions meet our general quality standards, selectors are stable, personal information isn't leaking into item metadata, that kind of thing. By contrast, your proposal seems to indicate that only the original developer of a translator can change it. How will forks work? Will users have to slog through 100 possible citations generated by 100 forks of the same translator and then select the one they think is best? That sounds markedly worse than just sticking with the Git repo we already have.

Additionally, translators with automatically-generated selectors will break more egregiously, more often. In, say, 60% of cases, generating a selector is easy. Maybe an element always has ID title or it's always the first <h1> on the page. But the remaining 40% is terrible. It's common to have to match on the text content of a nearby element or test the format of a value using a regular expression. I'd love to see an automated selector generator that can do any of that without spitting out something incredibly fragile (:nth-child(56)) that only ever works on the page it was generated on. Non-programmer users won't know a good selector from a bad one.
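
The difference between a positional selector and a stable one, and the "nearby element plus regular expression" case, can be sketched as follows. The pages, paths, and function names are hypothetical, and the example uses the limited XPath subset in Python's standard library on well-formed markup rather than CSS selectors on real HTML:

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same article page; the second adds a banner.
PAGE_V1 = """<html><body>
<div>Menu</div>
<div><h1 id="title">An Example Headline</h1></div>
<dl><dt>Section</dt><dd>Politics</dd><dt>Published</dt><dd>2021-07-23</dd></dl>
</body></html>"""

PAGE_V2 = PAGE_V1.replace("<div>Menu</div>", "<div>Banner</div><div>Menu</div>")

FRAGILE = "./body/div[2]/h1"   # positional, in the spirit of :nth-child(...)
STABLE = ".//h1[@id='title']"  # anchored on a meaningful attribute

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def text_at(page, path):
    """Return the text of the first element matching path, or None."""
    node = ET.fromstring(page).find(path)
    return node.text if node is not None else None


def value_after_label(page, label):
    """Match on a nearby element's text: the <dd> that follows a given <dt>."""
    for dl in ET.fromstring(page).iter("dl"):
        children = list(dl)
        for current, following in zip(children, children[1:]):
            if (
                current.tag == "dt"
                and (current.text or "").strip() == label
                and following.tag == "dd"
            ):
                return (following.text or "").strip()
    return None


def published_date(page):
    """Accept the scraped value only if it has the expected date format."""
    value = value_after_label(page, "Published")
    return value if value and ISO_DATE.fullmatch(value) else None
```

Here `text_at(PAGE_V1, FRAGILE)` finds the headline, but the same path returns `None` on `PAGE_V2` once the banner shifts the element positions, while `STABLE` keeps working on both versions.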

Additionally, I wanted to respond to something you specifically called out on the project page:

> For example, a translator for a mainstream Argentinean newspaper has recently been created by one of Zotero developers, following a request from a non-technical user in their forums. In spite of the developer's good will, it was created on the wrong cultural assumption that most last names in Argentina have two parts (in addition, the translator seems to be no longer working already).

I'm not sure when you wrote this, but I fixed the La Nación translator over a month ago, as soon as I noticed that it was broken while doing a regular survey of web translators. (The translator test dashboard that you link to was taken down a couple weeks ago.) I haven't addressed the two-surname issue because I didn't know about it. Just submit a PR or open an issue! I would merge a PR just changing those few lines and updating lastUpdated without question. If you opened an issue, I would fix it myself the same day.

Our issue is not that I don't have time to fix known problem translators. It's that people don't tell me when translators break! Merging PRs and fixing broken translators is literally my job. I'm happy to fix any issue you report to me. I'm a little less happy if the first I hear about it is in a Wikimedia Foundation grant proposal.

I strongly urge you to reconsider this project and to continue to allow Wikipedians' contributions to benefit every Zotero/Citoid translator user, not just the ones who follow you to this fork. I know that your intentions are good and your frustration with out-of-date and missing translators is extremely valid, but this approach will fall short of solving one problem while creating ten more.

adamsmith · July 23, 2021

(FWIW I think the surname handling in the La Nacion translator is fine. When a name has >2 parts, assuming that the last two are family names isn't perfect, but imo the best option for Argentine names - the tests have currently one example where this is right, one where it isn't. The code comment was just badly worded on my part - most names in Argentina and in La Nacion have a single given and family name and the heuristic never applies at all - sorry about that)
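
The heuristic described in this comment can be sketched like this. It is an illustration written for this thread, not the actual La Nación translator code; the `split_name` function and the sample names are hypothetical:

```python
def split_name(full_name):
    """Split a byline name into given and family names.

    When a name has more than two parts, assume the last two are family names.
    """
    parts = full_name.split()
    if len(parts) > 2:
        return {"firstName": " ".join(parts[:-2]), "lastName": " ".join(parts[-2:])}
    if len(parts) == 2:
        # The common case: one given name, one family name; no heuristic needed.
        return {"firstName": parts[0], "lastName": parts[1]}
    return {"firstName": "", "lastName": full_name.strip()}
```

With a hypothetical "Ana Gomez Perez" (two family names) the split comes out right; with a hypothetical "Juan Carlos Gomez" (two given names) it would put "Carlos" into the family name, which is the kind of miss discussed here.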

diegodlh · July 27, 2021

Dear Dan and Abe,
I regret to learn that you won’t take part in our project’s Advisory Board. We understand that the project’s goals and approach may not align with your priorities. Notwithstanding, I would like to clarify some aspects of the project, to make sure they are understood correctly.

>Web2Cit is not a fork of the Zotero translators project
We are sorry if we gave the impression that we wanted to fork the Zotero translators project. This is definitely not one of the project’s goals.
Web2Cit is intended to be a complementary layer to help temporarily bridge the gaps left by the Zotero translators project.
It is not our plan to divide development efforts. As we have already stated, if someone has sufficient technical knowledge, we definitely urge them to contribute to the Zotero translators repository instead, as I myself do and will continue doing.
Our project’s aim is to enable communities from diverse cultural and technical backgrounds to create basic translators that help them move on with their (translator-unrelated) projects, without having to rely on an English-speaking developer, or having to wait for their contribution to be merged into Zotero and Citoid codebases.
In addition, as we have explained in our proposal, we hope that these (basic) community translators may in the future guide the development of more robust JavaScript translators to be submitted to (or pulled from) the Zotero translators repository.
So, no. We don’t think Web2Cit is a fork of the Zotero translators project.

>Web2Cit proxy’s main goal is Wikipedia-oriented
The main purpose of the Web2Cit proxy is to serve the Wikipedia community itself. This is because, as explained in our proposal, there is no guarantee that the Citoid service will support Web2Cit from the beginning, or that user scripts/gadgets can be written. I don’t think we have claimed “that that's something [we]'re doing for the benefit of the Zotero community”. The fact that other communities may use and benefit from the proxy is only a side effect.

>Technical challenges
We think the project poses some technical questions which are challenging, but not insurmountable. This is what makes the project interesting instead of trivial.
Questions such as what automated selector + post-processing strategy would work best, or whether crowd annotation + machine learning could be used instead; or how to enable collaboration around community translators (a discussion already started here: https://meta.wikimedia.org/wiki/Grants_talk:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators#Translator_storage_and_collaboration), are the kind of questions we expect the Advisory Board will help us think through.

>A Wikimedia approach
We think there may be multiple, complementary ways of approaching the problem of website metadata translation. We acknowledge the value of having a few (English-speaking) experts contributing high-quality code to the Zotero translators repository, and we celebrate the fact that Zotero has someone on staff dedicated to this work now.
But we also think that a complementary crowdsourced lower-tech approach may be faster to cover some gaps, although rudimentarily, until they can be fixed officially, in a way similar to how Wikipedia and other Wikimedia projects have grown and continue to grow to date. This may be especially relevant in areas less interesting or visible to the average English-speaking tech-savvy GitHub contributor. We expect this crowdsourced approach will help compensate for the lower-quality translator fragility you are legitimately concerned about, the same way the Wikipedia community has proven so robust in managing vandalism, for example.

All in all, we hope Web2Cit will work as a complement, definitely not as a replacement, of Zotero translators. We hope together we can build a better translator ecosystem that will help both Wikimedia and Zotero communities.

diegodlh · July 27, 2021

> When a name has >2 parts, assuming that the last two are family names isn't perfect, but imo the best option for Argentine names

@adamsmith, what do you base your opinion on? I’m Argentine myself and I have the intuition that it is more likely to have two names than two last names in my country. As I may be biased (I have two names), I asked some Argentine acquaintances what they thought would be more likely in Argentina: (a) that a person had two names, or (b) that a person had two last names. 10 out of 11 who replied said “a”.

Anyway, as this shouldn’t be a matter of opinions or intuitions, I checked some sources and found that in 2015 75% of newborns in Argentina were given two or more names [1]. Conversely, in 2016-2017 only 33% of newborns in Buenos Aires were registered with both father’s and mother’s last names [2] (although it is true that people may have father/mother-only last names of 2 words or more).

I would never suggest Zotero developers should be aware of these cultural details around the globe. I just used the example to illustrate how complex cultural diversity can be, and how important it is to have as many voices as possible from around the world, regardless of their English and technical abilities, both of which are needed to participate in the Zotero translators GitHub repository.

[1] https://datos.gob.ar/dataset/otros-nombres-personas-fisicas/archivo/otros_2.1
[2] https://www.nueva-ciudad.com.ar/notas/201812/39552-en-2018-cuatro-de-cada-diez-bebes-fueron-inscriptos-con-el-apellido-de-la-madre.html

adamsmith · July 27, 2021

The question is not _have_ but _use in print_ and La Nacion journalists aren't a random sample of Argentines exactly.
So I mainly based this on looking through a fair number of articles in the paper and seeing what works better (FWIW I do speak Spanish and have lived in BsAs - and read La Nacion - for a good bit, but obviously I could still be wrong here)

diegodlh · July 27, 2021

> The question is not _have_ but _use in print_ and La Nacion journalists aren't a random sample of Argentines exactly.

You are right.

> I mainly based this on looking through a fair number of articles in the paper and seeing what works better

Thanks for clarifying this. I should have chosen a better example, then. Sorry.

> I do speak Spanish and have lived in BsAs - and read La Nacion - for a good bit

Great! I guess you would agree then that understanding these cultural and local nuances is sometimes important to write better translators.

dstillman · July 27, 2021

> We are sorry if we gave the impression that we wanted to fork the Zotero translators project.

We're clear on what you're doing. Our point is that you're splitting development resources that could be pooled to solve the same problems for more people. We would've welcomed Wikipedia's involvement in making translator development accessible to more people while maintaining the high quality and reliability of translators.

> I don’t think we have claimed “that that's something [we]'re doing for the benefit of the Zotero community”.

The proposal says this: "This way, the proxied web site will be available for translation with official generic translators by any service relying on them; including the Citoid service (until they add Web2Cit as an additional source), Zotero's browser connectors, Zotero's ZBib, etc." You are absolutely giving the impression that this will provide value to Zotero users. It won't.

You also claimed that "Feedback from the Zotero developers helped shape this proposal as well", which while perhaps technically true is I think a bit misleading. Our feedback was that we think this is a bad idea and you shouldn't do it, and that we'd love for you to consider ways you could harness the Wikipedia community's energy to actually contribute to the main project for everyone's benefit.

> We hope together we can build a better translator ecosystem that will help both Wikimedia and Zotero communities.

But we're not doing anything together — we've said from the beginning that we disagree with this approach and we see it as purely a negative for the Zotero community, and you decided to go ahead with it anyway. That's certainly your right, but please don't give the impression that this is any sort of collaborative or mutually beneficial effort.

Wikipedia's needs may very well be different from Zotero's — the collaborative editing process may create a higher tolerance for errors, breakage, and poorer-quality metadata — but I can't help but think there would've been more productive avenues to explore before taking this rather drastic step. In any case, I hope it works out well for the Wikipedia community.

adamsmith · July 27, 2021

I think I feel less strongly about this than dstillman, but this is part of where I disagree with your thinking on this

> Great! I guess you would agree then that understanding these cultural and local nuances is sometimes important to write better translators.

While it's absolutely true that language and cultural awareness can help in getting good metadata, in practice it's a really minor factor compared to a) understanding metadata and b) understanding what an automated tool can & cannot do, e.g. via regex.

Just to give you a recent example, look at the issue reported here: https://forums.zotero.org/discussion/90926/error-in-the-reference-of-my-last-name
In trying to troubleshoot the issue, Abe and I looked at three different metadata formats, needed to know which of the three the translator looks at; one of them was accessed via API/a third party site. That's fairly common in trying to get the best metadata for a site (in this case it's a scholarly site, but between metatags, JSON-LD, micro-RDF etc. this applies equally to the regular web). And in that case we can't even fix the problem because the quality of metadata supplied (by an academic publisher!) is too low. I just don't see how you can empower non-technical users with limited understanding of metadata to do a particularly good job with that.

DWL-SDCA · July 27, 2021

@diegodlh

A few years ago my SafetyLit project tested using crowd sourced metadata gathering. We did this because we found differences between several publishers' print (or PDF) metadata and the metadata provided on their websites or that provided to indexers via the publisher FTP. I find frightening the idea of crowdsourcing the development of publisher website translators or scrapers by people who aren't experienced with both coding AND metadata standards. Voting or majority preference will not provide complete and accurate metadata.

It was a ridiculous failure involving dueling edits of the test database by several people, each of whom was wrong (according to metadata conventions and cataloging standards). Every participant had an advanced degree. Anger became common between participants themselves and between participants and accepted metadata standards. "The standard is _wrong_ my idea is clearly better." "My method of imputing (edit: not a typo for inputting) the missing metadata works well and is better than ..." Essentially everyone in the group was dissatisfied with something on every publisher's website and was determined that the publishers' metadata presentation be altered to improve something. "Why doesn't every publisher do it this way," was common.

A few examples of obviously nonsensical but firmly held notions:

Some journals (The Lancet) have cumulative issue numbering across volumes as opposed to issue numbering within volumes. This led to angry disagreements about which single numbering system to use or whether to use both systems.

Some journals do not use a volume system but simply number the issues across the publication years. There were arguments that these issue numbers were 'in the real world' volumes. Thus, the issue numbers should be moved to the volume field and the issue field should be empty. Others argued that the issue number should be moved to the volume field but the issue field should be assigned "1" because it is the only issue within what is really a volume.

See the link above in @adamsmith's comment concerning author names. Should hyphens always be added for authors having more than one last name because without hyphenation the names are confusing? Should name particles _always_ begin with an upper case character (regardless of the language's name convention)? Should there _always_ be a space (inserted if necessary) between the name particle and the root name? (Should my wife's family name be altered from the true dePender to De Pender?) Should a family name like "St. John" always be made to read "Saint John" so that it matches 'proper' alphabetization? Should names such as "Saint John" or "Santa Maria" or "Santamaria", for the sake of unification, be altered to St. John, St. Maria, and St. Maria? Should characters such as Æ, ß, Ö in author names _always_ be converted to ae, ss, oe for uniformity and to help English language searchers? Should non-Roman characters in the names of non-English authors be transliterated/translated? Should the names _never_ be transliterated, even to the point of always excluding English versions supplied by the publisher? (The group included people who used a non-Roman alphabet in their primary language -- they didn't argue against transliteration.)

If a special issue or supplement is issued along with a regular numbered issue, should the supplement be given the number sequence of how many supplements were issued in that volume, or should the supplement issued along with an issue numbered, say, 7 also be given the number Suppl 7? Shouldn't a supplement always be listed in the volume field because it is really more like a different volume than an extra issue?

Should journal titles be in title case or sentence case? What about journal abbreviation casing? Should journal abbreviations include periods or be without periods?

Should Google Scholar metadata be used even if there is disagreement with the journal publisher "because most people will be drawn to Google and might never see the publisher's site"?!!!

Should both print and electronic ISSNs be included in a record if the journal is no longer published in both formats?

As I've written this I keep thinking of more examples.

Even people who are very knowledgeable in their specialty are not likely also to have knowledge about cataloging and metadata standards and practices. They, however, will likely have strong opinions about what they want to see. Who will serve as arbiter? Can we trust the group process to obtain true and complete metadata from publisher websites? What happens when there are disagreements about what is accurate and someone insists upon modifying the publisher's information to suit their individual desire?

Please reconsider doing this "your way" and instead cooperate with Zotero and CSL experts who have been doing this for years, who have learned from mistakes, and who have long demonstrated good faith and a cooperative spirit. I understand that you have faith that the Wikipedia crowd approach can work and can make things move forward quickly. My own experience shows that such a plan does not hasten the collection of accurate and complete metadata. I caution you to not attempt to develop a new system for writing translators. It will be a waste of time and resources.

DWL-SDCA · July 28, 2021

Continuing with preposterous arguments...

There were demands that all demonym adjectives begin with a lower case character because they are adjectives, not nouns. "The safety of italian roadways", not "Italian roadways". "Improvements in the spanish trauma care system", but "Spaniards". "Italians", not "italians". Nonetheless, "Americans" and "American", "British" and "British" kept their capitals, without good reason. These folks actually edited article titles and abstracts, insisting that it was necessary and constructive.

A civil engineer from Cambridge demanded that initialisms and acronyms should _not_ be all caps because that is the convention only in the USA -- not WHO but Who (World Health Organization), especially because the organization is also known as "organisation mondiale de la santé" or "世界卫生组织". I don't follow his logic. Now, in his eyes, I'm a fool and he is obviously correct.

Edit: Let me reinforce that people in the test group were silently altering what was on the publisher's website or printed product and insisting that the publishers were wrong and that the "corrected" metadata was therefore more appropriate. I had to make considerable effort to find the improper corrections and change the metadata back to its original form.

These people were hand-editing the imported metadata, but these kinds of problems could also occur with manipulations in the translators that automatically import metadata from the publisher websites and assign the bits to the proper field.
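
To make that concern concrete, here is a minimal Python sketch, not actual Zotero translator code. The record, the field names, and the "house rule" are all hypothetical; the point is only that a single normalization step inside an import routine changes what the publisher supplied, and that such changes can be detected by comparing against a faithful copy.

```python
import re

# Hypothetical record as a publisher page might expose it (illustrative values).
publisher_record = {"family": "dePender", "given": "A.", "issue": "7"}


def faithful(record):
    """Copy the fields exactly as the publisher supplied them."""
    return {
        "lastName": record["family"],
        "firstName": record["given"],
        "issue": record["issue"],
    }


def opinionated(record):
    """Same mapping, plus a hypothetical house rule: name particles get an
    upper-case first letter and a space before the root name."""
    item = faithful(record)
    item["lastName"] = re.sub(
        r"^((?i:de|van|von))\s*(?=[A-Z])",          # particle followed by a capital
        lambda m: m.group(1).capitalize() + " ",    # "de" -> "De "
        item["lastName"],
    )
    return item


def altered_fields(record):
    """List the fields where the opinionated import differs from the source."""
    plain, changed = faithful(record), opinionated(record)
    return sorted(key for key in plain if plain[key] != changed[key])


print(opinionated(publisher_record)["lastName"])  # De Pender
print(altered_fields(publisher_record))           # ['lastName']
```

A comparison like `altered_fields` only works if someone keeps the faithful mapping around to compare against, which is exactly the review effort described above.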

Do you plan to lock the crowd-sourced translators much as you do with the editing of Wikipedia articles concerning living or controversial individuals? When the publisher changes its website, how will this be recognized as a need for a translator revision, and how will this be differentiated from inappropriate or unnecessary translator modifications?

emilianoeheyns · July 29, 2021

The WHO vs the Who is hilarious. How is that not unambiguously the worldwide health organization vs the band?