Integrating a large controlled vocabulary

I'm the tech lead on the World Flora Online Plant List. We maintain an expert-curated list of vascular plants and bryophytes as open data, with releases twice a year.

https://list.worldfloraonline.org/

A few of our 200+ contributors use Zotero to manage references, and we are discussing how we could better integrate our checklist with their group libraries and whether Zotero could be used more widely.

Our list is effectively a large controlled vocabulary of 1.6 million terms. About 600k of these are accepted names arranged in a hierarchy; the remaining ~1m are synonyms or unplaced names. All names/terms have stable IDs, but their precise spellings and renderings may change or be ambiguous (hence the IDs!). The hierarchy of names/terms changes subtly with every six-monthly release, but the IDs stay constant.

I'm looking for ideas on how we could integrate with Zotero so that researchers can curate their references against an agreed list, and so that we can produce bibliographies for different groups, etc.
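Because the IDs are stable while the hierarchy shifts between releases, one way to produce a bibliography for a group might be to resolve group membership at query time against the current release, rather than storing it in Zotero. A rough sketch, assuming each reference has somehow been tagged with WFO IDs, and that a release can be loaded as a child-to-parent map plus a synonym-to-accepted map. The function names and the toy IDs ("F1", "G1", ...) are hypothetical stand-ins, not real WFO identifiers or an existing API:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def build_children(parents: Dict[str, Optional[str]]) -> Dict[Optional[str], List[str]]:
    """Invert a child -> parent map (one release) into parent -> children."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for child, parent in parents.items():
        children[parent].append(child)
    return children


def subtree(root: str, children: Dict[Optional[str], List[str]]) -> Set[str]:
    """Every ID at or below `root` in this release (iterative depth-first walk)."""
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def bibliography_for(root: str,
                     parents: Dict[str, Optional[str]],
                     items: Dict[str, Set[str]],
                     synonyms: Dict[str, str]) -> List[str]:
    """Keys of references tagged with any name in `root`'s group.

    items: hypothetical reference key -> set of WFO IDs it is tagged with.
    synonyms: synonym ID -> accepted ID, taken from the same release, so a
    reference tagged with a synonym is counted under its accepted name.
    """
    group = subtree(root, build_children(parents))
    hits = []
    for key, tags in items.items():
        accepted = {synonyms.get(t, t) for t in tags}
        if accepted & group:
            hits.append(key)
    return sorted(hits)
```

Re-running the same query against the next release picks up any moves in the hierarchy without touching the tags on the references themselves.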

1) We could use our WFO IDs as tags, but this would be very user-unfriendly. Perhaps we could write a plugin that renders the name but stores the ID? With a lookup interface and so on, we might end up with almost a new tagging mechanism, which would be hard to maintain.
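A possible middle ground that needs no plugin would be to embed the ID in an otherwise readable tag, so the ID remains the identity and the rendered name is just a label that can be rewritten at each release. A minimal sketch, assuming a hypothetical "<name> [<WFO ID>]" tag convention and assuming a WFO ID is "wfo-" followed by ten digits; the example ID is a placeholder, not a real record:

```python
import re
from typing import Dict, List, Optional

# Hypothetical tag convention: "<rendered name> [<WFO ID>]".
# Assumes a WFO ID is "wfo-" followed by ten digits.
TAG_RE = re.compile(r"\[(wfo-\d{10})\]\s*$")


def make_tag(name: str, wfo_id: str) -> str:
    """Render a human-readable tag that still carries the stable ID."""
    return f"{name} [{wfo_id}]"


def id_from_tag(tag: str) -> Optional[str]:
    """Return the WFO ID embedded in a tag, or None for ordinary tags."""
    match = TAG_RE.search(tag)
    return match.group(1) if match else None


def relabel(tags: List[str], names: Dict[str, str]) -> List[str]:
    """Re-render WFO tags from a new release's ID -> name table.

    Only the label is rewritten; the ID is kept. Ordinary tags and IDs
    missing from `names` are left unchanged.
    """
    updated = []
    for tag in tags:
        wfo_id = id_from_tag(tag)
        if wfo_id is not None and wfo_id in names:
            updated.append(make_tag(names[wfo_id], wfo_id))
        else:
            updated.append(tag)
    return updated


# A corrected spelling in a new release changes the label, not the ID.
print(relabel(["Quercus robor L. [wfo-0000000001]", "oak ecology"],
              {"wfo-0000000001": "Quercus robur L."}))
```

The trade-off is that users still see the ID in Zotero's tag selector, but tags stay searchable by name, and a sync script would only need to rewrite tags whose label changed.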

2) We could create a hierarchy of 1.6m collections in a shared library! We could use the API to maintain this hierarchy as it changes with each data release.
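Keeping such a hierarchy in step with each release mostly comes down to diffing two releases keyed on the stable IDs. A rough sketch, assuming each release is loaded as a hypothetical WFO ID -> parent ID map (None at the root) with no cycles; `SyncPlan` and `diff_releases` are illustrative names, and the toy IDs are not real WFO identifiers:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Hierarchy = Dict[str, Optional[str]]  # WFO ID -> parent WFO ID (None at the root)


@dataclass
class SyncPlan:
    create: List[str] = field(default_factory=list)  # new IDs, parents first
    move: List[str] = field(default_factory=list)    # IDs whose parent changed
    delete: List[str] = field(default_factory=list)  # dropped IDs, deepest first


def depth(wfo_id: str, tree: Hierarchy) -> int:
    """Number of ancestors above `wfo_id` (assumes the hierarchy has no cycles)."""
    d = 0
    node = tree.get(wfo_id)
    while node is not None:
        d += 1
        node = tree.get(node)
    return d


def diff_releases(old: Hierarchy, new: Hierarchy) -> SyncPlan:
    """Collection changes needed to turn the old release's tree into the new one."""
    old_ids, new_ids = set(old), set(new)
    return SyncPlan(
        create=sorted(new_ids - old_ids, key=lambda i: (depth(i, new), i)),
        move=sorted(i for i in old_ids & new_ids if old[i] != new[i]),
        delete=sorted(old_ids - new_ids, key=lambda i: (-depth(i, old), i)),
    )
```

Applying the plan would mean creating collections level by level (a new child needs its parent's Zotero collection key, which only exists once the parent has been created), then updating `parentCollection` on moved collections, then deleting, after reassigning any items filed in deleted collections. The Zotero Web API accepts up to 50 objects per write request, so an initial load of 1.6 million collections would take at least 32,000 requests.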

Have similar challenges been approached before?

We'd welcome any suggestions with much gratitude!!