Publishing a large bibliography on the web, with a faceted search interface?

dlesieur · May 15, 2018

I'm posting this to gauge the interest of the Zotero community in publishing bibliographies on the web, and making them searchable through a user-friendly yet powerful faceted search interface. Here's the URL of such a website, which I recently developed for a community of researchers who wanted to publish a bibliography of 11k+ items: http://quescren.concordia.ca/en/search. The data is stored in a group library on Zotero.org, and the software uses the Zotero API to sync the data into its own search index.

The first four "facets" in the search interface are actually separate collections in the library, and the hierarchy under each facet corresponds to the hierarchy of subcollections. This structure takes advantage of Zotero's capability to put the same item in multiple collections. The remaining facets are based on specific item fields. Bibliographic metadata is embedded in search results pages and individual item pages, letting visitors easily copy items into their own bibliography with a tool such as Zotero Connector.
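To illustrate how a collection hierarchy can double as a facet hierarchy, here is a minimal sketch. It is not the site's actual code: the `parents` and `names` mappings, the collection keys, and the `facet_paths` helper are all hypothetical, and assume the collection tree has already been fetched from the library.

```python
def facet_paths(item_collection_keys, parents, names, facet_root):
    """Return the facet values (as "Parent > Child" paths) that an item
    gets for one facet, i.e. for one top-level collection.

    parents: dict mapping a collection key to its parent key (None at top level)
    names:   dict mapping a collection key to its display name
    Both mappings are assumptions made for this sketch.
    """
    paths = []
    for key in item_collection_keys:
        path = []
        current = key
        # Walk up the tree until we reach the facet's root or run out of parents.
        while current and current != facet_root:
            path.append(names[current])
            current = parents.get(current)
        # Keep the path only if this collection really sits under the facet root.
        if current == facet_root and path:
            paths.append(" > ".join(reversed(path)))
    return sorted(paths)
```

Because an item can sit in several collections at once, the same item can receive values under several facets, which is what makes this structure workable.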

The software is built in Python with the Flask framework. With some more work, it could be made generic and customizable enough to be useful to other people or organizations, open sourced, and supplemented with a proper test suite, proper documentation, etc.

The questions are: would you or your organization...

- want something similar?
- fund the development of something similar?
- rather host the software on your/its own servers, or subscribe to the software as a service?

Let me know what you think! If you'd rather reply in private, please write to david@whiskyechobravo.com.

Best,
David.

Gurdas_Sandhu · May 16, 2018

David, this is good work. I would be interested in more information on how this could be used to publish a group bibliography so that the research community can benefit. To be clear, there is no organizational interest or funding on my side.

There are other examples of similar effort, see http://www.firstworldwarstudies.org/bibliography.php
This example might be built using https://wordpress.org/plugins/zotpress/ (but I'm not sure of that).

See this discussion, https://forums.zotero.org/discussion/36304/examples-of-zotero-group-libraries-published-on-websites

My bibliography organization is focused on tags and not collections. I would like to hear how your product is better than the current online zotero library.

dlesieur · May 17, 2018

@gurdas, thanks for the feedback. Yes, http://www.firstworldwarstudies.org/bibliography.php seems to have similar goals to http://quescren.concordia.ca. The main target audience for the tool I'm suggesting would probably be research groups like these.

I can't yet speak of a product, as some work would be required to make the software generic enough for use in other projects, but say we have an open source product, pros and cons over sharing a bibliography directly through zotero.org might be as follows.

Pros:

- A user-friendly faceted search interface that accommodates both expert search and exploratory search needs. Keywords and filters can be combined in any order to refine or expand search results, helping the discovery process. Filters are only offered when they would return actual results, and the simple fact that the number of items associated with each filter is displayed may help the user quickly grasp what's in those search results.
- Possible to choose the bibliographic style for search results. Search results look more like a bibliography and less like a database table.
- Seamless integration into a larger website, or custom "branding" for the bibliography, through customization of the web design.
- More detailed item view, showing the collections an item belongs to.
- Some extra features such as printing, direct link to search on WorldCat, etc.
- Possible to develop new features (thanks to the open source license).

Cons:

- Installation and customization require web development skills.
- Self-hosting requires some software maintenance and has a cost.
- Might be affected by future changes to the Zotero API.
- Some features are missing, such as exporting items to files, but that could certainly change in the future. A lot more could be done; see last item under "Pros". ;-)

I'm probably missing other pros and cons!
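The filter behaviour described in the first item under "Pros" (filters offered only when they would return results, each with an item count) can be sketched roughly as follows. This is a toy, in-memory illustration and not how the actual search index works; the item layout (each field holding a list of values) and the `facet_counts` name are assumptions.

```python
from collections import Counter


def facet_counts(items, active_filters, facet_fields):
    """Apply the active filters, then count the remaining values per facet.

    items:          list of dicts, each field mapping to a list of values
    active_filters: dict of field -> required value (all must match)
    facet_fields:   names of the fields to expose as facets
    """
    matching = [
        item for item in items
        if all(value in item.get(field, []) for field, value in active_filters.items())
    ]
    counts = {}
    for field in facet_fields:
        # Only values present in the current results get counted, so a
        # filter that would return nothing never shows up at all.
        counts[field] = dict(
            Counter(value for item in matching for value in item.get(field, []))
        )
    return matching, counts
```

A real search engine would compute these counts from its index rather than by scanning every item, but the visible behaviour is the same.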

BenLabay · August 20, 2018

David, are you still working on this? I'd be very much interested in extending your work for use on a wordpress site. I'm working with a non-profit group to create a digital repository of science records for watersheds in Texas and the SW US.

I'll send you an email as well, but a quick update here on the status of your work would benefit others I'm sure!

dlesieur · August 20, 2018

@BenLabay, glad to hear that you have a similar project! If all goes well, I should be working on this somewhere between December 2018 and March 2019.

Regarding integration with Wordpress: The software is built in Python, so it would run as a separate app from your Wordpress site, but it could be made to look identical in order to provide a seamless user experience. The template language is Jinja2; if you happen to be using Twig with Wordpress, the syntax is pretty much the same, which could make porting the templates from Wordpress easier.

I'll try to post an update here whenever there is significant progress with the project.

FHeimburger · August 20, 2018

Coordinator of the First World War Studies bibliography mentioned earlier here - it used to be hosted using zotpress under Wordpress. At the moment it is fairly broken, using php scripts which fall foul of the 100-item-max for single calls to the API.
I would be REALLY interested to hear more about your solution, as fixing our bibliography has been on my list forever.

dlesieur · August 21, 2018

@FHeimburger, too bad your code stumbles on the item limit. This kind of limit is common with any API, so I hope you'll be able to get your script fixed. Though I'll be glad if my solution can be of use to you! I'll post updates here.
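For anyone hitting the same limit, the usual workaround is to request the items page by page. Below is a minimal sketch in Python (not the PHP scripts in question). The Zotero web API accepts `start` and `limit` query parameters, with `limit` capped at 100; the helper names and the `group_id` argument are hypothetical, and the fetcher assumes a public group library (a private one would also need an API key).

```python
import json
import urllib.request

API_ROOT = "https://api.zotero.org"


def make_zotero_fetcher(group_id):
    """Return a fetch_page(start, limit) function for a public group library.

    group_id is a placeholder; substitute the numeric ID of your own group.
    """
    def fetch_page(start, limit):
        url = f"{API_ROOT}/groups/{group_id}/items?format=json&start={start}&limit={limit}"
        request = urllib.request.Request(url, headers={"Zotero-API-Version": "3"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch_page


def fetch_all_items(fetch_page, page_size=100):
    """Collect every item by asking for successive pages until one comes back short."""
    items = []
    start = 0
    while True:
        page = fetch_page(start=start, limit=page_size)
        items.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means we have reached the end
        start += page_size
    return items
```

Usage would look like `fetch_all_items(make_zotero_fetcher(group_id))`. Passing the page fetcher in as an argument keeps the loop easy to test without touching the network.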

shuhuang · February 13, 2019

Hi @dlesieur!

I'm glad I stumbled upon this! I'm in the midst of setting up an online, searchable database on a particular topic. It only has to be able to do the following:

1. Provide basic citation information (if the user wishes to cite the article)

2. Show an abstract (to give a user an idea of what the article is about)

3. Allow sorting by keyword tags (if the user wishes to see all material on a particular keyword)

4. Be searchable like a library catalogue.

5. If a soft copy is available, a URL where it can be downloaded (this will be provided)

6. If not, links to availability in 4 local libraries (links to each of their catalogue entries will be provided).

I've already got all this information in Zotero. My challenge is to make that information accessible in a user-friendly manner.

I'll explore the projects you've posted earlier. But in the meantime, let me know if there are updates to your initiative!

dlesieur · February 14, 2019

Hi @shuhuang, you seem to have the perfect use case for the proposed system. The project has been delayed a little, but work is now slated to start in April. However, half of the expected funding could not be obtained, so my plan is to go as far as possible with the given funds. Perhaps some features will be missing, perhaps the software won't be as customizable as I have hoped, and I might have to cut down on more community-oriented tasks such as documentation and automated tests. But I'll be going forward in April!

DWL-SDCA · February 15, 2019

I'm not sure of the cost to remove the aspects specific to my own database and website but I'm willing to authorize my developers to do so. SafetyLit.org has everything in your list plus a search-system thesaurus (synonym-ring capabilities and optional return-everything-below a term in the hierarchy, etc.). The admin side is a user friendly interface to our webservers (SQL database, php, ajax, and a few javascript tools to create the public pages on the fly). The system has tools to parse MODS Zotero export into the SafetyLit import, prompts to remind administrators to re-examine publisher websites for newly published journal issues, a system to create a LibreOffice document that is a listing of selected items added to the database by assigned categories, RSS feeds of the new entries by category, and more. There are also MODS parsers for books, book sections, conference proceedings, journal articles, technical reports, theses, and a PubMed xml parser.

If more than one entity were to desire this, the cost of getting the empty, generic system could be quite reasonable. This was a custom system built over 15 years specifically for my project. It has been well tested and overall (through the years) cost about US$80 thousand. The cost of stripping the SafetyLit content is likely to be considerably less than one-tenth of that amount. Once a generic system becomes available it would be available for all.

Zotero is fundamental to the operation of SafetyLit. We examine publisher websites for journal articles, etc., import them into Zotero, export them to MODS, and import them into the SafetyLit database. In the early days, publishers would provide us with metadata from their FTP sites, but that was cumbersome because we index very few journals cover-to-cover. We had to download the metadata for everything, convert it to a human-readable format, and discard the many unsuitable items. Now we visit publisher journal sites, download selected items to Zotero, and only import to the database server the things that should remain.

Parsers for things like podcasts, artworks, etc. do not yet exist. My contact info is on my SafetyLit website.

EDIT: If I didn't make it clear, the cost of this would be limited to the time required for the developers to un-customize the multiple scripts. I'm not expecting this to bring me any extra money.

dlesieur · February 15, 2019

@DWL-SDCA, thanks for opening another possibility! The description of your system makes me think of Wikindx (http://wikindx.sourceforge.net/), which could be yet another option for similar projects. However, from what I have seen Wikindx's search interface is not the most user-friendly.

The architecture of the system whose development I'm about to start (not from scratch, but based on prior work mentioned in my initial post) is quite different in that it has no admin backend other than Zotero. The system provides an almost "live" view of a Zotero library (an automated process regularly synchronizes the data into the search index), and focuses on providing a user-friendly faceted search interface that leverages Zotero's fields, tags, and hierarchical collections.
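As a rough illustration of what a one-way synchronization of this kind involves, here is a minimal sketch. It is an assumption-laden toy, not the system's actual sync process: the "index" is a plain dict, and the only thing borrowed from Zotero is that each item carries a `key` and a `version`.

```python
def sync_one_way(remote_items, index):
    """Mirror the remote library into a local index (a dict keyed by item key).

    Returns (number of items added or updated, number of items removed).
    Changes only ever flow from the library to the index, never back.
    """
    remote_keys = set()
    upserted = 0
    for item in remote_items:
        remote_keys.add(item["key"])
        existing = index.get(item["key"])
        # Re-index only items that are new or whose version has changed.
        if existing is None or existing["version"] != item["version"]:
            index[item["key"]] = item
            upserted += 1
    # Anything left in the index but gone from the library gets dropped.
    stale = [key for key in index if key not in remote_keys]
    for key in stale:
        del index[key]
    return upserted, len(stale)
```

Running something like this on a schedule is what gives the "almost live" effect: edits made in Zotero show up on the site after the next run, with no separate admin interface.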

griis · March 14, 2019

@dlesieur I love how your library displays on the web, amazing UX, search capabilities ++.
I'm not a developer, but would be a superuser of such a solution. I'm building a website for a research group, and we need a user-friendly way to put our publication online in a nice way.
I can't get Bibbase or Zotpress to do the job properly.
From what I see on your website http://quescren.concordia.ca/en/search, it seems you've worked out a solution that's both stylish and functional.
We're based in Sherbrooke, and I'd love to connect!

emilianoeheyns · March 14, 2019

I'm mostly done with a built-to-order script that does a one-way sync into a postgresql database. It's targeted towards reporting, so the database schema is strongly denormalized to save on joins, but the sync code should work. Happy to collaborate. I think the party that ordered it is open to have this made available publicly. It uses paging to stay under the item limit.

ChristineJames · May 14, 2019

Brendan / dlesieur the example on your website looks marvellous.
I have a fairly small 2,000 entries zotero group, the only crucial thing is the Tags.

Let your admirers know how you're doing.

dlesieur · July 12, 2019

@griis @emilianoeheyns @ChristineJames Sincere apologies. I'm not sure how it happened, but I had missed your posts! I will soon publish an alpha version of the tool and will be happy to continue the discussion.

dlesieur · July 18, 2019

The software is now on GitHub (https://github.com/whiskyechobravo/kerko), and there is a demo site (https://demo.kerko.whiskyechobravo.com/bibliography/). Any feedback is welcome!

rkaplan@umrpc.com · July 18, 2019

Very very impressive Demo - that is a wonderful contribution to the community... I will try to set this up and then give you feedback.. many thanks

rkaplan@umrpc.com · July 18, 2019

Question - If I wanted to use this to share a portion of my Zotero library but not all of it with a specific person, could I set it up to only display citations in a given Zotero collection and any subcollections? And similarly share a different collection with a different person?

rkaplan@umrpc.com · July 18, 2019

Another question.. for "Topic" and "Field of Study" and "Type of Contribution" - how do you specify these for each Zotero citation? What field(s) do you use for this information?

dlesieur · July 18, 2019

@rkaplan@umrpc.com There is currently no option for displaying only part of a library. My suggestion would be to copy the items you wish to share to a separate Zotero group library, and then configure Kerko for that library.

Regarding the "Topic", "Field of Study" and "Type of Contribution" facets, these derive from collections in the Zotero library (https://www.zotero.org/groups/2348869/kerko_demo/items). The example configuration at the end of KerkoApp's documentation shows how to configure such facets (https://github.com/whiskyechobravo/kerkoapp#example-configuration).

rkaplan@umrpc.com · July 18, 2019

Ahh... got it - you are defining those facets based on placing the items into collections - very nice.

KERKOAPP_COLLECTION_FACETS="KY3BNA6T:110:Topic; 7H2Q7L6I:120:Field of study; JFQRH4X2:130:Type of contribution"
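To make the structure of that setting explicit, here is a small sketch that splits it into its parts. This is not KerkoApp's own parsing code; the `parse_collection_facets` helper is hypothetical, and reading the middle number as a sort weight is an assumption.

```python
def parse_collection_facets(value):
    """Split a "KEY:number:Title; KEY:number:Title" string into facet records."""
    facets = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first two colons only, so a title may itself contain a colon.
        key, weight, title = (part.strip() for part in entry.split(":", 2))
        facets.append({"collection_key": key, "weight": int(weight), "title": title})
    # Assumption: a lower number means the facet is listed earlier.
    return sorted(facets, key=lambda facet: facet["weight"])


facets = parse_collection_facets(
    "KY3BNA6T:110:Topic; 7H2Q7L6I:120:Field of study; JFQRH4X2:130:Type of contribution"
)
```

Each record pairs a Zotero collection key with the title shown for the facet; the subcollections under that key become the facet's values.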

bjohas · July 19, 2019

Hello all - very interesting! We'd be interested in this too!

fbennett · July 19, 2019

@

fbennett · July 19, 2019

(Oops, sorry about that. Mobile finger-fault)

@bjohas: As an aside, for another European project I built a simple bridge tool in node that does the necessaries to keep a front end site in sync with a Zotero library used as back end. Very basic at the moment, but it groks Jurism content and CSL-M legal styles, and could be made more sophisticated in other ways. It's in npm as citeproc-cite-service.

dlesieur · July 19, 2019

@bjohas Great! I'll let you have a look at Kerko (https://github.com/whiskyechobravo/kerko). You may post requests/questions to its issue tracker, or reach me at david@whiskyechobravo.com if need be.

bjohas · August 10, 2019

@fbennett, @dlesieur - thank you so much. Have been a bit side-tracked by other things, but will get to this soon! Thank you!

mdover · April 23, 2020

I love this example: https://www.healthpathwayscommunity.org/Research-Hub/Publication-Database
Hoping to network with someone from a Kerko/Zotero database done on a university server, as I'm building this:
https://www.zotero.org/groups/2486746/addressing_human_needs_amid_the_covid-19_pandemic

dlesieur · April 23, 2020

@mdover Very nice, I didn't know about that Kerko example. Regarding your inquiry, I have e-mailed you.

parkjinn · July 28, 2020

@mdover Have you been able to network with anyone in building your Kerko/Zotero database? I am also working to deploy kerkoapp for this library on an organization server, and I am a bit lost on how to move forward.
https://www.zotero.org/groups/2527398/mtmwpedagogy/library
@dlesieur I'd love your pointers on the deployment of kerkoapp! I've run it through my local machine, and it works so perfectly for our organization's purpose, allowing browsing by tags instead of collections, with a very user-friendly interface.

bjohas · July 29, 2020

@parkjinn and @mdover - would there be interest in a Kerko webinar or similar? @dlesieur has been building https://docs.edtechhub.org for us, and we'd love to share some of the stories etc. (As part of a possible https://forums.zotero.org/discussion/84358/zotero-developer-un-conference-or-perhaps-a-zotero-un-conference#latest or otherwise.)

Btw. we're running this in conjunction with our main site, which is on WordPress.

parkjinn · July 29, 2020

@bjohas I'd certainly be interested! I was able to deploy kerkoapp on heroku for the time being, but I'd like to integrate it with a Joomla site and/or customize the app a bit more.