Harnessing Zotero's metadata reader for research

martonaronvarga · October 9, 2023

Hi,

I would like to extract article metadata from as many journals as possible. I've noticed that Zotero extracts this data seamlessly from the web via the web connector plugin. As Zotero is open source, I am thinking about harnessing this power programmatically, but I have no idea how to start, as I am not familiar with the Zotero codebase. If using Zotero for this is overkill, could you please recommend tools for scraping article metadata from journals that are fast and legal? I've looked up some APIs, but many are either not free or don't contain the data we need. Any help is appreciated!

Marton

adamsmith · October 9, 2023

The way Zotero extracts metadata isn't designed to work particularly well for such a massive scraping exercise. Because it works through the browser and therefore loads basically every single webpage, it's comparatively slow, and when used automatically and at scale it will likely get you locked out of a number of sites.

I would take a look at OpenAlex as one very good source of journal article metadata -- I believe the entire corpus can be downloaded for free, or you can use their API for free for more limited usage.
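For the API route, a minimal sketch of paging through the OpenAlex `/works` endpoint with cursor pagination might look like the following. It assumes the public endpoint `https://api.openalex.org/works`, its `filter`, `per-page`, `cursor` and `mailto` query parameters, and the `results` / `meta.next_cursor` response fields; the helper names and the handful of fields extracted are illustrative choices, not anything prescribed by OpenAlex, so check the OpenAlex documentation for current filter names and rate limits.

```python
import json
import urllib.parse
import urllib.request

# Assumed public OpenAlex endpoint for scholarly works.
OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(filter_expr, cursor="*", per_page=200, mailto=None):
    """Build the URL for one page of /works results using cursor paging."""
    params = {"filter": filter_expr, "per-page": per_page, "cursor": cursor}
    if mailto:
        # Supplying a contact address is how OpenAlex asks clients to identify themselves.
        params["mailto"] = mailto
    return OPENALEX_WORKS + "?" + urllib.parse.urlencode(params)


def extract_records(page):
    """Reduce one decoded /works response to a few metadata fields per article.

    Returns (records, next_cursor); next_cursor is None when there are no more pages.
    """
    records = []
    for work in page.get("results", []):
        # primary_location and its source may be missing or null for some works.
        source = (work.get("primary_location") or {}).get("source") or {}
        records.append({
            "doi": work.get("doi"),
            "title": work.get("display_name"),
            "year": work.get("publication_year"),
            "journal": source.get("display_name"),
            "authors": [
                (a.get("author") or {}).get("display_name")
                for a in work.get("authorships", [])
            ],
        })
    return records, page.get("meta", {}).get("next_cursor")


def fetch_all(filter_expr, mailto=None, max_pages=5):
    """Follow next_cursor until it runs out or max_pages is reached (needs network access)."""
    cursor, out = "*", []
    for _ in range(max_pages):
        url = build_works_url(filter_expr, cursor, mailto=mailto)
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        records, cursor = extract_records(page)
        out.extend(records)
        if not cursor:
            break
    return out
```

Usage would be something like `fetch_all("publication_year:2023", mailto="you@example.org")`. Keeping URL building and response parsing separate from the network call makes it easy to test offline or to swap in a different HTTP client. For pulling the entire corpus, the bulk download mentioned above is likely a better fit than paging through the API.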

martonaronvarga · October 9, 2023

Thank you!