API - access to attachments

We now have two developers who are interested in hooking up existing and new text-mining tools to Zotero libraries. But we've been stuck for a few months now on how to get access to the 'full' libraries - both metadata AND 'content' (PDFs, HTML files, .zotero-ft-cache, etc.).

Here's what they've tried:

  • working with local files (on our hard drives - and synced between them)

    • manually exported libraries as RDF (with the underlying folders), and then imported them into the text-mining tool - here the problem was that we couldn't get the .zotero-ft-cache files;

    • looked into zotsite, which seemed promising for its ability to access Zotero's SQL files directly - we were able to access the SQLite database and export clean metadata and links to PDF files, but we were not able to see how Zotero creates text files (or where the .zotero-ft-cache files went);

    • tested the voyant-zotero plugin, which creates a zip with the necessary content files - but it does not include the metadata, and the attachment and bin files appear hard to parse

  • through the API

    • we were only able to access the metadata from users/groups, not the content. So ideally, we'd want to be able to pull the entire libraries into the text-mining tools (content + metadata)
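    For anyone following along: the metadata access we did manage boils down to building Web API v3 URLs. A minimal Python sketch (not official client code); the group ID and API key below are placeholders:

    ```python
    # Sketch: building a Zotero Web API v3 URL for a page of a group
    # library's item metadata. GROUP_ID and API_KEY are placeholders --
    # substitute your own values.
    API_BASE = "https://api.zotero.org"

    def group_items_url(group_id, start=0, limit=100):
        """URL for one page of a group library's item metadata (JSON)."""
        return f"{API_BASE}/groups/{group_id}/items?start={start}&limit={limit}"

    # A real request needs the network and a key created at
    # zotero.org/settings/keys, e.g.:
    #   import json, urllib.request
    #   req = urllib.request.Request(
    #       group_items_url("522920"),
    #       headers={"Zotero-API-Key": "YOUR_API_KEY",
    #                "Zotero-API-Version": "3"})
    #   items = json.loads(urllib.request.urlopen(req).read())
    print(group_items_url("522920"))
    ```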

If anybody could help us in solving this problem, we'd greatly appreciate it. Both of these efforts (ITMS and Voyant) are open-source. Both of them would offer all Zotero users some pretty amazing new opportunities for 'high-level' explorations of their collections. All of this seems to me to be the natural next step after Zotero 5 has so dramatically improved our ability to manage larger libraries AND to find and download the underlying PDFs (unpaywall integration). But this is now the main stumbling block. Neither developer is very familiar with Zotero. Neither has this very high on their own development agenda. But both of them have declared their willingness to integrate this once we can 'crack' this problem.

Some of you have suggested already that the API is the way to go. I have the gnawing suspicion that this is actually not that hard to achieve if they'd just know how to get access to the full group metadata AND content on the Zotero server. So any pointers in the right direction by any Zotero developer would be truly welcome here. Thanks much!

  • edited September 3, 2018
    Getting the indexed full-text content of an item via API is indeed trivially easy; it's documented here: https://www.zotero.org/support/dev/web_api/v3/fulltext_content i.e.
    GET <userOrGroupPrefix>/items/<itemKey>/fulltext to get an item's indexed full text in JSON format.

    Similarly accessing an attached file is
    GET /users/<userID>/items/<itemKey>/file
    where the itemKey is for the attachment (not the top level item, since that can have multiple attachments) and can be found by looking at the top level item.
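    A rough Python sketch of the two calls just described (the group ID is the one from this thread; the attachment key is hypothetical):

    ```python
    # Sketch: URL construction for the two endpoints described above.
    API_BASE = "https://api.zotero.org"

    def fulltext_url(prefix, attachment_key):
        """Indexed full-text content of an attachment, as JSON."""
        return f"{API_BASE}/{prefix}/items/{attachment_key}/fulltext"

    def file_url(prefix, attachment_key):
        """The attached file itself (e.g. the PDF)."""
        return f"{API_BASE}/{prefix}/items/{attachment_key}/file"

    # Note: the key must be the attachment's key, not the top-level item's;
    # attachment keys can be listed via /items/<topLevelKey>/children.
    prefix = "groups/522920"   # or "users/<userID>"
    key = "ABCD2345"           # hypothetical attachment key
    print(fulltext_url(prefix, key))
    print(file_url(prefix, key))
    ```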

    Edit: Markdown, markdown, a kingdom for forum markdown!
  • Thanks much! We hadn't seen that (our bad).

    Edit: Markdown, markdown, a kingdom for forum markdown!

    Hear hear!!
  • I know it's thread pollution (maybe we should open a separate thread for this), but
    Edit: Markdown, markdown, a kingdom for forum markdown!
    yes please!
  • edited September 12, 2018
    One more question (and I'm not sure where to post that, but I'll try here first) - The group prefix is just the integer that we see when we look at them online, right? E.g. https://www.zotero.org/groups/522920?
  • Explained here: https://www.zotero.org/support/dev/web_api/v3/basics#user_and_group_library_urls
    it's /groups/522920 (or whatever other number you have there)
  • So GroupPrefix is the same as GroupID?
  • no, it's /groups/<groupID>
    It's referred to as a "prefix" so the documentation doesn't always have to distinguish between user libraries (which are /users/<userID>) and group libraries (which are /groups/<groupID>)
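    In code, then, the prefix is just the first two path segments; a tiny illustrative helper:

    ```python
    def library_prefix(library_type, library_id):
        """Build the <userOrGroupPrefix> used throughout the API docs."""
        assert library_type in ("users", "groups")
        return f"{library_type}/{library_id}"

    print(library_prefix("groups", 522920))  # groups/522920
    print(library_prefix("users", 12345))    # users/12345 (hypothetical userID)
    ```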
  • I see. Thanks!
  • I have been trying to get the voyant plugin to work for a few months for this very purpose (I'm a librarian trying to host a workshop on this possibility!), and I would _love_ to know if you've been able to get the full text + metadata content extracted for either of the text mining tools. If you're willing to share, I'd love to know!