API - access to attachments
We now have two developers who are interested in hooking up existing and new textmining tools to Zotero libraries. But we've been stuck for a few months now on how to get access to the 'full' libraries - both metadata AND 'content' (PDFs, HTML files, .zotero-ft-cache, etc.).
Here's what they've tried:
- working with local files (on our hard drives - and synced between them)
  - manually exported libraries as RDF (with the underlying folders), and then imported them into the textmining tool - here the problem was that we couldn't get the .zotero-ft-cache files;
  - looked into zotsite, which seemed promising for its ability to access Zotero's SQL files directly - we were able to access the SQLite database and export clean metadata and links to PDF files, but we were not able to see how Zotero creates text files (or where the .zotero-ft-cache files went)
  - tested the voyant-zotero plugin, which creates a zip with the necessary content files - but it does not include the metadata, and the attachment and bin files appear hard to parse
- through the API
  - we were only able to access the metadata from users/groups, not the content. So ideally, we'd want to be able to pull the entire libraries into the textmining tools (content + metadata)
If anybody could help us in solving this problem, we'd greatly appreciate it. Both of these efforts (ITMS and Voyant) are open-source. Both of them would offer all Zotero users some pretty amazing new opportunities for 'high-level' explorations of their collections. All of this seems to me to be the natural next step after Zotero 5 has so dramatically improved our ability to manage larger libraries AND to find and download the underlying PDFs (unpaywall integration). But this is now the main stumbling block. Neither developer is very familiar with Zotero. Neither has this very high on their own development agenda. But both of them have declared their willingness to integrate this once we can 'crack' this problem.
Some of you have suggested already that the API is the way to go. I have the gnawing suspicion that this is actually not that hard to achieve if they just knew how to get access to the full group metadata AND content on the Zotero server. So any pointers in the right direction by any Zotero developer would be truly welcome here. Thanks much!
GET <userOrGroupPrefix>/items/<itemKey>/fulltext
to get an item's indexed full text in JSON format. Similarly, accessing an attached file is
GET /users/<userID>/items/<itemKey>/file
where the itemKey is for the attachment (not the top-level item, since that can have multiple attachments) and can be found by looking at the top-level item.
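To make the two requests above concrete, here is a minimal Python sketch using only the standard library. It is not an official client. The host `https://api.zotero.org`, the `Zotero-API-Key` and `Zotero-API-Version` headers, and the `/children` endpoint come from the Zotero Web API as I understand it rather than from this thread, so please check them against the documentation. All function names are made up for illustration, and the group number and keys are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.zotero.org"  # assumption: public Zotero Web API host


def fulltext_url(prefix, item_key):
    """URL for an attachment's indexed full text (JSON)."""
    return f"{API_BASE}{prefix}/items/{item_key}/fulltext"


def file_url(prefix, item_key):
    """URL for the attached file itself (PDF, HTML snapshot, ...)."""
    return f"{API_BASE}{prefix}/items/{item_key}/file"


def children_url(prefix, parent_key):
    """URL listing the child items (attachments, notes) of a top-level item."""
    return f"{API_BASE}{prefix}/items/{parent_key}/children"


def attachment_keys(children):
    """Pick the keys of attachment items out of a parsed /children response."""
    return [
        child["key"]
        for child in children
        if child.get("data", {}).get("itemType") == "attachment"
    ]


def build_request(url, api_key=None):
    """Build a GET request; the API key is only needed for non-public libraries."""
    headers = {"Zotero-API-Version": "3"}
    if api_key:
        headers["Zotero-API-Key"] = api_key
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, api_key=None):
    """Perform the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(build_request(url, api_key)) as response:
        return json.load(response)


# Hypothetical usage (placeholder keys, not run here):
#   prefix = "/groups/522920"
#   kids = fetch_json(children_url(prefix, "PARENTKEY"), api_key="...")
#   for key in attachment_keys(kids):
#       text = fetch_json(fulltext_url(prefix, key), api_key="...")
```

The file request answers with a redirect to the actual download location; `urllib.request.urlopen` follows redirects by default, so reading the response of `build_request(file_url(...))` should give you the file bytes.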
Edit: Markdown, markdown, a kingdom for forum markdown!
It's /groups/522920 (or whatever other number you have there).
It's referred to as "prefix" so the documentation doesn't always have to distinguish between users (which are /users/<userID>) and groups (which are /groups/<groupID>).
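In case it helps, the prefix idea can be sketched in a few lines of Python. The helper names here are hypothetical and only illustrate how the same request path works for a user library or a group library once the prefix is swapped.

```python
def library_prefix(kind, library_id):
    """Return the path prefix for a user or group library."""
    if kind == "user":
        return f"/users/{library_id}"
    if kind == "group":
        return f"/groups/{library_id}"
    raise ValueError("kind must be 'user' or 'group'")


def fulltext_path(prefix, item_key):
    """Same request path for both library types; only the prefix differs."""
    return f"{prefix}/items/{item_key}/fulltext"


# The group number is the one from this thread; the item key is a placeholder.
print(fulltext_path(library_prefix("group", 522920), "ABCD2345"))
```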