API - access to attachments

We now have two developers who are interested in hooking up existing and new text-mining tools to Zotero libraries. But we've been stuck for a few months now on how to get access to the 'full' libraries - both metadata AND 'content' (PDFs, HTML files, .zotero-ft-cache files, etc.).

Here's what they've tried:


  • working with local files (on our hard drives - and synced between them)

    • manually exporting libraries as RDF (with the underlying folders) and then importing them into the text-mining tool - here the problem was that we couldn't get the .zotero-ft-cache files;

    • looking into zotsite, which seemed promising for its ability to access Zotero's SQL files directly - we were able to access the SQLite database and export clean metadata and links to PDF files, but we were not able to see how Zotero creates the text files (or where the .zotero-ft-cache files went);

    • testing the voyant-zotero plugin, which creates a zip with the necessary content files - but it does not include the metadata, and the attachment and bin files appear hard to parse.


  • through the API

    • we were only able to access the metadata from users/groups, not the content. So ideally, we'd want to be able to pull the entire libraries into the text-mining tools (content + metadata).


If anybody could help us in solving this problem, we'd greatly appreciate it. Both of these efforts (ITMS and Voyant) are open-source. Both of them would offer all Zotero users some pretty amazing new opportunities for 'high-level' explorations of their collections. All of this seems to me to be the natural next step after Zotero 5 has so dramatically improved our ability to manage larger libraries AND to find and download the underlying PDFs (Unpaywall integration). But this is now the main stumbling block. Neither developer is very familiar with Zotero. Neither has this very high on their own development agenda. But both of them have declared their willingness to integrate this once we can 'crack' this problem.

Some of you have already suggested that the API is the way to go. I have the gnawing suspicion that this is actually not that hard to achieve if they just knew how to get access to the full group metadata AND content on the Zotero server. So any pointers in the right direction by any Zotero developer would be truly welcome here. Thanks much!

  • edited September 3, 2018
    Getting the indexed full-text content of an item via the API is indeed trivially easy; it's documented here: https://www.zotero.org/support/dev/web_api/v3/fulltext_content, i.e.
    GET <userOrGroupPrefix>/items/<itemKey>/fulltext to get an item's indexed full text in JSON format.

    Similarly, accessing an attached file is
    GET /users/<userID>/items/<itemKey>/file
    where the itemKey is for the attachment (not the top-level item, since that can have multiple attachments) and can be found by looking at the top-level item.
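
    Putting those two calls together, here's a minimal sketch in Python (using the requests library) of pulling the metadata, the indexed full text, and the attachment files for a whole group library. The group ID and API key are placeholders, and the file naming is just an assumption - treat it as a starting point, not a finished tool:

    import requests

    API_KEY = "YOUR_API_KEY"   # placeholder; create one at zotero.org/settings/keys
    GROUP_ID = "522920"        # placeholder; the number from the group's URL
    PREFIX = f"https://api.zotero.org/groups/{GROUP_ID}"
    HEADERS = {"Zotero-API-Version": "3", "Zotero-API-Key": API_KEY}

    def get_all(url):
        """Page through an API listing, 100 results at a time."""
        results, start = [], 0
        while True:
            r = requests.get(url, headers=HEADERS, params={"limit": 100, "start": start})
            r.raise_for_status()
            batch = r.json()
            if not batch:
                return results
            results.extend(batch)
            start += len(batch)

    # metadata for every top-level item in the group library
    for item in get_all(f"{PREFIX}/items/top"):
        # attachments are child items of the top-level item
        for child in get_all(f"{PREFIX}/items/{item['key']}/children"):
            if child["data"]["itemType"] != "attachment":
                continue
            key = child["key"]

            # indexed full text (what .zotero-ft-cache holds locally); 404 means not indexed
            ft = requests.get(f"{PREFIX}/items/{key}/fulltext", headers=HEADERS)
            if ft.status_code == 200:
                with open(f"{key}.txt", "w", encoding="utf-8") as out:
                    out.write(ft.json()["content"])

            # the attachment file itself (PDF, HTML snapshot, ...)
            f = requests.get(f"{PREFIX}/items/{key}/file", headers=HEADERS)
            if f.status_code == 200:
                with open(child["data"].get("filename", key), "wb") as out:
                    out.write(f.content)

    The /file request redirects to the actual file and requests follows that automatically; link-only attachments have no file to download, which is what the status checks are for.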

    Edit: Markdown, markdown, a kingdom for forum markdown!
  • Thanks much! We hadn't seen that (our bad).

    Edit: Markdown, markdown, a kingdom for forum markdown!

    Hear hear!!
  • I know it's thread pollution (maybe we should open a separate thread for this), but
    Edit: Markdown, markdown, a kingdom for forum markdown!
    yes please!
  • edited September 12, 2018
    One more question (and I'm not sure where to post this, but I'll try here first) - the group prefix is just the integer that we see when we look at the group online, right? E.g. https://www.zotero.org/groups/522920?
  • Explained here: https://www.zotero.org/support/dev/web_api/v3/basics#user_and_group_library_urls
    it's /groups/522920 (or whatever other number you have there)
  • So GroupPrefix is the same as GroupID?
  • no, it's /groups/<groupID>
    It's referred to as "prefix" so the documentation doesn't always have to distinguish between user libraries (which are /users/<userID>) and group libraries (which are /groups/<groupID>)
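    In other words, with placeholder IDs (the userID here is hypothetical; the groupID is the number from your group's URL), the request URLs are built like this:

    user_id = "1234567"    # hypothetical numeric userID, shown at zotero.org/settings/keys
    group_id = "522920"    # the number in https://www.zotero.org/groups/522920
    user_prefix = f"https://api.zotero.org/users/{user_id}"      # personal library
    group_prefix = f"https://api.zotero.org/groups/{group_id}"   # group library
    # e.g. GET {group_prefix}/items/<itemKey>/fulltext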
  • I see. Thanks!
  • I have been trying to get the voyant plugin to work for a few months for this very purpose (I'm a librarian trying to host a workshop on this possibility!), and I would _love_ to know if you've been able to get the full text + metadata content extracted for either of the text-mining tools. If you're willing to share, I'd be very grateful!