Analyzing Zotero collections with Voyant
Hi all! Some of you may know me as the former dev of Paper Machines, which once upon a time let you visualize your Zotero collections. Regrettably I've not been able to maintain it, but I wanted to let folks know about an alternative for those still interested in text analysis.
I've put together a little extension that can export a full-text Zotero collection to Voyant (https://github.com/corajr/zotero-voyant-export/). Voyant seems well-suited to provide the rich, multifaceted data display I always dreamed of for Paper Machines — it even has topic modeling these days! Hopefully it will provide a better experience for that use case.
I've only tried this extension with the 5.0 beta, on Linux and Mac; it's a bit rough-and-ready but seems to work. Please let me know, here or via GitHub issues, if you have any thoughts on how to make it better or more useful for your specific case.
The local server version imports much faster, then says 'uploading' for a while, and then goes back to the initial web page instead of opening the actual program.
Does anybody know whether there's a log created somewhere?
Voyant's local server should display a log screen when launched; the relevant entry for import should show up as "trombone: TOOL: corpus.CorpusCreator". Please look to see if there are any errors there, and possibly try bumping up the memory allotment to 2048 or 4096 MB.
If there are no errors on Voyant's side, you could enable Zotero's debug log under Preferences -> General and retry the export; the extension will output there as it processes items. Any relevant issues on the Zotero side would likely be interspersed between lines containing "doExport."
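If the debug output is long, something like the following could help pull out the relevant stretch. This is only a rough Python sketch, not part of the extension: it assumes you have saved the debug output as a list of text lines, and the function name `span_between_markers` is made up for illustration. It returns everything from the first line containing "doExport" to the last one.

```python
from typing import List


def span_between_markers(lines: List[str], marker: str = "doExport") -> List[str]:
    """Return the lines from the first to the last line containing `marker`.

    Both marker lines are included. An empty list is returned when the
    marker never appears.
    """
    # Indices of every line that mentions the marker.
    hits = [i for i, line in enumerate(lines) if marker in line]
    if not hits:
        return []
    return lines[hits[0]:hits[-1] + 1]


# Placeholder lines only; these are NOT real Zotero log entries.
sample = [
    "unrelated line",
    "first line mentioning doExport",
    "possible warning or error in between",
    "last line mentioning doExport",
    "another unrelated line",
]

for line in span_between_markers(sample):
    print(line)
```

To use it on a saved log, read the file with `open(path).read().splitlines()` and pass the result in place of `sample`.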
(BTW, the v0.0.1 add-on was missing the update URL; if you remove it and install v0.0.4 from https://github.com/corajr/zotero-voyant-export/releases/tag/v0.0.4, future updates should happen automatically.)
Apr 10, 2017 10:32:27 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+55 (55) in font DejaVuSansBold
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2017-04-10 22:32:34.174:WARN:/:qtp989110044-14: trombone: ERROR: An error occurred during multi-threaded document expansion.
java.lang.IllegalStateException: An error occurred during multi-threaded document expansion.
at java.lang.Thread.run(Unknown Source)
Caused by:
java.util.concurrent.ExecutionException: java.io.IOException: Unable to parse document: UNKNOWN: null
We are working on a fairly large collection of academic journal articles (ca. 10k items). As I had previous experience with Paper Machines, I thought it would be a good tool for us, since we are also interested in topic modelling by time. I checked this forum in November 2016 and didn't find any indication that Paper Machines was no longer maintained.
Then, last month, when we finally started working on the data, we began struggling with Paper Machines giving blank pages (or no feedback at all). In my previous work with it (2012), the collection was smaller and I was working on a Windows machine, and I did not have any problems. Now the collection is bigger and, as I have to use other packages for Mac OS X, I am using standalone Zotero on my Mac with Paper Machines installed as an add-on. Unfortunately, so far, even after splitting the collection into smaller subcollections, the extraction to Paper Machines works, but even after running it for more than 24 hours I still get a blank page as the result (empty HTML) or the message "No log file found."
I quite liked the layout of the topic modelling via Paper Machines using Mallet. I'm afraid the results in Voyant are not as helpful as those in Paper Machines for those interested in looking at time, since topic modelling by time was built into the Zotero add-on.
I will try to export the full-text Zotero collection to Voyant. I have already tried to run part of my data (865 articles; ca. 2 GB) on it (inserted manually, with the Zotero PDF collections converted into plain text), but I also got the "Error (Document terms)" message, as @sdspieg did.
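One way to narrow down where an import like this fails might be to load the plain-text files in smaller batches. Below is a rough Python sketch under stated assumptions: the helper name `batch_by_size` is hypothetical, the file names and sizes are placeholders, and the size cap is arbitrary rather than any known Voyant limit. It greedily groups files so that each batch stays under the cap.

```python
from typing import List, Tuple


def batch_by_size(files: List[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Greedily group (name, size_in_bytes) pairs into batches.

    Files are taken in the order given. A new batch is started whenever
    adding the next file would push the running total over `max_bytes`.
    A single file larger than `max_bytes` ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Placeholder names and sizes, for illustration only.
example = [("a.txt", 40), ("b.txt", 50), ("c.txt", 30), ("d.txt", 200)]
for batch in batch_by_size(example, max_bytes=100):
    print(batch)
```

For real files, the sizes could be collected with `os.path.getsize` from the standard library. If one batch fails while the others load, the problem document is probably in that batch.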
Would be interested to know whether @DWL-SDCA got the same problem when trying to run more data than in the first test mentioned.
And, still, out of curiosity, would there be any possibility of a "revival" of Paper Machines, as a sort of Wayback Machine for tools?
[An added problem is that the items are drawn from scholarly journal articles from more than 30 distinct professions and published in 150+ nations. Many of these professions use different terms to label very similar concepts. While, during the editing process, we add terms that make an abstract understandable to persons not in the authors' profession, we add the terms as "explainers" and keep the authors' own words. There is also the problem of the same word being used for very different things. Take, for example, the word "football". Is the article about soccer, American football, Gaelic football, Australian-rules football, touch/flag football, or any of the other games called football or translated as football? Each of these games has very different rules, equipment, and injury risks. When adding an article we identify which game is being discussed and add the term the author didn't think to mention {few European readers of an Italian journal article could be expected to think an article about football concerned anything other than soccer, or that an article about "football" in an American journal by an author from the U.S. would concern soccer}. We do this term-explanation not only for search/query purposes but so that someone browsing and reading can quickly identify what the article is about. These administrative edits and explanations make interpretations of text analysis more troublesome than useful.]