Paper Machines - Topic Modeling

I have tried to run this 'topic modeling' routine (under Windows) on a few sets of articles now, but so far without success.
The idea of the plugin is simple: it uses the topic modeler from the Mallet text-mining suite (http://mallet.cs.umass.edu/topics.php) to identify the most important 'themes' (topics) across the selected texts: terms that 'travel together' through the text and can therefore be said to form a 'topic'. This is of course fantastic, as it allows you to immediately and automatically identify the key themes in any (even large) set of documents, thus allowing for faster (human) analysis, as we can then zoom in on these topics (what's in them, what are the key relationships within the themes, how do they change over time, etc.). The problem is that I can't get it to work, which is strange, because the other parts of this great plugin work perfectly.
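To make clear what I mean by terms that 'travel together', here is a minimal Python sketch. It is only my own illustration and not what Mallet or the plugin actually does: Mallet fits an LDA model, whereas this merely counts how many documents contain both terms of a pair. The function name `cooccurrence_counts` and the three toy documents are made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each pair of terms, how many documents contain both."""
    counts = Counter()
    for doc in docs:
        # Use each distinct term once per document; sort so that every
        # pair is always stored in the same (alphabetical) order.
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


# Hypothetical toy corpus, invented purely for illustration.
docs = [
    "research funding policy",
    "funding policy innovation",
    "laboratory research equipment",
]

pair_counts = cooccurrence_counts(docs)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

In this toy corpus 'funding' and 'policy' are the only two terms that appear together in more than one document, which is the kind of pattern I would expect a real topic model to pick up as a theme.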
Can anybody share how long the whole process should take? I first tried it on 150 articles, and it ran for a few hours without ever generating anything. I presume it makes a number of passes over all the articles to find the best 'fit', but I still doubt this should take more than a few minutes (if that). I am now trying with another set of 45 articles from Zotero Standalone with the latest version (0.1.9). And once again, the routine starts. It opens a big new blank window with a small grey progress bar at the top left, above which it says: "Topic Modelling: RTOs" (the latter being the name of the collection of articles). There is a green part that moves from left to right across that grey bar. Sometimes the caption disappears, only to reappear after a while. And this just goes on and on. Can anybody provide any assistance? Thanks!
Oh, and also, Chris: is there any way to set more parameters (for instance, the number of topics to be extracted, the number of iterations, etc.)?
Thanks!
