Bibliographic and metadata visualization and problems with ris import from ProQuest databases

rpeter2012 · February 21, 2017

Hi,

I wonder if you happen to know of any application or add-on that can visualise the bibliographic data and metadata of the items in a Zotero folder the way JSTOR Data for Research does. See http://about.jstor.org/service/data-for-research. Are you aware of any similar application that lets you upload your own bibliographic data and metadata for statistical analysis and visualisation, as JSTOR DfR does? If no such digital tool exists, the Institute of Informatics of my university would help me create a web-based application. A primitive Excel workbook was created seven years ago for this purpose, but it is not really ideal. See http://www.jstor.org/stable/43487818 and http://epa.oszk.hu/00800/00861/00067/pdf/EPA00861_aetas_2015_1_005-030.pdf

I downloaded the bibliographic details of thousands of periodical and newspaper articles from the ProQuest British Periodicals database in RIS format. Unfortunately, several fields disappear during the import into Zotero. Here is an example:

TY - NEWS
TI - Untitled item
AN - 5512948
AB - the last Campaign in Hungary, where he was employ'd in several Expeditions, is just arrived there, to be examined, as 'tis said, on several Articles relating to the Count de Seckendorff, whose Affairs, as they report at present, are not in so advantagious a Situation as was imagined; and one may guess so, by a Writ which was some Day ago delivered to the Countess his Spouse, by which his Imperial Majesty
JF - Daily gazetteer
Y1 - 1738 Mar 29
PY - 1738
SP - [2218]
EP - [2219]
CY - London, United Kingdom, London
IS - 853
SN - 2043-2003
KW - Great Britain - Politics and government - 1727-1760
KW - GREAT BRITAIN - Politics and Government - 1727-1760
UR - http://search.proquest.com/docview/5512948?accountid=13828
LA - English
DB - British Periodicals
N1 - Last updated - 2010-07-20
N1 - DOI - bpe799-1738-000-53-000372; e799-1738-000-53-000372; 2043-2003
ER -

For example, the entries for TY, AB and Y1 are all lost during the import. How can I add them? Is it possible to write code that would solve such import problems? Should a conversion programme import the details of such problematic RIS files into the Zotero SQLite database directly?
https://www.zotero.org/support/dev/client_coding/direct_sqlite_database_access
The JabRef import seems slightly better, since it imports the abstracts at least.
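To check which fields an RIS file actually contains before blaming the importer, a short script can help. The following is only a minimal sketch in Python, not how Zotero parses RIS: the function name `parse_ris` and the trimmed `SAMPLE` record are made up for illustration (the sample reuses a few lines from the record quoted above), and the pattern assumes each field sits on one line, so continuation lines are skipped.

```python
import re
from collections import defaultdict

# A RIS field line is a two-character tag, whitespace, a hyphen, then the value.
# The record quoted in this thread shows a single space before the hyphen
# (the usual form has two), so the pattern accepts one or more.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text):
    """Return a list of records; each record maps a tag to a list of values."""
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        match = TAG_LINE.match(line.strip())
        if not match:
            continue  # assumption: one field per line, so other lines are ignored
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # ER closes the current record
            if current:
                records.append(dict(current))
            current = defaultdict(list)
        else:
            current[tag].append(value)  # repeatable tags such as KW keep every value
    if current:  # tolerate a file whose last record lacks an ER line
        records.append(dict(current))
    return records


# A trimmed copy of the record quoted above, used only as test input.
SAMPLE = """TY - NEWS
TI - Untitled item
JF - Daily gazetteer
Y1 - 1738 Mar 29
PY - 1738
KW - Great Britain - Politics and government - 1727-1760
KW - GREAT BRITAIN - Politics and Government - 1727-1760
ER -
"""

records = parse_ris(SAMPLE)
for tag in ("TY", "AB", "Y1"):
    print(tag, "present" if tag in records[0] else "missing")
```

Running this against the file that is really being imported shows whether TY, AB and Y1 are in it at all.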

Do you know of other reference manager software that can cope with such RIS files, or that users can easily modify to do so? EndNote or Mendeley? RefWorks did a great job, but my university stopped subscribing to it.

Any help would be highly appreciated!

Róbert Péter

adamsmith · February 21, 2017

Zotero imports that RIS fine for me: it imports it as a newspaper (TY), adds an abstract (AB) and a full date (Y1). How exactly are you importing this?

http://papermachines.org/ was designed for metadata visualization, but it isn't, unfortunately, being actively maintained.

rpeter2012 · February 21, 2017

Hi Adam,

Many thanks for your swift response and the good news about the import! I simply used the Zotero Standalone import function. My mistake was that I copied the right text into my message but chose the wrong file on my computer to import into Zotero. Now it works perfectly well. I was sad to see, though, that ProQuest British Periodicals does not seem to have saved the article types (e.g. Review, Advertisement, etc.) in the RIS file, although they were visible in the online interface. I saved numerous RIS files related to particular topics in 2013, and I have no access to ProQuest now.

Thanks a lot for drawing my attention to Papermachines.org. I'll try it. Can you do the same types of analysis with it that you can do with JSTOR Data for Research?
At first sight it does not seem to be as straightforward as JSTOR DfR, as it was not designed for the same purpose. Do you see any point in improving Papermachines.org in that direction, if that is possible at all for outsiders not directly involved in the project? Do you know of any other, regularly maintained plugin or web application that can visualise bibliographic data and metadata the way JSTOR DfR does?

Thanks a lot!
Robert

adamsmith · February 21, 2017

Yes, papermachines does very similar things (topic modelling, ngrams) to JSTOR DFR.
It's open source, and I do think it'd be a cool place to start, though if I were starting today I'm not sure I'd build this as an add-on rather than as a separate tool in an analysis environment like Python or R.
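As a rough idea of what the "separate tool" route could look like at its very simplest, here is a sketch in Python that tallies items per publication year and prints a text histogram. Everything in it is an assumption made for illustration: the function names, the record layout (a dict mapping RIS tags such as PY to lists of values), and the three made-up records.

```python
from collections import Counter


def count_by_year(records):
    """Tally records by the first four characters of their PY (year) field."""
    years = Counter()
    for record in records:
        year = (record.get("PY") or [""])[0][:4]
        if year.isdigit():  # skip records with a missing or malformed year
            years[year] += 1
    return years


def text_histogram(counts, width=40):
    """Render the counts as one line of '#' marks per key, scaled to `width`."""
    if not counts:
        return []
    peak = max(counts.values())
    lines = []
    for key in sorted(counts):
        bar = "#" * max(1, round(counts[key] * width / peak))
        lines.append(f"{key} {bar} {counts[key]}")
    return lines


# Hypothetical records, invented only to exercise the functions.
example_records = [
    {"PY": ["1738"], "JF": ["Daily gazetteer"]},
    {"PY": ["1738"], "JF": ["Daily gazetteer"]},
    {"PY": ["1739"], "JF": ["Daily gazetteer"]},
]

year_counts = count_by_year(example_records)
for line in text_histogram(year_counts, width=10):
    print(line)
```

The same tally could be grouped by journal title (JF) or keyword (KW) instead, and handed to a plotting library once the counts look right.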

sdspieg · April 2, 2017

It's important to realize, however, that JSTOR DFR is quite different from papermachines.
The former just exports 'bags of n-grams' from a query on JSTOR, without any reference to the order in which they appear. That makes it a lot less interesting from my point of view. Sure, you can do some bean-counting with it, but that's about it.
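To make the "bag" point concrete, here is a small Python sketch. It is not what JSTOR DfR exports, only an illustration with made-up helper names and two invented sentences: once text is reduced to counts, two texts that say opposite things can have identical single-word bags.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, n=1):
    """Count every run of n consecutive tokens; the positions are discarded."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


first = "dog bites man"
second = "man bites dog"

# Single-word bags cannot tell the two texts apart.
print(ngram_counts(first, 1) == ngram_counts(second, 1))

# Two-word bags keep a little local order, so here they differ.
print(ngram_counts(first, 2) == ngram_counts(second, 2))
```

Longer n-grams preserve more local order, but where each n-gram occurred in the document is still lost.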
Papermachines also has(/d?) various implementations of topic modelling in it (like LDA), which is a bit smarter than just counting how many times an n-gram occurs in a corpus.
For me personally, the next step is the Interactive Text Mining Suite (ITMS) from Olga Scrivner and her team, where we're also integrating proximity. So you can, for instance, identify two terms that you're interested in and define a primary search window (e.g. we're looking for any documents that have the terms "security policy" and "ethics" within 10 words of each other) and a secondary search window (i.e. we want to include one sentence to the left of that primary window and one to the right). The windows are user-defined. The system then selects all excerpts that meet these two criteria and performs topic modelling (with a few different algorithms) on only those excerpts. So, for instance, when you have some larger documents in your corpus, DFR and Papermachines will analyze the entire document, whereas ITMS will analyze just the sections that are relevant.
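The two-window selection described above can be sketched roughly as follows. This is not ITMS code (ITMS is built on R); it is a simplified Python illustration with hypothetical function names and an invented sample text. It assumes distance is measured between the starting words of the two terms and that sentences end at ".", "!" or "?".

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_positions(tokens, phrase):
    """Indices at which the (possibly multi-word) phrase starts in the tokens."""
    words = tokenize(phrase)
    if not words:
        return []
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def proximity_excerpts(text, term_a, term_b, window=10, context=1):
    """Return excerpts in which both terms start within `window` words of each
    other (primary window), widened by `context` sentences on each side
    (secondary window)."""
    sentences = split_sentences(text)
    tokens, owner = [], []  # owner[i] is the sentence index of tokens[i]
    for index, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            tokens.append(token)
            owner.append(index)
    excerpts, seen = [], set()
    for a in phrase_positions(tokens, term_a):
        for b in phrase_positions(tokens, term_b):
            if abs(a - b) > window:
                continue
            first = max(0, min(owner[a], owner[b]) - context)
            last = min(len(sentences) - 1, max(owner[a], owner[b]) + context)
            if (first, last) not in seen:  # avoid emitting the same span twice
                seen.add((first, last))
                excerpts.append(" ".join(sentences[first:last + 1]))
    return excerpts


# Invented sample text, for illustration only.
sample_text = (
    "Intro sentence here. The security policy raises questions of ethics. "
    "Closing remark follows. Unrelated final sentence."
)
for excerpt in proximity_excerpts(sample_text, "security policy", "ethics"):
    print(excerpt)
```

Only the returned excerpts, rather than whole documents, would then be passed to whatever topic-modelling step follows.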
As Sebastian says, ITMS is based on R and Shiny apps.
The next generation of tools, I suspect, will be based on deep learning algorithms - like Google's TensorFlow, which is also available as an R package. [And BTW - Olga is also working that into ITMS].
I must confess that I am surprised about how few academics seem to be pushing for this... But at least we are making some progress!