QDA and Zotero

sdspieg · May 6, 2014

Let me try this again as a new thread. I am looking for a way to make annotations IN attachments in Zotero (PDFs, html, etc.) that can then subsequently be extracted in such a way that they can be cross-tabulated and analyzed.

Let me elaborate. Imagine you have a corpus in the form of a Zotero collection. And you want to analyze how certain topics/themes/ideas/etc. have changed over time. Or how they are different across subsets of the corpus - e.g. how are these topics/themes/ideas/ different in this country/discipline/school of thought/... The idea would then be to develop some coding scheme and to start coding the actual content of the items in your corpus. But ultimately, you'd want to be able to cross-tabulate the data.

Does anybody have an idea about how I could do this? I.e. either WITHIN Zotero (through a ), or maybe even in such a way that we could do the coding in Zotero and then extract the coding results from sqlite?

Any ideas would be greatly appreciated.
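
To make the cross-tabulation goal concrete, here is a minimal sketch in Python. It assumes the coded passages have already been exported from somewhere as simple (item, subset, code) records; the item keys, the years used as subsets, and the records themselves are invented for illustration and do not come from any Zotero export format.

```python
from collections import Counter

# Hypothetical coded excerpts: (item_key, subset, code). Invented for illustration.
excerpts = [
    ("ITEM0001", "2013", "economic"),
    ("ITEM0001", "2013", "political"),
    ("ITEM0002", "2014", "economic"),
    ("ITEM0003", "2014", "economic"),
    ("ITEM0003", "2014", "informational"),
]


def crosstab(rows):
    """Count coded excerpts per (subset, code) pair; return {subset: {code: n}}."""
    counts = Counter((subset, code) for _, subset, code in rows)
    subsets = sorted({s for s, _ in counts})
    codes = sorted({c for _, c in counts})
    return {s: {c: counts.get((s, c), 0) for c in codes} for s in subsets}


def format_table(table):
    """Render the nested dict as a tab-separated table, one row per subset."""
    if not table:
        return ""
    codes = sorted(next(iter(table.values())))
    lines = ["subset\t" + "\t".join(codes)]
    for subset in sorted(table):
        cells = "\t".join(str(table[subset][c]) for c in codes)
        lines.append(subset + "\t" + cells)
    return "\n".join(lines)


table = crosstab(excerpts)
print(format_table(table))
```

The subset column could just as well hold a country, discipline, or school of thought; the counting logic stays the same.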

aurimas · May 6, 2014

I am looking for a way to make annotations IN attachments in Zotero (PDFs, html, etc.) that can then subsequently be extracted

For PDFs, you can just add regular comments/highlighting and use ZotFile to extract them into Zotero notes. It sounds to me like you would probably benefit more from tagging items, rather than making notes about them, but it's not possible to annotate items inside the PDF and extract annotations as tags.

http://chrisjr.github.io/papermachines/ might have some interesting analysis methods, but I think it would do best with tags, not notes (I could be wrong). Other than PaperMachines, I'm not aware of any categorical data analysis plugins for Zotero, which means that you would want to export your annotations to some other software. There would probably be ways to do this if we knew what format the other software can read.

If you give some more concrete examples of what kind of annotations you want to make in the PDFs and what kind of output you expect (or what analysis software you intend to use), we can probably assist you more.

sdspieg · May 29, 2014

Thanks. I thought I had tried to give you concrete examples of the kinds of annotations we want to make. It's essentially like what Nvivo and other similar software programs do. You do not tag the items, but the precise passages in the items based on some coding (or if you prefer, mark-up) scheme. So, for instance, we are currently working with a corpus of about 2000 articles, webpages, etc. that talk about policy options for dealing with the 'new' Russia. We want to mark up all options that are of a political, economic, societal, informational, etc. nature. That is the highest level of our coding scheme. Within each of these high-level categories, we have more detailed ones. So one of the economic options is 'sanctions' which in turn is divided into 'personal', 'sectoral', 'regional', 'national', etc.
And so yes, we could add those as tags in Zotero, but then you miss the 'drill-down' option afterwards. The nice thing about a program like Nvivo (which I hate by the way, but I'm just giving it as a well-known example) is that once you've marked up the articles, you get the stats AND you can drill down to the precise passages in the text that have been marked up with whatever code you're interested in.
And mind you - it'd be great to be able to do this WITHIN Zotero. But even if that is not possible (as I suspect), I was hoping that somebody might know of a way to work with the underlying sqlite dbase (with another software program) outside of Zotero, in such a way that we could mark up the attachments etc. without 'breaking' the Zotero structure.
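
One way to picture the "outside of Zotero without breaking it" idea is a separate sidecar database that stores coded passages keyed by Zotero item key and never touches zotero.sqlite itself. The sketch below is only an assumption about how that could look: the table layout, the item keys, and the placeholder passages are all hypothetical, and the codes are written as slash-separated paths so that nested codes can be counted and drilled into.

```python
import sqlite3

# Sidecar database (in memory here; a separate file in practice).
# It never opens or modifies zotero.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE excerpt (
           item_key TEXT NOT NULL,  -- Zotero item key (hypothetical values below)
           page     INTEGER,
           code     TEXT NOT NULL,  -- slash-separated path in the coding scheme
           passage  TEXT NOT NULL
       )"""
)

# Placeholder rows, invented for illustration.
conn.executemany(
    "INSERT INTO excerpt VALUES (?, ?, ?, ?)",
    [
        ("ITEM0001", 3, "economic/sanctions/sectoral", "placeholder passage 1"),
        ("ITEM0001", 7, "political", "placeholder passage 2"),
        ("ITEM0002", 1, "economic/sanctions/personal", "placeholder passage 3"),
    ],
)


def count_under(conn, prefix):
    """Number of excerpts coded with `prefix` or with any code nested below it."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM excerpt WHERE code = ? OR code LIKE ?",
        (prefix, prefix + "/%"),
    )
    return cur.fetchone()[0]


def drill_down(conn, prefix):
    """Return the passages behind a count, ordered by item and page."""
    cur = conn.execute(
        "SELECT item_key, page, code, passage FROM excerpt "
        "WHERE code = ? OR code LIKE ? ORDER BY item_key, page",
        (prefix, prefix + "/%"),
    )
    return cur.fetchall()


print(count_under(conn, "economic/sanctions"))
for row in drill_down(conn, "economic/sanctions"):
    print(row)
```

This gives both the stats and the drill-down, but it says nothing about how the passages would be marked up in the first place, which is the hard part. Code names containing `%` or `_` would need escaping in the `LIKE` pattern.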

sdspieg · June 12, 2014

Still struggling with this... Let me ask another question - how can we get the data coming out of the tags in a csv-format that we can then analyze? E.g. in January there were that many tags 'Y', N tags 'Z' etc.; in February that many etc...
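
As a rough illustration of the kind of output meant here, the following Python sketch counts tags per month and writes them as CSV. It assumes each item is already available as a date plus a list of tags; how those records get out of Zotero is left open, and the dates and tags below are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical records: (ISO date of the item, list of its tags). Invented data.
records = [
    ("2014-01-15", ["Y", "Z"]),
    ("2014-01-28", ["Y"]),
    ("2014-02-03", ["Z"]),
]


def tag_counts_by_month(rows):
    """Count how often each tag occurs in each calendar month."""
    counts = Counter()
    for date, tags in rows:
        month = date[:7]  # "YYYY-MM" prefix of an ISO date string
        for tag in tags:
            counts[(month, tag)] += 1
    return counts


def to_csv(counts):
    """Serialize the counts as CSV text with one row per (month, tag)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["month", "tag", "count"])
    for (month, tag), n in sorted(counts.items()):
        writer.writerow([month, tag, n])
    return buffer.getvalue()


counts = tag_counts_by_month(records)
csv_text = to_csv(counts)
print(csv_text)
```

The resulting text can be saved to a file and opened in a spreadsheet or statistics package for further analysis.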

JonEP · June 20, 2014

This is something I would also love to see. Here's a link to a post that provides a series of links to threads about this topic: https://forums.zotero.org/discussion/7210/tagging-text-within-a-pdf/#Item_1

Qiqqa seems to be the pdf organizer/reference manager that is the closest to allowing this (by allowing tagging of pdf annotations). Their business model doesn't appeal, however, as they lock you into using their cloud storage system, which is expensive.

JonEP · June 27, 2014

A related QDA sort of issue is discussed here: https://forums.zotero.org/discussion/79/hierarchical-tags/#Item_36

sdspieg · September 16, 2014

JonEP - have you actually used Qiqqa for this purpose? If so, what were your experiences?

JonEP · October 25, 2014

sdspieg, No I haven't used Qiqqa.

zurpher · September 22, 2015

Qiqqa seems to be the pdf organizer/reference manager that is the closest to allowing this (by allowing tagging of pdf annotations). Their business model doesn't appeal, however, as they lock you into using their cloud storage system, which is expensive.

As far as I understand it, it is also not possible to export PDF annotations from Qiqqa (or Mendeley). I therefore stick with PDF-XChange Editor to be able to make annotations that are embedded within the PDF itself.

@sdspieg

Any success with your work? Have you checked whether ATLAS.ti, Citavi or MAXQDA could be of any help?

sdspieg · September 22, 2015

Success? Well, we did of course manage to get the job done. But only outside of Zotero - in Dedoose. And so you end up with two separate tools - one for the coding. One for the library management/citations. It's unfortunate, but I suggested here to work some simple attachment 'coding' functionality into Zotero (or, if need be, even directly in the underlying sqlite database), and nobody has picked it up. I also suggested to Eli Bieber from Dedoose to allow for the import of the Zotero sqlite database into Dedoose, but he also didn't 'bite'. So I don't think this is possible - yet.

I'm sure it will come of course, it's just way too obvious :) We can even do automated coding with PaperMachines, so why shouldn't we be able to do hand-coding?

And if you're interested in my longer-term view, I think more and more these 'middle-men' (like Zotero or Dedoose) will be 'cut out' and will be fully appified. People will be able to 'annotate' text in something like Diigo (which, incidentally, also allows this for pdfs). These annotations will also get a URI, just like all DOIs, and that's what will be 'quoted' - probably not in a footnote, but just as a hyperlink. Think of 'active citation', for instance. At some point, all of this human hand-coding will then be put into some deep learning system and from then on all of this will be done algorithmically. But hey, that's just my layman perspective on the future of 'knowledge sourcing' :)

zurpher · September 22, 2015

Well, good to know you got the work done. I agree with you that it would be useful to have an integrated tool for coding and referencing. Zotero probably doesn't have sufficient resources to add such features.

I once tried to code PDFs in MAXQDA but did not feel this would get me anywhere. Qiqqa's annotation snippets are nice but have their limitations too.

I personally cling to the idea that human sense-making cannot be replaced by mechanical algorithms. Deep learning sounds to me too much like science fiction, but who knows.

sdspieg · July 7, 2016

Any progress on this? Does anybody know, for instance, whether diigo allows users to annotate offline pdfs (in, for example, a Zotero library)? Or any other solutions?

sdspieg · December 12, 2017

I never got an answer to this it seems - but I remain keenly interested in whether/how it would be possible to do (preferably even social) online annotation of downloaded pdf's... So any info would be greatly appreciated

adamsmith · December 12, 2017

have you looked at hypothes.is? Doesn't yet integrate into Zotero, but unbeatable for social annotation.

sdspieg · December 12, 2017

I had looked at it when you pointed it out to me in 2016. I guess I didn't quite catch on to it then. I've been using diigo relatively intensively to annotate what I have been reading (professionally) on the Internet since 2015. And I find that useful (for me), although I haven't been able to persuade my colleagues to go 'social' (at least within our team) with it. But so I've re-activated the Chrome extension just now again, and will take another look. And I guess the main difference is that annotations are social/public by default?
Also you said it doesn't YET integrate into Zotero - does that mean somebody is thinking about this for the future? Because that certainly WOULD make a difference for me. I also like the idea that this might push us a bit closer to the ideal of "Wissenanhäufung"... Although I'd still have to see how 'self-structuring' the annotations are. Because it is nice to see (as we also do on our Kindles) which passages of text are highlighted by the 'wisdom of the crowd', but that is still a far cry from a (self-)structured debate about those passages. Plus I also don't quite see yet, how we can overcome the copyright hurdles for socially annotating articles from academic journals or books.
Still - thanks for the tip, Sebastian!

realtime99 · December 12, 2017

Hi, I am looking at this exact issue right now as I am getting into a QDA/discourse analysis project (a new area for me).

Citavi is a program that may be closer to what you want. It is basically like Zotero, with a browser plug-in and a Word add-in, but with the added feature of a more fully-functional "knowledge item" manager for quotations and comments linked to specific text passages in pdfs. Minuses are: Windows-only, not free or open-source, and I find their workflow for managing citations slower than Zotero's. Plusses are: comes with unlimited cloud storage, great customer support, and a built-in knowledge manager for sorting/comparing quotes.

I was hesitant to switch from my current workflow, however, because of the sunk cost of learning a new program, and Citavi has some quirks that bothered me (although these are UI issues that wouldn't bother everyone).

I do have access to NVivo-- can you tell me why you don't like that program? Is it just that you would rather use only one program where possible?

Also, I'm interested in knowing what you are unable to do in Zotero using notes, tags, and Zotfile. I'm imagining a workflow like this:

- Enter reference into Zotero and attach PDF
- annotate pdf in Acrobat with highlights and commenting
- use Zotfile to automatically put quotations and comments in an attached note

The next steps would depend partly on how many quotations and comments you would have per file. If not many, you can just tag the note with any additional keywords for later searching. If you have a lot, you could break the note into several smaller notes with individual tags.

One other thing you may not realize is that you can have standalone notes in Zotero that you link to a reference using the "related" field. See: https://www.chronicle.com/blogs/profhacker/taking-better-notes-in-zotero/36561

@sdspieg, I'm very interested in hearing your feedback given your experience with actually dealing with this sort of data.

adamsmith · December 12, 2017

Public by default, but very nice private group function, so doesn't have to be. The folks at Hypothesis are thinking a lot about how to surface the most meaningful & high-quality content. My sense is that it's likely not going to be full-on "wisdom of the crowd" but selected and/or curated groups.

The tagging feature of annotations is really nice, as is the ability to link to them.

Hypothesis is working with a lot of publishers on the annotation/paywall question. I think that can be done (within the limits set by the system -- you still won't be able to see articles you don't have access to).

And yes, I've talked to one of their devs and we think a plugin that brings Hypothesis annotations into a Zotero note is super doable -- but I'm not going to make any promises of time.

For full disclosure, Hypothesis is on a grant with my group at SU:
https://qdr.syr.edu/qdr-blog/qualitative-data-repository-teams-hypothesis-develop-annotation-transparent-inquiry-ati
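
To give a feel for why bringing annotations into a note looks doable, here is a minimal Python sketch of the conversion step only. It assumes annotations arrive as dictionaries shaped like rows from the Hypothesis search API (a `text` comment, a `tags` list, and a `target` whose `TextQuoteSelector` carries the quoted passage in `exact`); the sample row is invented, and fetching annotations or writing the note into Zotero is not shown.

```python
import html

# Hypothetical annotation, shaped like a row from the Hypothesis search API.
sample_rows = [
    {
        "uri": "https://example.org/article",
        "text": "Relevant to the sanctions debate.",
        "tags": ["economic", "sanctions"],
        "target": [
            {"selector": [{"type": "TextQuoteSelector", "exact": "a quoted passage"}]}
        ],
    }
]


def quoted_text(row):
    """Return the highlighted passage of one annotation, or '' if there is none."""
    for target in row.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact", "")
    return ""


def rows_to_note_html(rows):
    """Build simple HTML for a note: quote, then comment, then tags."""
    parts = []
    for row in rows:
        quote = html.escape(quoted_text(row))
        comment = html.escape(row.get("text", ""))
        tags = html.escape(", ".join(row.get("tags", [])))
        parts.append(f"<blockquote>{quote}</blockquote>")
        if comment:
            parts.append(f"<p>{comment}</p>")
        if tags:
            parts.append(f"<p>Tags: {tags}</p>")
    return "\n".join(parts)


note_html = rows_to_note_html(sample_rows)
print(note_html)
```

Keeping the tags alongside each quote is the part that matters for coding-style work, since they are what a later search or count would run on.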

sdspieg · December 12, 2017

Thanks @adamsmith . I'm definitely interested. I've always been annoyed at the lack of transparency in qualitative research. Or should I say embarrassed about my own work.

First, there is the sloppy (and, of course, fully non-transparent) nature of the 'literature review' genre across all disciplines. I always prided myself in reading more (or better) than most others who were working on these issues. But how do I know whether what I read is even a remotely representative sample of the literature on a topic? And how could I even know? [This is what sparked my interest in corpus analytics and textmining tools - because we now CAN download pretty much all academic articles based on an intelligent search query, apply things like topic modelling or n-gram co-occurrence on it, and get at least an idea about the main topics - and how they change over time, or differ across subsets of the literature. It still baffles me that not more academics on this forum seem to be sharing that interest/curiosity].

But then there is also the matter of the transparency of citations: our continued inability to match citations with the actual piece of text they are referring to. Andy Moravcsik's work on 'active citations' further piqued that interest, but I always thought we should be able to go further: to click through from a footnote NOT to a page (that's the best case - and even then you still have to 'guess' what part of that page the author is referring to) but to the actual marked text. Here again - we used to have no alternative, but now we certainly do.

And finally, the lack of transparency in our analysis - what precise inferences do we draw from whatever pieces of text we select from whatever subset of relevant literature we choose to read and analyse? And yes - that is partially what your 'Annotation for Transparent Inquiry' effort with hypothes.is seems to be about. But is that enough? Shouldn't the reader who goes to the trouble to dig down a cite also be able to look at the author's comments on/analysis of other parts of the article/book/... that she decided NOT to quote?

I guess these things will start changing soon. At an academically glacial pace, I reckon. In the meanwhile, we'll just keep experimenting with various ways to hook up Zotero libraries to various textmining tools (as with ITMS - on which progress is also slower than we had anticipated).

PS - in your proposal, you IMO understate the problems in quantitative approaches as well: very few people really appreciate (or just decide to ignore) the many many caveats that come with ALL datasets.

sdspieg · December 12, 2017

@quickfold11

Citavi is a program that may be closer to what you want.

Thanks - I'll check it out.

I do have access to NVivo-- can you tell me why you don't like that program? Is it just that you would rather use only one program where possible?

No. In fact we do use Dedoose for manual coding. I find NVivo too 'Microsoft Office'-like; whereas Dedoose is more 'Googley'. The main thing is that we work with sometimes quite large research teams (>20), and so we don't want to have to deal with version control issues etc. But so if you do a search on Dedoose here, you should find more info on the whys and hows.

Also, I'm interested in knowing what you are unable to do in Zotero using notes, tags, and Zotfile. I'm imagining a workflow like this:

- Enter reference into Zotero and attach PDF
- annotate pdf in Acrobat with highlights and commenting
- use Zotfile to automatically put quotations and comments in an attached note

And how do you do that? I.e. get the highlighted/marked text excerpts automatically from Acrobat into the notes? I should maybe take another look at Zotfile, which I do find great for other functionalities. But again - it's the real-time collaborative aspect that we'd miss. We are typically talking about 100s of documents, very richly marked up with sometimes 100s of (nested) codes. And this workflow would also leave out the usual QDA-stuff: developing coding schemes, easily applying codes to excerpts, color codes, 'seeing' the codes vertically to the right of the text excerpts etc.

One other thing you may not realize is that you can have standalone notes in Zotero that you link to a reference using the "related" field.

But that's just attaching notes to items, no? And not to specific excerpts within the html/pdf/...?

mangen · December 11, 2018

Any updates on a workflow that would bring Hypothes.is annotations into Zotero? I would be very interested in any suggestions or ideas anyone has...

sdspieg · December 11, 2018

Me too :)