Zotero and Citation Analysis

adamgolding · March 19, 2012

I once thought that Zotero was the cornerstone of research software, but I have recently re-evaluated all my research practices, and I find it wanting in areas that now strike me as key to conducting research well: citation analysis and other forms of bibliometrics. I am currently evaluating various open-source projects to see which can be repurposed to achieve the sort of research environment I desire, and I'm wondering how much enthusiasm there is in the Zotero community for the sort of things I want--some background:

I was motivated to write this thread when I revisited some old posts about how the 'google scholar citations' tool for zotero is no longer maintained and doesn't work any more--I used to see this tool as evidence that Zotero was headed in the right direction, but am saddened to see that it was not the beginning of a longer narrative--similar dismay applies to the SEASR and the VUE extensions--the latter's not maintained and if the former is, their documentation certainly isn't and I can't tell if it's broken or confusing :-)

While zotero used to be more or less a self-contained universe within which I could aggregate and sort out all the information gathering aspects of my research, I now use many tools to do things that zotero can't:

- RETRIEVAL OF AND SORTING BY CITATION STATISTICS:
I use Publish or Perish to sort big Google Scholar queries (up to 1000 entries) by # of GS citations, citations per year, and GS rank. I sometimes export many of these queries to csv so I can aggregate them to do the same in excel. I do similar sorting using HistCite for WoK queries, with the added functionality that it can also examine 'local' citation counts--i.e. the number of times an item has been cited by other items in a given set. I am currently investigating using Leydesdorff's command-line tools (along with DosBOX--they are 16-bit binaries) to convert Scopus search results into the ISI format so I can visualize them in HistCite as well.

- OTHER STATISTICS TO SORT BY
Unfortunately there is no easy way to sort books by their amazon sales ranks in Zotero, nor is there an easy way to sort various search terms by the number of results they generate on GS, Google, or other places. Other relevant parameters to sort references by include number of pages (see a previous thread on that subject), h-index of the author, impact factor of the journal, etc. In the case of ISI citations these numbers could be retrieved automatically in some cases.

- VISUALIZATION OF CITATION NETWORKS
When it comes to 'sizing up' a large literature and developing a plan of strategic reading, HistCite can display some breathtakingly useful visualizations for a set of downloaded citations. (Note that current ISI results require "RID" to be replaced with "rid" everywhere and blank lines to be inserted between records before histcite can accept them). As above, converting scopus results to ISI format would allow the same for them. Some utilities purport to work for GS but do not work for me--for instance, the tool at http://hublog.hubmed.org/archives/001004.html is long broken, and the commercial product at touchgraph.com does not support google scholar as far as I can tell. This tool doesn't seem to work too well right now: http://www.madhavajay.com/kalki/ I have written to the author of a different tool, "Citation Network Analyzer", a tool that creates visualizations similar to HistCite's for citations on google scholar, and he tells me that he is updating the program to work with current google scholar output formats. His code is in R and he tells me that he is open to making it available as an open source project if there are some people to maintain it.

- OTHER VISUALIZATION METHODS
While NetworkWorkbench, and the related Sci2 tool, aren't very good for paper-paper citation graphs because they do not easily lay things out with one coordinate corresponding to publication year, and CiteSpace does not do paper-paper citation graphs at all (it appears to focus on co-citation graphs), all three are powerful open-source tools for other forms of bibliographic visualization. (Microsoft Academic Search also deserves an honorable mention here for having several fun visualization techniques that are already available!)

- DISCOVERING 'MISSING' REFERENCES
The importance of this really 'clicked' for me when I had downloaded a set of references in histcite, examined visually how they inter-cited each other, and then proceeded to the 'Cited References' window, which reveals other works that *many* of the references my search discovered cite but which weren't retrieved by whatever keywords I happened to have used--this tends to show me many things I didn't know I really wanted to find that are essential for my research! :-)
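The 'missing references' idea can be sketched in a few lines. This is only an illustration, not how HistCite does it: it assumes every cited work has already been normalized to the same kind of key as the items in the set, and the function name and the `min_citing` threshold are made up for the example.

```python
from collections import Counter


def missing_references(records, min_citing=2):
    """Return cited works that are NOT in the set, most-cited first.

    `records` maps each item's key to the list of keys it cites.
    Assumption: cited works use the same keys as the items themselves.
    """
    in_set = set(records)
    counts = Counter()
    for cited in records.values():
        # set() so a duplicated entry in one reference list counts once
        counts.update(set(cited) - in_set)
    return [(key, n) for key, n in counts.most_common() if n >= min_citing]


# Toy data: A, B and C were retrieved by the search; X and Y were not.
records = {
    "A": ["X", "Y", "B"],
    "B": ["X", "Y"],
    "C": ["X", "A"],
}
suggestions = missing_references(records)
```

Here `suggestions` lists X (cited by three items in the set) ahead of Y (cited by two), while A and B are left out because they are already in the set.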

Now, what does this have to do with zotero? Well, short of actually implementing many of the features above, there are a lot of simple ways that it could be integrated with workflows such as these, such as:

1. provide fields to store citation counts from various sources (Google Scholar, Scopus, WoS, and CiteSeerX), and make it possible to grab this figure when you grab a reference from these sources. Make sure merging references preserves the numbers from different sources. Allow sorting by these fields and by other values computed from them, such as 'citations per year'.

2. provide fields to store lists of references that cite a given reference, and other fields to store references that are *cited by* a given reference. Allow this data to be imported from files downloaded from ISI and Scopus (and maybe CiteSeerX), or to be gathered when importing directly from the website to zotero.

3. allow rate-controlled scraping of such lists of citing references from google scholar, and storing them--this would best be used 'selectively' to supplement citation data from ISI and scopus--it would be nice to click on a few 'key' works and ask zotero to fill in the list of citing references, and perhaps to download them in a new subcollection in that folder.

4. allow export of references in a way that allows HistCite to visualize the data stored in these 'cited by' and 'cites' fields. i.e. export proper ISI-style files. This would also allow a wide array of visualization of the database in zotero using open-source tools like NetworkWorkbench, Sci2, and CiteSpace--exporting ISI files is the quickest way to allow for various forms of visualization.

5. In lieu of grabbing citation counts directly as in 1, an easier short-term solution is to allow these figures to be stored when importing files generated by Publish or Perish, ISI, or Scopus. Currently citation counts from Publish or Perish *are* in fact stored in the 'extra' field but I can't sort by that field, nor can I sort by derived fields like citations per year! Just that small change would be the quickest way to enable some pretty powerful citation-count sorting in zotero.

6. allow a user to ask zotero to download PDFs for a folder of references imported from some other source--currently my workflow involves two separate steps for a GS query: 1. perform a search in Publish or Perish and save the result as a csv for sorting purposes, 2. perform the same search in my browser so I can ask zotero to download all the PDFs into a new folder. Now I have to sort the spreadsheet but click in zotero when I am navigating PDFs in terms of citation counts--clearly this is just a kludge. While the small changes mentioned at the end of 5 would go a long way to helping this situation, I would still have to add PDFs to the citations manually after importing a file from Publish or Perish :-)
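To make the 'sort by extra / citations per year' request in point 5 concrete, here is a rough sketch. It rests on assumptions that are not confirmed anywhere in this thread: that the first integer found in the 'extra' text is the citation count, that items are plain dicts with `year` and `extra` keys, and that the publication year counts as year one (Publish or Perish may well compute its per-year figure differently).

```python
import re


def citations_per_year(item, current_year):
    """Derive citations/year from an item's 'extra' text and its year.

    Assumption: the first integer in 'extra' is the citation count.
    Items with no number in 'extra' are treated as having zero citations.
    """
    match = re.search(r"\d+", item.get("extra", ""))
    count = int(match.group()) if match else 0
    # Count the publication year itself, and never divide by zero.
    age = max(current_year - item["year"] + 1, 1)
    return count / age


# Hypothetical items; the 'extra' wording is invented for the example.
items = [
    {"title": "Older, heavily cited", "year": 2002, "extra": "Cited by: 300"},
    {"title": "Recent, rising fast", "year": 2011, "extra": "Cited by: 90"},
    {"title": "No count stored", "year": 2008, "extra": ""},
]
ranked = sorted(items, key=lambda it: citations_per_year(it, 2012), reverse=True)
```

With these numbers the recent paper (90 citations over 2 years) ranks above the older one (300 citations over 11 years), which is exactly the kind of reordering a derived field makes possible.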

What is in the cards for Zotero when it comes to all this citation analysis stuff? :-)

mronkko · March 19, 2012

The main problem with citation analytics is that, while these are possible to do as extensions to Zotero, there have thus far not been enough interested developers who could work on this. Because Zotero is limited in resources, it is unlikely that citation analysis tools will be provided by the Zotero team in the near future.

1) Check out this extension https://addons.mozilla.org/en-US/firefox/addon/zotero-scholar-citations/

2) Could be implemented as a plugin to Zotero. A real problem here, however, would be how to uniquely identify the cited references. If you have ever tried matching the data about cited references from ISI and Scopus, you know that it is not easy. (Or it was not easy a few years ago, at least.)

3) The problem here is again identifying the references. For example ISI exports RIS format that stores the cited references as

CR HENDERSON J, 2010, QUAL PRIM CARE, V18, P33
*FED REG COMM, 2010, CONN AM NAT BROADB P
HING E, 2010, ELECT MED RECORD USE
Kaplan B, 2009, J AM MED INFORM ASSN, V16, P291, DOI 10.1197/jamia.M2997

If the reference does not have a DOI, these data do not make any sense outside ISI. If it has a DOI, then it can be resolved without google scholar.

4) Same problem with identifying the references.

5) See response to 1

6) I do not really understand the question. You can get PDFs any way you want, then drag and drop these to Zotero, and use the retrieve metadata feature to add data about these PDFs to Zotero.
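The identification problem in point 3 is easy to see when the cited-reference strings are pulled apart. The following is a minimal sketch, assuming the comma-separated layout seen in the four example lines above (author, year, source, then optional V, P and DOI parts); the function name is made up, and real exports will contain cases this does not handle.

```python
import re

DOI_RE = re.compile(r"DOI\s+(10\.\S+)", re.IGNORECASE)


def parse_cited_reference(line):
    """Split one cited-reference string into rough parts.

    Assumption: parts are comma-separated in the order
    author, year, source, then optional V<volume>, P<page>, DOI <doi>.
    """
    line = line.strip()
    if line.startswith("CR "):  # the tag only appears on the first line
        line = line[3:]
    parts = [p.strip() for p in line.split(",")]
    ref = {"author": parts[0], "year": None, "source": None,
           "volume": None, "page": None, "doi": None}
    if len(parts) > 1 and parts[1].isdigit():
        ref["year"] = int(parts[1])
    if len(parts) > 2:
        ref["source"] = parts[2]
    for part in parts[3:]:
        doi = DOI_RE.match(part)
        if doi:
            ref["doi"] = doi.group(1)
        elif re.fullmatch(r"V\d+", part):
            ref["volume"] = part[1:]
        elif re.fullmatch(r"P\d+", part):
            ref["page"] = part[1:]
    return ref


with_doi = parse_cited_reference(
    "Kaplan B, 2009, J AM MED INFORM ASSN, V16, P291, DOI 10.1197/jamia.M2997"
)
without_doi = parse_cited_reference("HING E, 2010, ELECT MED RECORD USE")
```

Only the Kaplan entry yields a DOI that could be resolved elsewhere; for the HING entry all that is left is an author, a year and an abbreviated source string, which is why matching across databases is hard.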

adamsmith · March 19, 2012

I understand adamgolding as saying that he knows the citations plugin - the problem with 1.) is that it's not very well maintained and a bit buggy - also, it works awkwardly with Zotero as it uses the "rights" field to store citation counts.

I don't know if more is planned along these lines - it's certainly not something that gets talked about a lot - it's also not something that's requested a lot. I would guess mronkko is right that most of this would have to be done via plugin(s) - generally, since Zotero doesn't actually look at the bibliography of papers it stores - citation network analysis doesn't strike me as something that comes naturally to Zotero, so as an honest answer to your question about what's in the cards, I think the answer is "not that much, unless a third party developer wants to invest hundreds, more likely thousands of hours in that".

The one thing that I think would be helpful to have in Zotero proper for this is a field for citation counts - Dan would have to say if that's in the cards, my suspicion would be that enthusiasm isn't all that great.

adamgolding · March 19, 2012

adamsmith:

if the citations plug-in works *at all* for anyone on the current version of zotero then it's news to me, it seems to be completely broken.

While I can believe that few people request citation features, it seems to me like one of those features that nobody knows they want--HistCite and other tools are quite obscure, so users rarely think of such useful possibilities.

Note that, while zotero does indeed not scrape bibliographic information from the pdfs it downloads, and this is certainly not the sort of thing that comes naturally to Zotero, Zotero *is* the leader when it comes to grabbing lots of information from websites about references, and this information, in the case of GS, WoS, Scopus, and CiteSeerX, includes a lot of rich citation information that Zotero ignores. If the internals allowed this to be stored *in principle* then interested parties could add the ability to grab such information to a translator, if I understand correctly.

As for citation counts, how involved can it be to simply add another column to Zotero so that plugin authors can use it? Can plugins add a custom field directly? There are tons of fields one might want to populate...

Anyway, adding the ability to sort by the 'extra' field is one small change that would make the references imported from Publish or Perish far more useful in Zotero, and adding another field that divides this number to produce the citations per year is a second small change--these two alone would massively increase the usefulness of Zotero.

mronkko:

1) see my response to adamsmith above

2) actually I am about to attempt this once I manage to convert scopus references to ISI format, so that I can integrate both sources of citation data in HistCite--there are some problems with one source using abbreviated journal names and the other using full names, hopefully manually 'merging' references won't be too chaotic

3) even requiring manual merging of references would be miles better than nothing--as it stands I have to manually input non-WoS references into HistCite making sure that certain fields match the way they might appear in other references

4) actual export is easy compared to getting the data into zotero, or so it seems to me: if one is exporting a file from zotero to be visualized in HistCite and not merging it with other files, all that needs to happen is that certain fields match (the histcite help file and knowledge base mostly explains what properties a file exported by zotero would have to have)

5) PoP allows grabbing of far more references at once than zotero does (up to 1000), grabs the citation numbers at the same time, and also produces further data: the GS search result rank, and #citations/year

6) To clarify: zotero lets me download up to 100 PDFs in one click using google scholar. PoP lets me download, and sort by, three different citation statistics in one click for up to 1000 references. But then I have two pathways into zotero:

1. use zotero to import GS results directly from google scholar--I get <= 100 PDFs but no reference info

2. export RIS from PoP, and import into Zotero--I get <= 1000 entries with *one* (not three) citation statistic reported in the 'extra' field, but no PDFs (and I can't sort by the one statistic that is imported)

I would like to:

a. search in PoP
b. export from PoP
c. import from PoP and get all three citation statistics and have them shown as sortable fields
d. then ask zotero to download some or all of the PDFs for what I've imported

alternatively, zotero could add support for grabbing up to 1000 results at a time and the citation statistics itself, but this seems like more work--I'm not sure.

adamsmith · March 19, 2012

As for the larger changes - I can only repeat - what you're asking is a _huge_ project, involving fundamental changes to the Zotero data structure, syncing, the GUI. Even not counting the work on import translators we're talking hundreds, if not thousands of developer hours. I just don't see how this is going to happen without a third party that is highly motivated either in terms of development time or grant money.

Someone is working on a patch to allow sorting by the extra (and other) fields.

Additional data fields are actually a bigger issue than you make them out to be - which is why there haven't been any changes to the fields available in Zotero for, I believe, the last 5 years.
I don't have any feelings about a citation stats field, maybe Dan has input. Generally there is always a concern that adding small, not fully thought-out functionality at all edges leads to a poorer quality overall product. I know that's frustrating, when it reflects the one feature that you think would be _absolutely crucial_ for Zotero - but there are hundreds of such features someone feels about that way.

adamgolding · March 19, 2012

I should clarify that, while I have never contributed code to an open source project, I have some limited 'professional' experience as a developer and while everyone hopes developers can implement the features they desire, I am also interested in getting a sense of which of my desires are containable enough that I might attempt them myself as a patch or as a plugin for zotero.

I'm sure, however, that I am naive as to how much is involved with some of these proposed changes (coding always is more work than you expect! :-)), so I would like to understand what makes some of the changes that seem trivial difficult:

Adding fields seems trivial to me--what is the underlying database system? I know many situations in which a database could easily have a new field added to all of its records, the most trivial being the addition of a new column in a spreadsheet--I know it's not as simple as *that* but I must be missing something if adding new database fields is several orders of magnitude more difficult than that :-)

adamsmith · March 19, 2012

the database is in sqlite - I don't think adding a field there is a problem - but changes in the database affect the software at a lot of other levels (which is why no, you can't add fields with a plugin) - syncing, most obviously, the API, the translator infrastructure and probably more that I'm not thinking of.
Generally, though, I think the main obstacle in getting a field added is convincing the core-devs that it's worth it. Zotero is scheduled for an overhaul of the available fields for the 3.5 release, and one more would likely not change much - but there is reluctance to add on too many fields.
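To illustrate the point that the sqlite side is the easy part, here is a small sketch using Python's built-in sqlite3 module. The table and the `citation_count` column are invented for the example and are NOT Zotero's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy table for illustration only; this is NOT Zotero's actual schema.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("First paper",), ("Second paper",)],
)

# Adding the column is a single statement; existing rows get NULL.
conn.execute("ALTER TABLE items ADD COLUMN citation_count INTEGER")
conn.execute("UPDATE items SET citation_count = 42 WHERE title = 'First paper'")

rows = conn.execute(
    "SELECT title, citation_count FROM items ORDER BY id"
).fetchall()
conn.close()
```

The schema change itself is one line. What the sketch does not show is everything else that has to know about the new column: syncing, the API, and the translator infrastructure.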

The huge amount of work I'm referring to is to allow Zotero to also import information about references for a citation.

adamgolding · March 19, 2012

It seems like 3.5 then, should allow the end-user to add custom fields, letting plugin authors do the same--I've seen this proposed elsewhere on the forum :-) In addition to the items mentioned above there are lots of things I might want to input within the more specialized scope of a particular research project...

Also, what about adding 'types' to the relations in the 'related' pane, and the ability to enforce symmetry? i.e. let the end user provide a text field naming the 'type' of a relation such as 'cites' or 'cited by', with a checkbox for 'symmetric', and then a plugin can do so too, right? that would get us very close to what is needed for storing a citation network in zotero.

I agree that importing information about references from websites seems quite difficult--but taking it in from ISI-format files seems much easier--even if the records generated only have whatever information that the file format provides, such as this:

CR HENDERSON J, 2010, QUAL PRIM CARE, V18, P33

The user could then merge records manually, making any form of auto-merging a distant future possibility. (Sort the relevant collection by title and you're half-way there--if zotero eventually gets the ability to sort by more than one field it becomes even easier.) Scopus records can be converted to this ISI file format, and the "Citation Network Analyzer" program under development that I mentioned above will also hopefully export some sort of file that Zotero can import at some point. If the ISI format could be exported as well then visualization of a zotero library in HistCite also works.

I am curious if I am misguided as to the practicality of these ideas :-)

DWL-SDCA · March 19, 2012

I am familiar with HistCite (now a Thomson product) and Publish or Perish for GoogleScholar. I am also familiar with how each accomplishes the citation analyses you are requesting for Zotero. Even the most basic request -- a citation count field -- is a problem or a nightmare. I'm not clear what this field would contain: 1) Do you want the number of references in the item's reference list?; or 2) Are you asking for the number of times the item was cited?

If (1), an accurate number of references _might_ be obtained from Google Scholar but in my experience those counts aren't very accurate. The reference count data in WoS is much more accurate but is the property of Thomson. Some journal publishers provide the number of references an article contains on their article webpages. However, few include this in the metadata they provide to enable downloads to reference management software. If they do include it, the number is not reported systematically.

If you want (2) there are still more problems. As you point out, the number of cites to an article will differ between GS and WoS. I find that Scopus will usually provide still another number. As you mentioned, this depends upon the sources scanned but it also depends on at least three additional things: when the scan was last done (more recent = higher number); the accuracy of the citing article's reference list (much less accurate than most people would like to believe); and the ability of the database software to connect inaccurate or ambiguous citations to the true source article.

To get what you (and I) would like would not only require knowing the number of cites but also a way to include the actual article metadata, the reference lists of all articles involved, and a way to identify which articles link to other articles. Elsevier, Google, and Thomson own patents on the citation linking processes.

Although what you want would be wonderful to have, bibliography management software is not software for bibliometric analysis. The database structure for each would need to be quite different to be efficient at what they do.

adamsmith · March 19, 2012

custom fields are tricky and have many downsides - I highly doubt that they are going to happen for 3.5.
Allowing users to modify the data structure obviously creates problems for syncing, groups, etc.
Beyond that, it's also a question of desirability. While custom fields make users feel good, they affect medium-term performance enormously: Citation styles become tricky. Data export won't correspond to standards any more. Data exchange between users - one of Zotero's founding concerns - becomes a nightmare. That's one of the reasons librarians are (and I mean that in the most loving way) obsessed with standards.
These are not petty concerns. I feel rather queasy about letting users put data into Zotero they can't get out properly - but that would almost necessarily be the case for custom fields.

DWL knows more about citation analysis, so I'll yield to her/him(?) on this.

adamgolding · March 19, 2012

I am talking about different fields for different stats: for instance, one to store the number of times a reference is listed as 'cited' on google scholar, if the reference was grabbed from there. (this is what the old citation plugin did before it broke.) One, for instance, to store the same figure as reported by ISI, etc. There is no issue of the accuracy of the data if all that is reported is the claims that various databases make--the numbers all have foibles, but remain quite useful heuristics. And yes, there is no limit to how many such fields could be relevant, which is why I think that total freedom to use custom fields would be great here :-)

The legal concern might be misplaced here--maybe HistCite was owned by ISI all along (I'm not sure), but NetworkWorkbench is an independent open-source tool and it can process all the citation information that ISI and Scopus let you download in the form of text files from their services--are they in violation of a patent?

The basic database structure seems to already be in place, since there are 'related' links between items--I'm sure this isn't the perfect way to store this sort of data, but at least the data would be there so it could be exported to visualizers, or so other tools could read it straight from the zotero sqlite database.

As this conversation progresses, I think I am forming the following two ideas:

Citation Network Storage With Custom Relation Types:

1. allow custom relation 'types' (just associate a text field with a 'related' link)
2. third-party tools convert various sources of citation information into the ISI plaintext format
3. when importing from an ISI WoS plaintext file, create additional items corresponding to the cited items, and create corresponding 'cites' and 'is cited by' relations
4. users can merge records manually
5. third party tools read this information from zotero.sqlite to perform visualizations (For instance, the open-source NetworkWorkbench and Sci2 tools could add an option to read the zotero database. Another tool might generate data good for HistCite in the meanwhile.)
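The typed-relation idea in the list above might look something like the following toy model. It is only a sketch under stated assumptions: the class, the method names and the item keys are all hypothetical, and none of this reflects how Zotero actually stores its 'related' links.

```python
from collections import defaultdict

# Directed relation types and the type implied on the other item.
INVERSE = {"cites": "is cited by", "is cited by": "cites"}


class RelationStore:
    """Toy store of typed links between item keys (not Zotero's data model)."""

    def __init__(self):
        self._links = defaultdict(set)  # (source, type) -> set of targets

    def add(self, source, rel_type, target, symmetric=False):
        self._links[(source, rel_type)].add(target)
        if symmetric:
            # A symmetric relation holds in both directions under one name.
            self._links[(target, rel_type)].add(source)
        elif rel_type in INVERSE:
            # A directed relation implies its inverse on the other item.
            self._links[(target, INVERSE[rel_type])].add(source)

    def targets(self, source, rel_type):
        return sorted(self._links.get((source, rel_type), ()))


store = RelationStore()
store.add("paperA", "cites", "paperB")
store.add("paperA", "same dataset as", "paperC", symmetric=True)
cited_by = store.targets("paperB", "is cited by")
```

Recording 'paperA cites paperB' once is enough for `cited_by` to come back as `["paperA"]`, so an importer would only ever need to write one direction of each citation link.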

Citation Statistic Storage with Custom Fields:

1. allow custom fields
2. when importing PoP's RIS files, or ISI plaintext files, store the citation numbers there in special fields
3. allow sorting by custom fields
4. allow some fields to be simple calculations based on other fields, excel style (for 'citations per year', that sort of thing)
5. plugin authors can update things like the google scholar citations plugin, and other new things like that
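As a sketch of what per-source citation fields could mean when two records for the same reference are merged, consider the function below. The source labels and the function name are illustrative, and the rule for conflicting numbers (keep the larger) is an assumption based on the idea that counts only grow over time.

```python
def merge_citation_counts(a, b):
    """Merge two {source: count} dicts for the same reference.

    Counts from different sources are kept side by side. If both records
    carry a number for the same source, the larger one is kept
    (assumption: a higher count is the more recent one).
    """
    merged = dict(a)
    for source, count in b.items():
        merged[source] = max(count, merged.get(source, count))
    return merged


# Hypothetical records for one paper, imported from two different places.
from_pop = {"GS": 120, "WoS": 45}
from_web = {"GS": 131, "Scopus": 60}
merged = merge_citation_counts(from_pop, from_web)
```

The merged record keeps all three sources, with the newer Google Scholar figure replacing the older one, so nothing is lost by merging duplicates.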

More generally, allowing custom fields and relations seems like the best recipe for opening integration possibilities with other open-source software, right? :-) If the database model is being revisited anyway, I *really* hope that these possibilities can be considered--I would be several times more likely to write plugins if custom fields and relations were available :-)

DWL-SDCA · March 19, 2012

@adamgolding (I saw your post above after I posted this message) After all of this talk about Zotero enhancements, am I correct that you would be pleased if you could get articles from Zotero to be exported in the brief ISI format you presented above? That is doable. What is missing are the connections between articles. Aside from GS, I know of no non-copyrighted source of citation chains. Obtaining the information you want without getting into trouble could be a problem. A couple of years ago my project put out a request for volunteers to hand enter citation interrelationships of certain articles available on our online database. Within a month I received letters from Elsevier and Thomson asking for several pages of answers to questions.

@adamsmith I'm a man who is interested in bibliometrics but my main job is being responsible for a free specialty online bibliographic database presented in cooperation with the World Health Organization.

adamsmith · March 19, 2012

Just to manage expectations - while I don't have any direct influence on this, I would be extremely surprised if custom fields happened for 3.5.
As is, 3.5 already entails a lot of work just getting the field additions right for citation purposes.
As Dan notes in the other thread and I note here, custom fields pose major challenges and I'm not aware of any progress towards solving them.

adamgolding · March 19, 2012

@DWL-SDCA very interesting about the legal letters! I find this kind of strange because they let you download this information for up to 500 records at a time from WoS, and Scopus allows something like 1000 or 2000 records worth of citation information to be downloaded. The terms of use say something about not downloading 'unreasonable' numbers of articles, which leads me to think that users doing this for personal use and not republishing the information would be fine--why else do they allow you to download the data? Are you saying they claimed rights to the data, or the procedures involved in processing it? Microsoft Academic Search, and the TouchGraph browser perform visualizations of this sort of stuff as well, I wonder how they tread the patent landscape there...

Exporting information in the ISI format would be a start, so that HistCite could import zotero output, and if citation counts generated by importing RIS files from PoP were also exported, exporting data to histcite in this way would be a fast way to get started on manually mapping small citation networks when researching a given topic.

Of course, *importing* citation links from ISI format files would reduce the manual work involved by another order of magnitude.

@adamsmith I appreciate that you need to be realistic when projecting what the team is likely to work on :-) However, I would like to make sure that the concerns you have, while legitimate, do not misguide the team's efforts--the perfect, as they say, should not be the enemy of the good, and while it's certainly better to have data in an appropriately-specified field than in a 'dumb' field, it is surely often better to have it in a dumb field than to not have the data at all. Suppose a warning were displayed to anyone who used a custom field: "WARNING! Custom fields are not supported in citations--all information relevant to generating bibliographies must be entered in Zotero-specified fields!" Groups themselves could specify custom fields, but unless the group member 'mapped' a local custom field to a certain group custom field, the fields would be ignored by the group--there are lots of fields that should be user specific, such as if someone wanted to jot down the number of times they've read an article, and sort by that field. Speaking of which, are custom fields more dangerous than notes, when it comes down to it?

Also, user custom fields could be assumed to be for personal use, but plugins could define a field 'owned' by the plugin so that users with the same plugin have the same fields. I should also say that I have ignored the possibility of doing any of this with tags, because you can't sort with tags--as that other thread mentions, 'advanced tags' is another way to go, or rather, is roughly synonymous with 'custom fields' in terms of user functionality.

Oh, and of course, I proposed custom fields as an open-ended solution so that future things are possible--we could also consider implementing citation fields in precisely the way that WoS does them, which is as standard as it gets for citation data. (The HistCite help file discusses this briefly, and what criteria it uses to match records in this format.)

DWL-SDCA · March 19, 2012

It is one thing to use WoS or Scopus data in an analysis you conduct on your own machines. It is quite another thing to incorporate their added value (they are citation databases not merely bibliographic ones) into a system that can allow "their" added value to be shared or posted to Zotero groups.

You aren't requesting a simple linking of an article or articles. Doing citation analysis would require much more. What would be involved is a multi-layered and dynamic system of many-to-many relationships. Articles cite other articles every day. Keeping the linkages up to date requires frequent tests of each article for updated citation information. The database table structure for this is quite different. There currently is no system that allows a Zotero record to be updated automatically -- even to change ePub ahead of print data to the volume issue page information when it becomes available. With my own database I have automated this metadata update for PubMed and one journal publisher. Although several other journal publishers send me journal article metadata and update their ePub data with published print data, each publisher does it differently and the publishers change the way they ftp me the updates one or more times a year.

adamgolding · March 19, 2012

Oh, and @DWL-SDCA:

in response to:

"Aside from GS, I know of no non-copyrighted source of citation chains."

CiteSeerX allows downloading their entire citation database. Microsoft Academic Search provides a lot of citation information--I don't know the legal status of it. Note that GS citations are likely to be scrapable by that tool I mentioned above, "Citation Network Analyzer", when it's updated, and it could be altered to export in ISI format, in principle.

adamgolding · March 19, 2012

@DWL-SDCA: Oh, I don't propose anything like the automatic update you describe--just whatever data happened to be there at the time the user downloaded an article--the idea is that this is helpful to a single researcher or small group of researchers when engaged in a research project that is broad in scope, but somewhat short in duration compared to the rate of academic publication--firing up something like HistCite should one day be how even a first year undergraduate sizes up the literature for a research paper, looking very broadly at the literature as it stands at a given point in time. If the user wanted new data much later they could go through the process again, etc, but once you're familiar with a literature it's less important to 'size it up' all over again--you just stay on top of new papers, etc.

adamsmith · March 19, 2012

the perfect, as they say, should not be the enemy of the good

Since you need to ensure compatibility etc., software is highly path dependent. Bad decisions early in the process can have disastrous consequences. So yes, for software, often nothing at all is a better solution than something half-baked.

adamgolding · March 19, 2012

@DWL-SCDA: I should also mention that HistCite exports .html files that I have planned to put online at some point, I wonder if I will get some letters from ISI by doing this as well! :-)

adamgolding · March 19, 2012

@adamsmith

"Since you need to ensure compatibility etc., software is highly path dependent. Bad decisions early in the process can have disastrous consequences. So yes, for software, often nothing at all is a better solution than something half-baked."

I agree wholeheartedly in principle, although I must confess I have a hard time imagining what the problem could be here in this particular situation: if 'custom fields' are differentiated from zotero-specified fields by a flag in addition to the string which identifies them, there will be no collisions when new 'official' fields are added, and a simple pass over the sqlite database could copy a custom field into a new 'official' field if an appropriate one became available, prompting the user to correct any malformed entries--otherwise the custom fields are just like a dumb spreadsheet attached to the zotero database, and the spreadsheet is sync'ed, etc. The same would apply to 'dumb' text labels applied to the 'related' links that already are implemented. In fact, I'm tempted to literally create a 'dumb' spreadsheet that has links to items in the sqlite database in one column ;-) ;-) I literally already maintain a spreadsheet with citation counts from Publish or Perish in parallel with citations stored in Zotero--surely this madness is more crazy than there being custom fields? :-9 I've seen at least *one* other user mention that he has to do this too :-b
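
The flag-plus-name idea above can be sketched in a few lines. This is only an illustration against an in-memory SQLite database: the tables `fields` and `item_data`, the helpers `add_field` and `promote`, and the field name `citationCount` are all hypothetical and are not Zotero's actual schema or API.

```python
import sqlite3

# Hypothetical schema, NOT Zotero's real one: a field is identified by
# (name, is_custom), so a custom and an official field may share a name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fields (
        field_id  INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        is_custom INTEGER NOT NULL,   -- 1 = user-defined, 0 = official
        UNIQUE (name, is_custom)
    );
    CREATE TABLE item_data (
        item_id  INTEGER NOT NULL,
        field_id INTEGER NOT NULL REFERENCES fields(field_id),
        value    TEXT,
        PRIMARY KEY (item_id, field_id)
    );
""")


def add_field(name, is_custom):
    """Register a field and return its id."""
    cur = conn.execute(
        "INSERT INTO fields (name, is_custom) VALUES (?, ?)",
        (name, int(is_custom)),
    )
    return cur.lastrowid


def promote(name, is_valid):
    """Copy well-formed values from the custom field `name` into the
    official field of the same name. Returns the (item_id, value) pairs
    that failed validation so the user can be prompted to fix them."""
    query = "SELECT field_id FROM fields WHERE name = ? AND is_custom = ?"
    custom_id = conn.execute(query, (name, 1)).fetchone()[0]
    official_id = conn.execute(query, (name, 0)).fetchone()[0]
    rows = conn.execute(
        "SELECT item_id, value FROM item_data WHERE field_id = ? ORDER BY item_id",
        (custom_id,),
    ).fetchall()
    malformed = []
    for item_id, value in rows:
        if value is not None and is_valid(value):
            conn.execute(
                "INSERT OR REPLACE INTO item_data VALUES (?, ?, ?)",
                (item_id, official_id, value),
            )
        else:
            malformed.append((item_id, value))
    return malformed


# A user keeps citation counts in a custom field (values are made up).
custom = add_field("citationCount", True)
conn.executemany(
    "INSERT INTO item_data VALUES (?, ?, ?)",
    [(1, custom, "42"), (2, custom, "n/a")],
)

# Later an official field with the same name appears: no collision,
# because the flag is part of the field's identity.
official = add_field("citationCount", False)
needs_review = promote("citationCount", str.isdigit)
```

Here `needs_review` holds the entries the user would be asked to correct, while the custom values themselves are left in place.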

At any rate, the flexibility of tags is clearly in the spirit that I'm advocating here, so I cannot be that far off the spirit of Zotero's design with these suggestions--I don't know if tags should be enhanced to 'look like' fields to the end-user in that they would allow sorting, or if 'custom fields' is a better idea than 'advanced tags' but moving in either direction seems to solve a lot of problems :-)

fbennett · March 19, 2012

It seems like the distinction between citation metadata and other metadata is key. The things in the Zotero info panel mainly serve to identify the source. Most of the fields there tie directly to citations, and it really is important for that part of the data to be uniform.

In terms of the data model, supplementary details like citation counts and whatnot (and Call Number, for that matter) are in a completely separate category. That kind of ephemeral information could actually go into some form of structured attachment to the item -- RDF would be the obvious choice, since a plugin could then pick up the object to feed its own triple-store inside the client. Handling supplementary information in that way would avoid show-stopping impact on the schema used for synchronization and the sharing of core metadata, and give you the flexibility to do pretty much anything you want in a plugin.

If someone were to work out a plugin infrastructure for tying such supplementary RDF into saved searches and special views (like the Duplicates and Unfiled Items views), it would open up lots of interesting paths for third-party development. It would be costly in time, study and work to set up, but should be good fodder for a grant application, and I'm sure the core team would be receptive to the result -- the Locate menu is the result of a similar effort.

adamgolding · March 22, 2012

I must correct one major piece of misinformation I stated in the thread above--the Zotero Scholar Citations plugin is *not* broken in the most recent Zotero Add-On, it's just not working in *standalone* yet. Please try it, oh ye weary reader of this thread, somewhere in the future of the internet ;-)

adamgolding · March 22, 2012

@fbennett: I agree wholeheartedly with keeping citation and 'other' metadata very separate--and while we're on the subject of ephemera, citation fields could do with a date parameter which states when the citation information was retrieved--plugins which grab information like this directly could set this date, and it optionally could be supplied when importing ISI format files, or left 'unknown'.
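
As a rough sketch of what a citation count with a retrieval date might look like, assuming a plugin kept such records itself: the `CitationCount` class and the `latest` helper are hypothetical names, and the numbers are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CitationCount:
    """One snapshot of a citation count (hypothetical structure)."""
    source: str                       # e.g. "Google Scholar"
    count: int
    retrieved: Optional[date] = None  # None means 'unknown'


def latest(counts: List[CitationCount]) -> Optional[CitationCount]:
    """Return the most recently retrieved snapshot. Undated snapshots
    are used only when no dated snapshot exists."""
    dated = [c for c in counts if c.retrieved is not None]
    if dated:
        return max(dated, key=lambda c: c.retrieved)
    return counts[0] if counts else None


# Illustrative values only.
snapshots = [
    CitationCount("Google Scholar", 10),                     # imported, date unknown
    CitationCount("Google Scholar", 12, date(2012, 3, 19)),  # set by a plugin
    CitationCount("Google Scholar", 15, date(2012, 3, 22)),
]
current = latest(snapshots)
```

Keeping every snapshot rather than overwriting the count is one possible design; it makes the "data as of a given point in time" reading explicit.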

Access via saved searches and special views sounds great, but more important to me would be to use the custom fields for sorting (and the custom relations for exporting ISI format citation files and/or reading a citation network directly from zotero.sqlite or from some supplementary data file).

I'm interested in making at least a stab towards making some of this stuff work, although I'm hesitant to dive right into the Zotero codebase as opposed to tinkering with plugins at the moment--it seems the minimal functionality that plugins would need to gain for this sort of thing to work in principle is:

a) allow a plugin to create and populate additional sortable columns in the middle pane

b) allow a plugin to create new zotero items (probably already possible)

c) allow a plugin to handle import and export of files--either the plugin would do the import/export of ISI files, or import/export of ISI files would have to send/receive data from the plugin

I gather a is not possible, perhaps b and c already are?

mronkko · March 22, 2012

I would say that all of these are possible, but (a) can require a bit of a hack.

A plugin can alter the Zotero classes and objects. For example, the ZoteroQuickLook plugin needs new functions in Zotero.Integration.Fields and Zotero.Integration.Document.

You can check this out in the source code starting at line 219

https://github.com/mronkko/ZoteroQuickLook/blob/master/chrome/content/zoteroquicklook.js

Instead of adding functions, you could also replace existing functions. If you decide to go the route of modifying the Zotero objects from your plugin, you need to remember that changes in Zotero can break your plugin.

If you really want to give it a shot, you can ask technical questions at the Zotero dev mailing list.

http://groups.google.com/group/zotero-dev

jacekg · July 14, 2012

@mronkko

Re cited references, I think the situation has improved, and what is being exported in the CR (Cited References) field from ISI (WOS) mostly contains DOIs. See the example below.

So, my question is: if a list of bibliographic items can be exported from ISI (or similar, e.g. Scopus) including the CR list with DOIs, could those DOIs be used to automatically link related articles? That is, could Zotero's "related" be populated during import?

thanks
- Jacek

Example:
Kammerer Y., 2010, P 2010 S EYE TRACK R, P299, DOI 10.1145/1743666.1743736.
LIU C, 2010, P 3 S INF INT CONT, P215, DOI 10.1145/1840784.1840816.
LIU J, 2010, P SIGIR 10.
Li YL, 2009, J AM SOC INF SCI TEC, V60, P275, DOI 10.1002/asi.20977.
Roberts PM, 2009, INFORM RETRIEVAL, V12, P81, DOI 10.1007/s10791-008-9072-x.
Rayner K, 2009, PSYCHOL SCI, V20, P6, DOI 10.1111/j.1467-9280.2008.02243.x.
White R. W., 2009, P WSDM 2009, P132, DOI 10.1145/1498759.1498819.
Bierig R., 2009, P SIGIR 2009 WORKSH, P8.
Oliveira F. T. P., 2009, P CHI2009 C HUM FACT, P2209, DOI 10.1145/1518701.1519038.
Farzan R, 2009, LECT NOTES COMPUT SC, V5535, P66, DOI 10.1007/978-3-642-02247-0_9.
Juhasz BJ, 2008, J EXP PSYCHOL HUMAN, V34, P1560, DOI 10.1037/a0012319.
Li YL, 2008, INFORM PROCESS MANAG, V44, P1822, DOI 10.1016/j.ipm.2008.07.005.
Lorigo L, 2008, J AM SOC INF SCI TEC, V59, P1041, DOI 10.1002/asi.20794.
Buscher Georg, 2008, P 31 ACM SIGR C SING, P387, DOI 10.1145/1390334.1390401.
Belkin N.J., 2008, SIGIR Forum, V42, DOI 10.1145/1394251.1394261.
BUSCHER G., 2008, P CHI 08, P2991, DOI 10.1145/1358628.1358796.
GWIZDKA J, 2008, P AM SOC INFORM SCI, V45, P1.
DUGGAN GB, 2008, P 26 ANN SIGCHI C HU, P39, DOI 10.1145/1357054.1357062.
LIU YH, 2008, P AM SOC INFORM SCI, V45, P1.
TERAI H, 2008, P IIIX, P152, DOI 10.1145/1414694.1414728.
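
A minimal sketch of how DOIs could be pulled out of CR lines like those above and matched against items already in a library. It assumes every CR entry follows the "..., DOI 10.xxxx/yyyy." pattern shown in the example; `extract_doi`, `related_pairs`, and the item keys "A", "B", "C" are hypothetical, and nothing here touches Zotero itself.

```python
import re

# Assumption: the DOI is introduced by the literal token "DOI" and runs
# to the next whitespace, with a trailing period closing the entry.
DOI_RE = re.compile(r"\bDOI\s+(10\.\d{4,9}/\S+)", re.IGNORECASE)


def extract_doi(cr_line):
    """Return the lower-cased DOI found in one CR line, or None."""
    match = DOI_RE.search(cr_line)
    if match is None:
        return None
    return match.group(1).rstrip(".").lower()


def related_pairs(library, cited_refs):
    """library: item key -> DOI (or None).
    cited_refs: item key -> list of CR lines for that item.
    Returns the set of (citing_key, cited_key) pairs where the cited
    DOI belongs to another item in the library."""
    by_doi = {doi.lower(): key for key, doi in library.items() if doi}
    pairs = set()
    for citing, lines in cited_refs.items():
        for line in lines:
            cited = by_doi.get(extract_doi(line))
            if cited is not None and cited != citing:
                pairs.add((citing, cited))
    return pairs


# Item "C" cites three references; two of them are in the library.
library = {
    "A": "10.1145/1743666.1743736",
    "B": "10.1002/asi.20977",
    "C": None,
}
cited_refs = {
    "C": [
        "Kammerer Y., 2010, P 2010 S EYE TRACK R, P299, DOI 10.1145/1743666.1743736.",
        "LIU J, 2010, P SIGIR 10.",
        "Li YL, 2009, J AM SOC INF SCI TEC, V60, P275, DOI 10.1002/asi.20977.",
    ],
}
links = related_pairs(library, cited_refs)
```

Each pair in `links` is a candidate "related" link. Entries without a DOI, such as the SIGIR line, are simply skipped in this sketch; matching them would need fuzzier comparison on author, year, and source.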

adamsmith · July 14, 2012

that depends on what you mean by "could".
Could that be done by Zotero in general? Almost certainly yes. Is there anything currently available in Zotero even approximating that? No. What Frank and mronkko are - imho correctly - saying is that this is likely possible to write as a third party tool, but will require a pretty substantial amount of work. Though as long as you're mainly interested in having "related" articles links that becomes less work than creating a whole separate data layer as fbennett lays out.

I'd be very surprised if core devs would spend any time on this, so this would almost certainly have to be a plugin. Edit: Or maybe a patch, though for that you'd want to think about the user interface questions that will assure that this won't confuse users who aren't interested in using it.

jerryglen · August 17, 2012

I have been reading the comments on extra fields for user specified criteria. I think it is clear that that won't happen due to data structure issues.

Sorting on tags is the common workaround. However, many of us would like to see at a glance the status of a paper, e.g., 'Read-Immediate', 'Read-Medium', 'Read-Low', 'Read', etc.
Would it be possible to allow the tags to be visible and sortable just like Title, Creator, Date, etc.? I think that would address one of the primary concerns expressed.

I see this was suggested by AdamGolding on March 19 2012. I agree.
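
To make the request concrete, here is a small sketch of sorting items by a status tag. The item dictionaries, `STATUS_ORDER`, and both helper functions are hypothetical stand-ins for whatever Zotero would expose; only the tag names come from the post above.

```python
# Priority order of the status tags, highest priority first.
STATUS_ORDER = ["Read-Immediate", "Read-Medium", "Read-Low", "Read"]


def status_rank(tags):
    """Rank of an item's highest-priority status tag; items with no
    status tag sort after all tagged items."""
    ranks = [STATUS_ORDER.index(t) for t in tags if t in STATUS_ORDER]
    return min(ranks) if ranks else len(STATUS_ORDER)


def sort_by_status(items):
    """Sort by status tag first, then by title, like a sortable column."""
    return sorted(items, key=lambda item: (status_rank(item["tags"]), item["title"]))


# Made-up items for illustration.
items = [
    {"title": "Paper C", "tags": ["Read"]},
    {"title": "Paper A", "tags": ["bibliometrics"]},
    {"title": "Paper B", "tags": ["Read-Immediate", "bibliometrics"]},
    {"title": "Paper D", "tags": ["Read-Low"]},
]
ordered_titles = [item["title"] for item in sort_by_status(items)]
```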

adamsmith · August 17, 2012

color coded tags - think gmail categories - are planned for Zotero 3.5. If I understand you correctly that would let you achieve what you want.

Amaury Van Espen · November 7, 2015

Impact factor and journal ranking for research quality from several sources (by field, maybe?) would be very useful.

What about the plugin you discussed?

Which rankings do you usually look up?

Regards

Amaury