Abbreviations for Zotero

fbennett · March 31, 2012

In parallel with work on the citation processor (citeproc-js), I've put together an Abbreviations Plugin that enables external maintenance of abbreviation lists. The plugin seems to work with official Zotero as well as the experimental multilingual client (MLZ) for which it was originally designed. The UI is not ideal, but I've recently eliminated some of its more annoying quirks, and it should be serviceable. The link above leads to an install page, with an onward link to some rough documentation.

The plugin stores abbreviation lists on a per-style basis in an SQLite database. Only the abbreviations relevant to an open document are pulled into memory, so it should be possible to carry large abbreviation lists for a large number of styles without a serious impact on performance.

Please post feedback on the plugin to this discussion thread.

adamsmith · March 31, 2012

I'd be interested in setting up the most typical abbreviations list for regular Zotero users. Do we have any idea about the rights on lists like Index Medicus, Chemical Abstracts etc.?

fbennett · March 31, 2012

It looks like the PubMed people are on our side. Here's the copyright notice on the NCBI site, and here is an FTP download link for their lists (PubMed Journals and NCBI Molecular Biology Database Journals). The lists contain two abbreviation versions for each entry (MedAbbr and IsoAbbr). The ISO abbreviations are probably offered for sale somewhere, but you have some cover from the important work of Public.Resource.org on government publication and incorporation by reference of privately published standards.

fbennett · March 31, 2012

For legal periodicals, Cardiff University offers a useful online index that combines (possibly overlapping) abbreviations for many publications and jurisdictions. They don't provide a mass download link, but there is clearly a common interest, and when things move a little further down the road it might be interesting to explore a mutually satisfactory way of leveraging their content.

wonblee · April 12, 2012

I think this is a marvelous project. The lack of abbreviation lists has always had me looking back at EndNote occasionally.

I have a couple of questions and a question/suggestion.

1. Could you elaborate on the importing process for those who are less familiar with JSON or CSL files? For testing, I downloaded one of your MLZ styles, but it's a CSL file, not a JSON file that the plugin looks for when importing.

2. I've downloaded the PubMed abbreviation list. How do I convert this into a format that the plugin can parse out? I'm assuming it will require some find & replace operations to the file.

3. I think a repository of abbreviation lists for this plugin on the web where anyone can add to the lists in a wiki-like collaborative manner would be wonderful. Would there be a legal hurdle to such implementation?

Thanks!

adamsmith · April 12, 2012

We're definitely thinking about 3. The final target is that the abbreviation lists will be specified as part of the style and will be downloaded automatically (but we're still a good bit from that), and it's something I'm going to spend some time on soon. So far the legal situation is looking quite promising, especially for the medical abbreviation lists; for some reason the medical community has done much better than anyone else in opening up these types of things.

fbennett · April 12, 2012

The abbreviation lists are separate from the styles. Nothing comprehensive is yet available; I've just been building small lists for testing in connection with the style proofsheets, to get the structure right. When the styles stabilise (probably in the next month or so) I'll look at filling out their content a little, but the focus for those will be on basic legal resources.

Here is a simple example of the JSON structure. Building a list of journal abbreviations manually would just be a matter of filling in the blanks, as it were. To convert data from a public list to this format would require some modest scripting, but the abbrevs are just one-to-one mappings, so apart from data cleanup etc it shouldn't be a huge task. Journal abbreviations go in the container-title segment; the others can be left empty:

{
  "default": {
    "container-title": {
      "European Human Rights Reports": "EHRR",
      "All England Reports": "All E.R."
    },
    "collection-title": {},
    "institution-entire": {},
    "institution-part": {},
    "nickname": {},
    "number": {},
    "title": {},
    "place": {},
    "hereinafter": {},
    "classic": {},
    "container-phrase": {},
    "title-phrase": {}
  }
}
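As a rough illustration of the "modest scripting" mentioned above, here is a minimal Python sketch that fills the container-title segment from a list of (full name, abbreviation) pairs and leaves the other segments empty. The segment names are copied from the example; the function name `build_abbreviation_list` and the sample pairs are only assumptions for illustration, and the plugin itself is not involved.

```python
import json

# Segment names copied from the example structure above.
SEGMENTS = [
    "container-title", "collection-title", "institution-entire",
    "institution-part", "nickname", "number", "title", "place",
    "hereinafter", "classic", "container-phrase", "title-phrase",
]


def build_abbreviation_list(journal_pairs):
    """Return the nested dict for (full name, abbreviation) pairs.

    Hypothetical helper: journal abbreviations go into "container-title",
    and every other segment is left empty.
    """
    segments = {name: {} for name in SEGMENTS}
    for full_name, abbreviation in journal_pairs:
        segments["container-title"][full_name.strip()] = abbreviation.strip()
    return {"default": segments}


pairs = [
    ("European Human Rights Reports", "EHRR"),
    ("All England Reports", "All E.R."),
]
abbreviations = build_abbreviation_list(pairs)

# json.dumps takes care of quoting and escaping, so the output is valid JSON.
json_text = json.dumps(abbreviations, indent=2, ensure_ascii=False)
print(json_text)
```

Writing the file through a JSON library rather than by string concatenation avoids hand-escaping mistakes in journal names.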

cumuluss · April 18, 2012

As I said in another forum, this plugin is a really good approach. Before, I solved this problem with a macro in Word which replaced the journal names with their abbreviations. But that was not elegant, so this is much, much better.

One point I would mention here is that for some journals I have different full names in my library (I think you are aware of this possibility). Cleaning up my library would probably be the best solution, but it is time-consuming. For the abbreviations I solved this by adding all my different name versions to the JSON file.

What I would suggest, or rather request, is something like a merge function to combine different journal names into one. Or something similar to tag renaming: if you rename a tag once in the tag selector, it changes in all the references that use it (and I think this is also relevant for the creator name fields). I’m sure somebody else has requested this before, but I think it fits well with this approach.

adamsmith · April 18, 2012

batch editing is one of the top priorities for the 3.5 version of Zotero - if I understand you correctly, that would solve this, or at least help to solve it.

fbennett · April 19, 2012

Yes, I had thought that the plugin might be useful as a workaround where the same journal is named in slightly different forms in different items. Batch editing is a separate thing, but I believe that both features (abbreviation support, batch editing) are in the sights of the core team for a future version of Zotero.

cumuluss · April 19, 2012

Thank you for your answers. Yes, you understood me correctly. And I agree that these should be seen as separate things. So I’m looking forward to batch processing, which will indeed make working with abbreviation lists much easier.

qztseng · May 2, 2012

Hi fbennett,

I wrote a script and made a JSON abbreviation file from the PubMed journal database. However, the import function does not seem to work correctly. Whenever I import this JSON file, which contains a huge number of abbreviation entries, nothing changes. If I use export, I get only the entries that were there before the import (which means nothing was imported). Only when I type in an abbreviation manually do the entries get modified in the exported file.
Could you have a look at my JSON file? Maybe I missed something.
You can download it here
https://docs.google.com/open?id=0B7d1ivQI3OkpS3AwdE9Ia3lfeEk

cheers,

fbennett · May 2, 2012

The file isn't valid JSON. The plugin should complain rather than failing silently, but that's why the import is failing. (On a quick check the only problem I spotted was unescaped double-quotes inside quoted string content, but there may be other issues.)
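To illustrate the unescaped double-quote problem, here is a small Python sketch. The journal title and abbreviation are made up for the example; the point is only that building JSON by string formatting breaks on embedded quotes, while a JSON library escapes them.

```python
import json

# Hypothetical title containing double quotes (not from any real list).
name = 'Journal of "Quoted" Studies'
abbr = "J. Quoted Stud."

# Naive string formatting leaves the inner quotes unescaped: invalid JSON.
naive = '{"%s": "%s"}' % (name, abbr)
try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

# json.dumps escapes the inner quotes as \" so the text parses back cleanly.
safe = json.dumps({name: abbr})
round_trip = json.loads(safe)

print(naive_is_valid)
print(safe)
```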

qztseng · May 4, 2012

I corrected the file and made a new one:
https://docs.google.com/open?id=0B4OI8S-ZuErIR2lRYThVOEN5X1k

However, it seems to me that with ~25000 entries, zotero and firefox just stopped responding.

Another inconvenience is that the plugin only maps a journal title when the case matches (i.e. "The Journal of cell biology" will not match "The Journal of Cell Biology"). As upper and lower case change with the citation style, this becomes quite problematic. Is it possible to make the plugin ignore the capitalization?

fbennett · May 4, 2012

> it seems to me that with ~25000 entries, zotero and firefox just stopped responding.

There is no progress indicator (yet), but it's working, and you just need to wait until the import clears. I didn't do a full import, but in 15 minutes on a slow machine it processed about half of the file. It is recommended that we perform database operations asynchronously (which I haven't done, for simplicity), to avoid locking up the UI. Async coding is higher math for me, as it were; I'll keep things as they are in the plugin for the present, but this will likely improve at some point.

The speed of import itself can be improved. It's running in a single transaction (which is faster than not), but we can speed things up a bit further by using precompiled storage operations. I'll look into setting that up as time permits.
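The two ideas mentioned here (a single transaction, and a precompiled statement reused for every row) can be sketched with Python's built-in sqlite3 module. This is only an analogy: the plugin runs inside Firefox and uses its own storage layer, and the table layout and column names below are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout; the plugin's real schema is not shown in this thread.
conn.execute(
    "CREATE TABLE abbreviations ("
    " style TEXT, category TEXT, full_form TEXT, abbreviation TEXT)"
)

rows = [
    ("default", "container-title", f"Journal {i}", f"J. {i}")
    for i in range(1000)
]

# "with conn" wraps the whole batch in one transaction (commit on success,
# rollback on error). executemany reuses a single prepared INSERT statement
# instead of compiling the SQL again for each row.
with conn:
    conn.executemany(
        "INSERT INTO abbreviations VALUES (?, ?, ?, ?)", rows
    )

count = conn.execute("SELECT COUNT(*) FROM abbreviations").fetchone()[0]
print(count)
```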

> Is it possible to make the plugin ignore the capitalization?

This could be done, but it would mean data loss, and so increase ambiguity. I'm not sure it would be a good idea.

The PubMed list itself is pretty messy, with some journals registered in abbreviated form. Forcing everything to lowercase would increase the possibility that the abbreviation of one journal (registered as its "full" name) overlaps with the proper name of another. If matches are case sensitive there will be more misses, but the user can still register a missed journal name form when it is encountered, which seems adequate.
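One way to gauge the ambiguity before deciding is to check which keys of a list would collapse into one another if case were ignored. The Python sketch below does that for a container-title mapping; the function name and the sample entries are hypothetical and are not taken from the PubMed data.

```python
from collections import defaultdict


def find_case_collisions(mapping):
    """Group full names that become identical once lowercased."""
    groups = defaultdict(list)
    for full_name in mapping:
        groups[full_name.lower()].append(full_name)
    # Keep only the lowercased keys shared by more than one original name.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical entries for illustration only.
entries = {
    "The Journal of Cell Biology": "J. Cell Biol.",
    "The Journal of cell biology": "J Cell Biol",
    "Science": "Science",
}
collisions = find_case_collisions(entries)
print(collisions)
```

If the colliding names map to different abbreviations, as in this sample, a case-insensitive lookup would have to pick one of them arbitrarily.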

fbennett · May 6, 2012

@qztseng: I've made a few speed improvements to the plugin. There is still no progress indicator, but on my system here it loads the 26,000+ entries in the J_Entrez.json you have produced in about 30 seconds.

Give it a try when you have a chance. I think the changes will work across all platforms, but if you have difficulties, let me know and I'll sort things out.

qztseng · May 7, 2012

I updated "Abbreviations for Zotero" to version 1.0.120.
The speed is improved; however, the browser kept popping up the warning:
"Warning: Unresponsive script.........Script:chrome://abbreviations-for-zotero/content/xpcom/import.js:80", even when the import was finished. For example, when I chose to import all 26,000 entries to replace the entire local list, the browser was busy for ~10 sec and then the warning popped up. If I clicked cancel at this first warning and then exported the list, only about 8000 entries had been imported.
As a result, I have to click continue on the unresponsive script warning 3 times, and click cancel the 4th time it pops up, to have the entire list imported. If I don't click cancel at all, the warning just keeps popping up forever (at least >10 times).
It looks like the script just doesn't tell the browser that it has finished the import task.

On the other hand, it would be really helpful if you can add an option for the plugin to ignore the case, because almost 90% of my journals are not matched because of this capitalization issue.

Thanks a lot for your help,

fbennett · May 7, 2012

We can pick up the capitalisation issue later. Let's sort through the import behaviour first.

You are probably not seeing the speed improvement yet. The main speed boost came from adding a few database indexes that were missing. These don't (yet) get installed in an existing database if they are missing. I'll set that up, and tweak another thing that is probably slowing things down (the default import method will still be very slow, I think, due again to a missing index or two).
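The general technique for installing a missing index in a database that already exists can be sketched with Python's sqlite3 module. The table, column, and index names are assumptions for illustration; the plugin's actual schema and upgrade code are not shown in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for an existing database that was created without the index.
conn.execute(
    "CREATE TABLE abbreviations ("
    " style TEXT, category TEXT, full_form TEXT, abbreviation TEXT)"
)


def ensure_indexes(connection):
    """Create the lookup index if it is missing; do nothing if it exists."""
    connection.execute(
        "CREATE INDEX IF NOT EXISTS abbreviations_lookup "
        "ON abbreviations (style, category, full_form)"
    )


ensure_indexes(conn)
ensure_indexes(conn)  # safe to run on every startup: the second call is a no-op

index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'abbreviations'"
    )
]
print(index_names)
```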

More news in a few days.

uku · June 4, 2012

Thank you, fbennett, for the effort of managing abbreviation lists! I've been playing around with it for a while, and now the abbreviations window just shows up really small. Could you please add a way to resize the window, as reinstalling apparently does not fix this problem :(

Screenshot:

https://dl.dropbox.com/u/5277753/AddEdit%20Citation_2012-06-04_15-56-02.png

Rintze · June 4, 2012

@uku, can you right-click the title bar ("Manage Ab...") and select either "Size" (and change the window size with the arrow keys) or "Maximize"?

uku · June 4, 2012

@Rintze, thanks, did not know that!

Some feedback about the script: we need an option to create versions with and without full stops, depending on citation format. Currently the Index Medicus list compiled by qztseng has some abbreviations with full stops and others without.

The other issue is slightly different versions of journal names in index medicus and Zotero. But this has already been mentioned, I won't elaborate on that.

I think the best solution rather than a list would be kind of a dictionary - there are quite a few words that tend to repeat that could be easily replaced by a script.
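A word-by-word dictionary of the kind suggested here might look like the following Python sketch. The three word mappings and the stop-word list are a tiny illustrative sample, not an authoritative word list, and the function name is hypothetical.

```python
# Illustrative sample only; a real word list would be far larger.
WORD_ABBREVIATIONS = {
    "journal": "J.",
    "biology": "Biol.",
    "international": "Int.",
}
STOP_WORDS = {"of", "the", "and", "for", "in"}


def abbreviate_title(title):
    """Abbreviate a journal title word by word, dropping stop words."""
    words = []
    for word in title.split():
        key = word.lower()
        if key in STOP_WORDS:
            continue
        # Words without a dictionary entry are kept unchanged.
        words.append(WORD_ABBREVIATIONS.get(key, word))
    return " ".join(words)


result = abbreviate_title("The Journal of Cell Biology")
print(result)
```

Looking words up in lowercase also sidesteps the capitalization mismatch for the words the dictionary covers.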

Rintze · June 4, 2012

> we need an option to create versions with and without full stops, depending on citation format.

You can change the CSL style to strip the periods from the journal abbreviations. See http://citationstyles.org/downloads/specification.html#strip-periods

> I think the best solution rather than a list would be kind of a dictionary - there are quite a few words that tend to repeat that could be easily replaced by a script.

See http://forums.zotero.org/discussion/8874/getting-journal-abbreviations-from-a-repository/?Focus=41749#Comment_41749

uku · June 4, 2012

Thanks. It is just that the current Index Medicus list has some entries with full stops and some without. Someone with patience needs to fix that then :)

yourwelcome · July 5, 2012

Hi, I'm curious whether the issue that qztseng reported has been resolved. I've made my own .json file, and downloaded his massive one (https://docs.google.com/open?id=0B4OI8S-ZuErIR2lRYThVOEN5X1k), and in both cases the import didn't work. His 1.7MB file caused the "unresponsive script" notifications that he reported earlier, with no change in abbreviations and no change to the local .json file after exporting. When I import my list of ~2000, there is likewise no change to abbreviations. I'll concede that there may be weird run-on quotation marks or other formatting issues with my JSON file, but has anyone successfully imported the qztseng PubMed list?

Thanks,
Rob

fbennett · July 5, 2012

The first thing to do would be to run the file through a syntax checker. It has to be valid json. Then try taking it in smaller pieces first, to be sure the plugin is functioning correctly on your system. After you have confirmed that your json file is valid and that valid json imports correctly, it's worth trying the full-sized file.

An unresponsive script warning isn't an error; it just means that the import hasn't finished yet. If simple small imports are confirmed to work and you have a valid json file, you would want to click "Continue".
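The check-then-split routine described above can be sketched in Python. The function name is hypothetical, and the sketch assumes the file follows the structure shown earlier in the thread (a "default" key holding a "container-title" segment). Content in other segments is dropped from the pieces; only their names are kept.

```python
import json


def split_container_titles(json_text, chunk_size):
    """Validate the JSON text, then split its journal list into smaller lists."""
    data = json.loads(json_text)  # raises json.JSONDecodeError on invalid input
    segments = data["default"]
    titles = list(segments["container-title"].items())
    chunks = []
    for start in range(0, len(titles), chunk_size):
        # Keep every segment name, emptied, and fill in one slice of journals.
        chunk = {name: {} for name in segments}
        chunk["container-title"] = dict(titles[start:start + chunk_size])
        chunks.append({"default": chunk})
    return chunks


# Small made-up sample standing in for a real abbreviation file.
sample = json.dumps({
    "default": {
        "container-title": {f"Journal {i}": f"J. {i}" for i in range(5)},
        "title": {},
    }
})
chunks = split_container_titles(sample, chunk_size=2)
sizes = [len(c["default"]["container-title"]) for c in chunks]
print(sizes)
```

Each element of `chunks` can be written out with `json.dump` and imported separately, starting with the smallest piece.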

yourwelcome · July 5, 2012

Thanks fbennett for the advice.
No luck with the PubMed list. But I cleaned up the JSON in my own list with the help of an online validator (http://jsonlint.com/), as well as saving as ANSI. The largest hurdle was the different characters for " (right, left, simple).
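The quote-character cleanup can be scripted. The Python sketch below replaces left and right curly double quotes with the plain " that JSON requires, then parses the result as a check. The function name and the sample text are illustrative. If a curly quote was part of a journal name rather than a delimiter, the replacement leaves an unescaped quote and the parse step fails, which flags the entry for manual repair.

```python
import json

# Map left and right curly double quotes to the plain double quote.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"'})


def normalize_quotes(text):
    """Replace curly double quotes with straight ones."""
    return text.translate(QUOTE_MAP)


# Sample text as a word processor might have saved it (illustrative entry).
broken = (
    "{\u201cdefault\u201d: {\u201ccontainer-title\u201d: "
    "{\u201cEcology Letters\u201d: \u201cEcol. Lett.\u201d}}}"
)
fixed = normalize_quotes(broken)
parsed = json.loads(fixed)  # raises json.JSONDecodeError if still invalid
print(parsed)
```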

Works great. Is there an abbreviations repository I can upload it to? The subjects are biology, ecology and environmental science.

Thanks for supporting Zotero

fbennett · July 5, 2012

Yo, that's great news. Thanks for the effort you've put into this.

There is no community repository yet, but we can open one. Shall I open a space for lists on GitHub, and reflect it on CitationStylist as a solution for the time being?

adamsmith · July 5, 2012

> There is no community repository yet, but we can open one. Shall I open a space for lists on GitHub, and reflect it on CitationStylist as a solution for the time being?

yes, that'd be great.

yourwelcome · July 5, 2012

Yes, I think that's a good idea. I'll upload my list.
Now I'm looking for a good list for statistics and mathematics...

DWL-SDCA · July 11, 2012

I have created two files of journal abbreviations (one in JSON format, the other in CSV format). These 12,719 journals are from the SafetyLit database. About 1/3 of these journal titles are not in the NLM/PubMed list.

Alas, I couldn't quite get the JSON format that fbennett suggested. My JSON file does validate. However, when it is imported into the plugin, only the journal title seems to be there. I hope that someone can convert the list so that it is useful.

The two files may be found at:

http://www.safetylit.org/old-stuff/journalslist-120711.csv
http://www.safetylit.org/old-stuff/journalslist-120711.json

I tried to learn the scripting needed to get this to work, but I'm frustrated by my lack of skills and by not having the time to learn what needs to be done.

fbennett · July 11, 2012

The json link returns "file not found". If you can put it online, I'll be happy to take a look.