"Clean" and fill references before creating bibliography?

Cubytus · January 17, 2012

Hello all,

considering that references come from all sorts of subscription-based providers, depending on your librarian's taste, fields are most often not filled in the same way. For example, you could get an all-caps title, only one cap at the beginning, or a cap on each word. Some publishers abbreviate journal names, and others don't. Some abbreviate page numbers (e.g. 128-43), yet others don't (128-142). Dates may be provided in digits, letters, or both, in many different orders.

Considering all this mess, is there a way to 1- complete the missing fields, if they exist? And 2- Clean up the resulting list of references to get a uniform presentation once inserted?

Thanks,

adamsmith · January 17, 2012

No there isn't.
The ideal format is to have
- Full first names of authors
- Titles in sentence case
- Page numbers in full
- Journals both with their full title (under journal title) and abbreviation (under journal abbr., ideally including periods)
- date in any format that Zotero recognizes (which includes most common formats) - it will display y m d next to the date.

With this in place, Zotero is able to create pretty much any citation output correctly.
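Two of the items in this list, sentence-case titles and full page numbers, are mechanical enough to sketch in code. The following is a minimal illustration only, not anything Zotero provides: the function names are made up for this example, and lower-casing a title destroys proper nouns and acronyms, so the output still needs a manual check.

```python
import re

def to_sentence_case(title: str) -> str:
    """Lower-case a title, then capitalize its first letter and the first
    letter of a subtitle after a colon. Only handles ASCII letters, and
    proper nouns and acronyms are lost in the process."""
    text = title.strip().lower()
    # Capitalize the first alphabetic character of the title.
    text = re.sub(r"^([^a-z]*)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
    # Capitalize the first letter of a subtitle following a colon.
    text = re.sub(r"(:\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
    return text

def expand_page_range(pages: str) -> str:
    """Expand an abbreviated range such as '128-43' to '128-143'."""
    match = re.fullmatch(r"\s*(\d+)\s*[-\u2013]\s*(\d+)\s*", pages)
    if not match:
        return pages.strip()  # single page or unknown format: leave as-is
    first, last = match.groups()
    if len(last) < len(first):
        # Borrow the missing leading digits from the first page number.
        last = first[: len(first) - len(last)] + last
    return f"{first}-{last}"

print(to_sentence_case("WORKING MEMORY AND AGING: A REVIEW"))
print(expand_page_range("128-43"))
```

Anything the page-range pattern does not recognize (article numbers, single pages) is returned unchanged rather than guessed at.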

There is no automated way of fixing or completing the data. While some improvements are likely possible (e.g. completing data using DOIs, allowing batch editing), fixing this up is and will remain to some degree a manual task. The best you can do to minimize the amount of manual labor necessary is to import data from databases with high-quality metadata (Library of Congress, JSTOR, many journal publishers).
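As an illustration of what "completing data using DOIs" could look like, here is a rough sketch that fills only the empty fields of a record from a CrossRef metadata record. It is not part of Zotero. The item field names (`title`, `journal`, `volume`, `pages`, `doi`) are assumptions made for this example; the CrossRef REST API endpoint and the JSON keys read from its response (`title`, `container-title`, `volume`, `page`, `DOI`) are the publicly documented ones.

```python
import json
import urllib.parse
import urllib.request

def fetch_crossref(doi: str) -> dict:
    """Fetch the metadata record for a DOI from the CrossRef REST API.
    Requires network access; returns the 'message' part of the response."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]

def fill_missing(item: dict, message: dict) -> dict:
    """Return a copy of item with empty fields filled from a CrossRef
    message. Values already present in item are never overwritten."""
    candidates = {
        # CrossRef returns titles as lists; take the first entry if any.
        "title": (message.get("title") or [""])[0],
        "journal": (message.get("container-title") or [""])[0],
        "volume": message.get("volume", ""),
        "pages": message.get("page", ""),
        "doi": message.get("DOI", ""),
    }
    filled = dict(item)
    for field, value in candidates.items():
        if not filled.get(field) and value:
            filled[field] = value
    return filled

# Usage (needs a network connection), with a hypothetical item dict:
# item = {"title": "", "doi": "10.1016/j.neuropsychologia.2006.04.030"}
# item = fill_missing(item, fetch_crossref(item["doi"]))
```

Because existing values are kept, hand-corrected fields survive the lookup; the downside is that a wrong value already in the record is not repaired.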

Cubytus · January 17, 2012

The problem is, I can't reasonably expect to modify 60+ references by hand in a short timeframe, and the articles are already imported from supposedly high-quality databases, yet many fields are still missing (Or our university's subscription may not cover the journal I'm interested in.).

Just in case one could come across multiple databases, would it be possible to merge all results?

There's that nifty function in Mendeley where you could click on "Search by title" that allows you to fill in the missing information. Why not in Zotero?

Aren't reference manager softwares designed to help you gain time instead of wasting it?

fbennett · January 17, 2012

Aren't reference manager softwares designed to help you gain time instead of wasting it?

Yes, they are.

adamsmith · January 17, 2012

Or our university's subscription may not cover the journal I'm interested in

In most cases (e.g. Taylor and Francis, ScienceDirect, Wiley, JSTOR) that will not prevent you from accessing the abstract view of the article and using the URL-bar icon to get the data into Zotero.

If you find missing fields in imports from a database, do feel free to report it and we'll be happy to take a look and see if import can be improved.

Just in case one could come across multiple databases, would it be possible to merge all results?

in the 3.0 version - yes.

There's that nifty function in Mendeley where you could click on "Search by title" that allows you to fill in the missing information. Why not in Zotero?

Mainly because it hasn't been implemented. But also, you would again get inconsistent data: Mendeley searches other people's databases as well as Google Scholar for the title. For the former, data quality is inconsistent. For the latter, it's not great: You will only ever get author initials, and commonly miss fields like journal abbr. and DOI.

As a general workflow issue, you will need to check and fix up citations as they come in to prevent having to do it all at the last minute when you're inserting the bibliography. That's no less the case with Mendeley.

Cubytus · January 17, 2012

Most of the point of using a reference manager software is to not have to enter or edit anything by hand. Looking for relevant articles is, in itself, a slow task since no database provides 100% relevant results. It's more like 30%, and it's a long process to sort them out.

I see the issue about looking through other people's bases. Still, when I only got the title of a paper article, Google Scholar is invaluable in finding the publication it came from, as well as the providers.

Considering it works just OK in Mendeley, Zotero could use a better "footprint" system, a bit like MP3-taggers, as MusicBrainz does to get a "relevancy" rate about a song; same would go for a scientific article.
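To make the "relevancy rate" idea concrete, here is a small sketch of scoring candidate titles against a known title with Python's standard `difflib`. This is only an assumption about how such a score might be computed; it is not how Mendeley, MusicBrainz or Zotero actually match records, and the `0.9` threshold is an arbitrary illustrative value.

```python
import re
from difflib import SequenceMatcher

def normalize(title: str) -> str:
    """Lower-case a title and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def relevancy(query: str, candidate: str) -> float:
    """Similarity between two titles, from 0.0 (unrelated) to 1.0."""
    return SequenceMatcher(None, normalize(query), normalize(candidate)).ratio()

def best_match(query, candidates, threshold=0.9):
    """Return the candidate title most similar to query, or None if even
    the best one scores below the threshold."""
    scored = [(relevancy(query, c), c) for c in candidates]
    if not scored:
        return None
    score, title = max(scored)
    return title if score >= threshold else None
```

A title-only score like this cannot tell a journal article from its preprint or a reprint, which is one reason the matched record would still need checking.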

An example: DOI: 10.1016/j.neuropsychologia.2006.04.030.
As inserted by DOI, no abstract is inserted, date is incorrect (only year is mandated in science articles), language is not inserted (although most of them are in English, I found a few in Italian, Spanish, Russian or Portuguese), no journal title abbreviation.

Another example: http://ajp.psychiatryonline.org/article.aspx?volume=162&page=1125
No DOI, no abstract, no date, no title, no journal abbreviation.

Yet another example: http://www.sciencedirect.com/science/article/B6T0J-3V51F1N-B/2/b0bd0cf027fa150448608c7291a54f42
What are the journal, authors' names and paper title doing in the "Abstract" section?

I have many other examples of incorrectly filled fields, and while it may be feasible to fill in a few missing fields, virtually ALL references have "holes" in them. If merging records is in the works, when will Zotero for Firefox include it? I'm not talking about the standalone Zotero version since it doesn't take SSL (suggested workaround shown in the dialog box doesn't work).

adamsmith · January 17, 2012

Most of the point of using a reference manager software is to not have to enter or edit anything by hand.

If that's your standard, stop using a reference manager. There is no product that comes close to that and I would be surprised if there will be in the next couple of years. It's part of our job as researchers to carefully manage our references and sources. Most people feel some type of reference manager is helpful in that process, but it's up to you.

Considering it works just OK in Mendeley, Zotero could use a better "footprint" system, a bit like MP3-taggers, as MusicBrainz does to get a "relevancy" rate about a song; same would go for a scientific article.

a) Mendeley has very good people working on this with a lot more funding than Zotero, so why do you think Zotero could easily do better? and b) maybe your taste in music is a lot more mainstream than mine, but while the data I get from MusicBrainz is usable for ordering my music, it's nowhere close to standardized in a precise format that would be suitable for referencing.

An example: DOI: 10.1016/j.neuropsychologia.2006.04.030.
As inserted by DOI, no abstract is inserted, date is incorrect (only year is mandated in science articles), language is not inserted (although most of them are in English, I found a few in Italian, Spanish, Russian or Portuguese), no journal title abbreviation.

Inserting by DOI is OK - better than Google Scholar - but not ideal. You will never get an abstract (though when are abstracts cited?) or a language (though I wonder why you feel you need the language - it's not used in any existing Zotero citation style), nor a journal abbreviation - that data is simply not provided by CrossRef. I have no idea what you're saying about the date - Zotero always imports the most precise date available and that makes a lot of sense for timelines etc. As I'm sure you have seen, Zotero citation styles just use the year in most cases.

Another example: http://ajp.psychiatryonline.org/article.aspx?volume=162&page=1125
No DOI, no abstract, no date, no title, no journal abbreviation.

You can see the translator when you hover your mouse over the URL bar icon - this is not a supported database, so it just imports via DOI. That should give you a title, a DOI, and a date, though (it does for me). If that's not the case, something is wrong:
http://www.zotero.org/support/troubleshooting_translator_issues

Yet another example: http://www.sciencedirect.com/science/article/B6T0J-3V51F1N-B/2/b0bd0cf027fa150448608c7291a54f42
What are the journal, authors' names and paper title doing in the "Abstract" section?

well, check the site - they are in the abstract field. Again, though, I fail to see how that matters for citations. Abstracts are not part of any standard citation style.

If merging records is in the works, when will Zotero for Firefox include it?

again, it's in the current beta version for both Standalone and Firefox.

dstillman · January 17, 2012

I'm not talking about the standalone Zotero version since it doesn't take SSL (suggested workaround shown in the dialog box doesn't work).

Of course it supports SSL. CA-signed certs will work normally. If you're having trouble getting it to accept your self-signed certificate, post to the thread you started, which I responded to (though the instructions in the other thread I linked to are no longer necessary in the latest version of Standalone). We can't help you if you don't respond.

Cubytus · January 18, 2012

Most private servers use self-signed certificates. Without more info or a valid workaround, users may rightfully think "SSL is not working". A "use Firefox's whitelisted certificates" could simply automate the operation described in the topics you linked. Oh well.

Back to articles talk.

Actually my musical taste is very varied, but I appreciate, when a CD was edited, ripped and tagged, when MB suggests a tag that may match its content. It's not always right, but often enough to be useful, for ordering, not a definite as you wrote. I felt Zotero could do better than Mendeley since being open-source usually brings many more developers than closed-source models.

Granted, abstracts and languages are not used for citation purposes, but I thought they would be useful to ease the indexing process, especially when it comes to pre-1995 papers, which are often only available on paper and need to be OCR-ed, an error-prone process.

Dates can be problematic since I would like them to automatically adjust according to the column's width. At least, it would be useful if they got a uniform presentation, e.g. no 01/2006 alongside 2006-02-04. I never had a use for timelines, at least they're not used in science projects, so I cannot say a word on them.
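The uniform presentation asked for here can be sketched as a small normalizer that rewrites a few common date spellings to year-first form while keeping whatever precision the source had. This is an illustrative sketch, not Zotero's date parser; the set of accepted formats is an assumption, and ambiguous forms such as 02/04/2006 are deliberately left untouched.

```python
import re

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

def normalize_date(raw: str) -> str:
    """Rewrite a date as YYYY, YYYY-MM or YYYY-MM-DD.
    Unrecognized or ambiguous input is returned unchanged."""
    text = raw.strip()
    # Already year-first: 2006, 2006-2, 2006-02-04
    m = re.fullmatch(r"(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?", text)
    if m:
        rest = [part.zfill(2) for part in m.groups()[1:] if part]
        return "-".join([m.group(1)] + rest)
    # Month/year: 01/2006
    m = re.fullmatch(r"(\d{1,2})/(\d{4})", text)
    if m:
        return f"{m.group(2)}-{m.group(1).zfill(2)}"
    # Month name and year: "March 2006", "Mar. 2006"
    m = re.fullmatch(r"([A-Za-z]+)\.?\s+(\d{4})", text)
    if m and m.group(1)[:3].lower() in MONTHS:
        return f"{m.group(2)}-{MONTHS[m.group(1)[:3].lower()]:02d}"
    return text

for example in ["01/2006", "2006-02-04", "March 2006", "2006"]:
    print(example, "->", normalize_date(example))
```

Keeping the original precision means "01/2006" becomes "2006-01" rather than gaining an invented day.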

About AJP, I just noticed the "Import in Zotero (DOI)" info. I never paid any attention to this small notice until now. I still feel that Zotero could use an automated process to look for other databases listing the article, and fill in missing info. While MB databases are user-edited, academic databases are, hopefully, more tightly managed. Since it can be long or require great CPU power to do so, why not place an option to do it in the background while computer is plugged in?

If records merging is available, where is the command in standalone Zotero?

adamsmith · January 18, 2012

It's not record merging, it's duplicate merging. Right-click on your library --> Show Duplicates. You can then merge the duplicates it finds.
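For readers wondering what merging duplicates amounts to conceptually, here is a simplified sketch: records are grouped by DOI (or by normalized title plus year when no DOI is present), and each group is collapsed into one record that keeps the first non-empty value of every field. This is not Zotero's actual duplicate detection, and the field names are assumptions made for the example.

```python
import re
from collections import defaultdict

def duplicate_key(item: dict) -> tuple:
    """Key under which two records count as duplicates of each other."""
    doi = (item.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", (item.get("title") or "").lower()).strip()
    return ("title", title, str(item.get("year") or ""))

def merge_group(items: list) -> dict:
    """Combine records, keeping the first non-empty value of each field."""
    merged = {}
    for item in items:
        for field, value in item.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged

def merge_duplicates(items: list) -> list:
    """Group records by duplicate_key and merge each group into one."""
    groups = defaultdict(list)
    for item in items:
        groups[duplicate_key(item)].append(item)
    return [merge_group(group) for group in groups.values()]
```

One known limitation of this sketch: a record with a DOI and a copy of the same article without one end up under different keys and are not merged.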

I agree on the uniform presentation of dates in the middle column.

On the rest, I think you mainly have a naive view of both the legal and technical feasibility, though as I said at the beginning, some functionality to complete records is both desirable and relatively likely to come at some point in the future.
Until then - find out which databases provide good data, use those.

Cubytus · January 25, 2012

Well, here comes a concrete example: I want to use European Journal of Neurology citation style: among others, this style uses abbreviated titles for journals. However, not all citations contain the short version of a given journal title. Some citations end up having the correct style, while others don't.
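One way to picture the missing-abbreviation problem is a lookup table keyed on the full journal title, applied only where the abbreviation field is empty. The sketch below is hypothetical: the two-entry table and the field names `journal` and `journal_abbr` are assumptions for illustration, and a real list would have to come from a maintained source of journal abbreviations.

```python
# Tiny illustrative table; a real one would hold thousands of entries.
ABBREVIATIONS = {
    "european journal of neurology": "Eur. J. Neurol.",
    "neuropsychologia": "Neuropsychologia",  # single-word titles stay whole
}

def add_journal_abbreviation(item: dict, table: dict = ABBREVIATIONS) -> dict:
    """Return a copy of item with 'journal_abbr' filled from the table,
    unless the item already has an abbreviation."""
    filled = dict(item)
    if not filled.get("journal_abbr"):
        key = (filled.get("journal") or "").strip().lower()
        if key in table:
            filled["journal_abbr"] = table[key]
    return filled

def still_missing(items: list) -> list:
    """Journal titles that have no abbreviation after the lookup, so they
    can be fixed by hand before the bibliography is generated."""
    done = [add_journal_abbreviation(item) for item in items]
    return sorted({i.get("journal", "") for i in done if not i.get("journal_abbr")})
```

Listing the titles that remain unmatched turns the last-minute surprise (some citations abbreviated, others not) into a short to-do list.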

I do believe that practicality takes precedence over other concerns in this context. It's not clear how "legality" would be implied here, although I quite understand what "technicality" would imply here. That's why scientists do science, and coders code :)

adamsmith · January 25, 2012

legality:
1. Using users' data, even if anonymized, to complete information would likely require specific user consent. With Mendeley you sign industry-standard EULAs when you use the software, essentially signing over everything but your soul. With Zotero you don't.
2. Automatically crawling and scraping proprietary databases is almost certainly illegal and against their respective terms of use.

Technical issues are, of course, very practical.

For Journal Abbreviations, specifically, we'll soon have a more sophisticated solution
http://citationstylist.org/tools/?#abbreviations-gadget-entry

That's why scientists do science, and coders code :)

you may want to read some bios...