Crowd Sourcing -- bibliographic errors

There is so much manual work just to get items into my Zotero database. Call me lazy, but I think a lot of people are. WorldCat and Google Books don't always agree when I import sources. I think information might be attached to "track" individual sources, sort of like a magnet link for a torrent, perhaps keyed on the DOI or ISBN (though chapters in edited books might complicate things). Maybe journal keywords could automatically become tags, the way abstracts are imported.

Once we get rid of this redundancy, so that we are all working on the same database entry, I should think there would be a way to crowd-source the data collection, making it more of a bottom-up organization. For instance, I would trust my fellow Zotero users to update a name, a publisher, or archive holdings (which libraries have an item is mostly on WorldCat already), and maybe years (with the edition specified manually). Some things I would not want shared, like my personal notes. I would also add a notifier that warns you when you change a shared property. Or maybe there could be a separate "editor" version of Zotero. I know lots of people are already working on the specific journal styles, which is very cool.

By the way, might I suggest that date entry be more standardized? Citations sometimes need elements in a specific order, like how the author is entered as last, first in two separate blanks. Place could be city, region, country, perhaps? Basically, the goal is to ensure everything stays uniform when people edit it.
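One way uniform entry could be enforced is normalization on input. A minimal sketch of the idea, assuming ISO 8601 dates and "Last, First" names as the targets (these formats and function names are my own assumptions, not anything Zotero actually does):

```python
from datetime import datetime

def normalize_date(raw: str) -> str:
    """Try a few common date formats and emit ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%B %d, %Y", "%d %B %Y", "%m/%d/%Y", "%Y"):
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        # Note: a bare year normalizes to January 1 of that year here;
        # a real tool would preserve the precision it was given.
        return parsed.strftime("%Y-%m-%d")
    raise ValueError(f"unrecognized date: {raw!r}")

def normalize_name(raw: str) -> tuple[str, str]:
    """Split 'Last, First' or 'First Last' into (last, first)."""
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
    else:
        first, _, last = raw.strip().rpartition(" ")
    return last, first
```

For example, `normalize_date("November 27, 2012")` yields `"2012-11-27"`, and both `"Doe, Jane"` and `"Jane Doe"` normalize to `("Doe", "Jane")`, so edits from different users land in one shape.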

I apologise for the redundancy if you are already working on something like this.
  • I have a couple of comments that you probably will not like.

    If an article, book chapter, or report is used as a citation, you should have at hand a print or electronic copy of the complete document. (See also my final paragraph.) With the document at hand, you should verify that the metadata is correct and in the proper Zotero fields. I believe that crowd sourcing is a very bad idea. (Witness the garbage in many of the public online Mendeley records.) Why should a stranger be trusted with this important work? Verifying and editing with attention to detail is part of the writing process, and Zotero makes it quite easy compared with what was necessary only 20 years ago.

    As recently as the mid-1990s, much of the search for relevant articles was done by hand. There were electronic literature databases, but they ran on mainframe computers and were costly: we were charged by the search term and also by the number of records that a search returned. Searching had to be done carefully, with search strategies and search terms planned in advance. Search terms were selected from a printed volume. You would submit a search to a librarian, who would enter it into the system. There might be many "jobs" ahead of yours, both at your library (delayed data entry) and at the mainframe site. Rarely could you get your print-out of citations in less than an hour. If you needed to search more than a year or two back, it was necessary to request that the mainframe operators change to the tape containing that year's information, which added considerably to the time and cost of the search.

    Usually the first search result came without abstracts, because charges were based on the number of characters in the result. Promising citations might lead you to request a second run to obtain abstracts, but that was only done if the journal volume was not in your library's collection. Unlike now, very few abstracts were even available, and electronic full text was hardly even dreamed of.

    With the search printout in hand, it was necessary to find bound copies of the journal on the library shelves. Recent volumes might not be available because they were at the bindery, and binding could take weeks. Each article needed to be examined; notes were made on index cards or blank sheets of paper. Maybe you had access to a working photocopier at 10 to 25 US cents per page. Although an early version of Reference Manager was available, it was a text-based DOS program, everything had to be hand-entered, and it didn't integrate well with existing word processors. A researcher had to carefully copy all relevant metadata onto the note card and then again carefully type each reference in the appropriate style when writing the document. If you haven't done this yourself, you may not believe the tedious effort needed; my description really understates what was required.

    While I am on my soapbox: I believe it is essential that every document be read and _understood_ before using it as a reference. In my opinion, to do otherwise is almost fraud. I don't know you or the way that you work, so I don't mean to suggest you participate in this behavior. But I know that some of my students do, and when I find this sort of cheating the student's grade suffers. I believe it is an insult to each of your readers if you cite anything other than appropriate, relevant material.
  • Besides what DWL-SDCA writes, I'm trying to figure this out from a technical implementation point of view.

    I would assume that if you had a feature to crowd-source reference metadata, the metadata picked as correct would be whichever version is most common. References in most reference libraries probably contain unedited metadata because, IMO, most references that people collect never get cited in papers and never undergo more careful review. Hence, the most popular metadata would be whatever Zotero generated automatically during import from a website.

    There are several places you could import metadata from. For medical/biological research, for example, the common sources are Google Scholar (horrible metadata), PubMed (good metadata), the publisher's website (in most cases, the best metadata), and CrossRef (via DOI lookup, good metadata). I would say that most people probably skim through papers on PubMed and import from there. Unfortunately, I think that metadata is inferior to what the publisher offers directly, so crowd-sourcing would be a downgrade for anyone who imported their references from the publisher's website.

    The other alternative would be to somehow decide which metadata is correct, but this needs to be done in a way that does not give a small group of individuals too much power. I'm not sure how that would be implemented.
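    To make the "most common metadata wins" problem above concrete, here is a minimal sketch of field-by-field majority voting over user-submitted records (the record shape is my own invention, not Zotero's schema):

```python
from collections import Counter

def reconcile(records: list[dict]) -> dict:
    """Pick, for each field, the value submitted most often.

    This is exactly the weakness described above: unedited,
    auto-imported metadata will usually outvote a carefully
    corrected record.
    """
    fields = {field for record in records for field in record}
    merged = {}
    for field in fields:
        votes = Counter(record[field] for record in records if field in record)
        merged[field] = votes.most_common(1)[0][0]
    return merged
```

    For instance, if three users keep an auto-imported title with a truncated subtitle and one user fixes it against the publisher's page, the three uncorrected copies win the vote and the fix is discarded.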
  • edited November 27, 2012
    Thank you DWL-SDCA and aurimas for your comments. I do like putting the process into historical perspective, and your comments about students not reading their sources are very relevant; part of me thinks, however, that cheaters will always find a way. I use the software much as aurimas describes: "most references that people collect do not get cited in papers and do not undergo more careful review." I also use it many times to generate reading lists.

    Thanks for the tips on the different metadata sources. I'll have to start using CrossRef for metadata. I don't understand the internals well enough to know how such a thing would be implemented either. I don't think it is a full solution, but I did start a shared library if anyone is interested: http://www.zotero.org/groups/crowd_the_library
  • Beware of CrossRef as an authority for author names. In many author names attached to CrossRef records, the author's first name appears as the last name, and instead of a first-name initial the record holds the first letter of the last name.
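    For context, CrossRef's REST API represents each author as an object with `given` and `family` keys. A small sketch of why the swap described above is easy to miss (the sample records are invented):

```python
def format_author(author: dict) -> str:
    """Render a CrossRef-style author object as 'Family, Given'."""
    return f"{author.get('family', '')}, {author.get('given', '')}"

# A correct record:
ok = {"given": "Margaret", "family": "Smith"}
# The corruption described above: the given name has landed in
# 'family', and 'given' holds an initial taken from the family name.
swapped = {"given": "S.", "family": "Margaret"}
```

    `format_author(ok)` gives "Smith, Margaret", while `format_author(swapped)` gives "Margaret, S.", which still looks like a plausible citation at a glance, so the error survives casual proofreading.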