URL and DOI fields exporting into the bibliography (?)

When I use Quick Copy to export my selected library into MS Word to produce a bibliography formatted in Chicago style, the URL and DOI fields of various items come along for the ride and show up at the end of each entry, like this:

Kitcher, Philip. 1990. “The Division of Cognitive Labor.” The Journal of Philosophy 87 (1): 5-22. doi:10.2307/2026796.

Storey, Ian C. 2003. Eupolis, Poet of Old Comedy. Oxford: Oxford University Press. http://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199259922.001.0001/acprof-9780199259922.

Is there any way to strip out this data in one fell swoop, instead of going through each record and deleting it? Alternatively, can I change the default settings of the Chicago export style so that neither the DOI nor the URL field is included in the export?

I see that some version of this problem has been dealt with in other discussions, but I poked around for a while and was not able to discover any sort of conclusive fix.

Thanks in advance,
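For readers wondering what editing a style involves: Zotero citation styles are CSL files, which are XML. The Python sketch below is a rough illustration, not an official tool. It assumes the style prints the DOI and URL through `<text variable="DOI"/>` and `<text variable="URL"/>` elements, as CSL 1.0 styles commonly do, removes those elements, and gives the copy its own id and title so it does not collide with the original. The function name `strip_csl_variables`, the new id, and any file names are hypothetical. Removing elements can leave empty groups that the CSL schema may reject, so validate the result before installing it; the variant also drops URLs that genuinely online sources require.

```python
import xml.etree.ElementTree as ET

# CSL 1.0 namespace; registering it as the default avoids "ns0:" prefixes on output.
CSL_NS = "http://purl.org/net/xbiblio/csl"
ET.register_namespace("", CSL_NS)


def strip_csl_variables(csl_xml, variables=("DOI", "URL"), new_id=None, new_title=None):
    """Return a copy of a CSL style without <text variable="..."> for `variables`.

    Hypothetical helper: <if variable="..."> conditions are left untouched, so
    branches that test for a DOI or URL still behave as before, minus the
    removed <text> output.
    """
    root = ET.fromstring(csl_xml)
    text_tag = f"{{{CSL_NS}}}text"
    # ElementTree has no parent pointers, so snapshot every node first and
    # remove matching children from their parents without mutating a live iterator.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == text_tag and child.get("variable") in variables:
                parent.remove(child)
    # Give the variant its own identity so it can be installed beside the original.
    for field, value in (("id", new_id), ("title", new_title)):
        element = root.find(f"{{{CSL_NS}}}info/{{{CSL_NS}}}{field}")
        if value is not None and element is not None:
            element.text = value
    return ET.tostring(root, encoding="unicode")
```

Usage would be along the lines of passing the bytes of a downloaded `.csl` file plus a new id and title, then saving the returned string under a new file name. Reading the file as bytes lets the XML parser honour the file's encoding declaration.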
  • This is set by the respective style.

    I'm not actually sure if CMoS author-date should have a DOI (the note-based styles don't). If not, we could fix that globally.
    The URL probably shouldn't be in the Zotero entry for the Storey book (URLs should be limited to source URLs for electronic resources, and those should be cited in CMoS; in other cases, attached links or snapshots will do).

    Do you remember where that citation came from?
  • I think it came from the Wellesley College Library catalog, but I'm not sure.

    I have other data in the URL fields for other books which I had ported over from EndNote, so this is by no means limited to the Storey, nor to records retrieved from Wellesley.

    Is there a way for me to make a customized style based on CMoS that would allow me to get rid of the DOI and URL fields (I know I could do this in EndNote, but I'm not as familiar with Zotero)?

  • edited February 15, 2012
    In general yes, but it's a bit more involved than with EndNote:
    Especially with Chicago styles this isn't going to be easy, as the style is rather complex.

    edit: But you should really clean up those entries. Hacking styles is no solution for bad data: if you ever cite online sources that require a URL, your style won't produce correct CMoS.
  • Thanks for pointing me to the editing option.

    I think scripting a CMoS variant will be *much* more efficient than cleaning up these fields, at least for the sort of sources I deal with.

    I understand the need to clean up entries (easier said than done when I have ~2000 of them), but it does seem like a clunky way to deal with this issue, which looks like a common one for users. JSTOR is where I get 90% of my periodical citations, and is thus the source for the DOI field data. Many publishers' websites and libraries put data into the URL field. I'd hardly call all this "bad data."

    The ratio of print to online sources I use is probably along the lines of 500 to 1. I am sure most people working in the humanities would have a similar ratio. Most bibliographies *do not* need to have either DOI or URL data in them, so why should these fields be included within the default mode of CMoS when formatting data in this way? It makes no sense to me, and given the number of threads devoted to this problem, appears to make no sense to others as well.

    Sorry to vent, but this has been driving me nuts.

    Thanks for your help!
  • edited February 15, 2012
    Let me clarify the details.
    DOIs are not bad data; they should never be removed. I don't know if DOIs should be in the author-date version of CMoS. If they shouldn't be and you can point me to the relevant section in the manual, I'll remove them from the style. As I say above, if you just need any CMoS version, they're not in the full note version.

    URLs are OK for entries from full-text databases such as JSTOR; you will note that articles from JSTOR will typically not have a URL included in the citation even though they have a URL in Zotero.

    Books from library catalogs and other non-full-text sources should _never_ have a URL in the URL field, and yes, that is bad data. The URL is not part of the data for such an item; it's just a random relic of where you got the data from.

    Other full-text sources retrieved online (e.g., the full text of a book or a report placed online) do require URLs in most citation styles, including CMoS.

    Zotero was created by historians; you can assume that people have thought hard about any issue that is relevant to the humanities.
  • I think my frustration boils down to this: folks are going to retrieve data from a wide range of sources, some clean, some dirty. What makes Zotero, and other systems like this, so useful is the ability to refine that data, organize it in various ways, and present it in a nicely-formatted manner. I agree that things like the DOI or URL should never be removed from the raw data, but why shouldn't the end-user have more control over *how* to filter and format that raw data for the purpose of, in this example, providing a print bibliography for an article submission? I don't have my copy of CMoS next to me so I can't find the relevant section right now, but I have *never* seen DOI information in *any* bibliography in the journals in my field, or in ancillary fields. Author-Date is a standard CMoS format, so why should Zotero keep the DOI field as an exported field for a print bibliography in that format? Keep it in the raw data, sure, but let us strip that field (or any other field) out when we are producing bibliographies that do not call for it. Make sense?
  • edited February 15, 2012
    CMoS 14.4:
    When citing electronic sources consulted online, Chicago recommends—as the final element in a citation that includes all the components described throughout this chapter and in chapter 15—the addition of a URL or DOI.
    CMoS 14.167:
    When citing the online version of a book, include the URL—or, if available, DOI—as part of the citation (see 14.5, 14.6).
  • @Simon

    Ah, but in neither case are these sources *consulted online.*

    JSTOR is an online repository of (for the most part) print materials.

    Records for books secured online are done simply in order to save time. I am not citing the *online version* of a book. I have the print book in front of me, but it is much faster to grab that data online rather than manually enter it all. Surely this is not an uncommon method for securing data for researchers?

    So, I don't think either of the sections you cite are dispositive to the cases at hand.
  • What Simon says with respect to CMoS. We follow style manuals.

    Otherwise, styles are customizable; it just takes a bit more work.

    As for cleaning up references - I was referring to URLs, which should be removed from items where they don't belong. That's the only way to prevent erroneous citations.
  • OK, so is there a way to globally edit a selection of Zotero records such that I can delete every piece of data in the DOI and URL fields? This is a blunt way of fixing this problem, but at least I know it will be fixed for the long term.
  • Batch editing is forthcoming, but no, not yet.
  • OK! At least I know what I have to do in the near term. Thanks for your help and patience.

  • edited February 15, 2012
    There's still no reason to remove DOIs or full-text URLs. Regardless of which version you're citing (to the extent that that matters), you're just making your bibliography less useful by removing them. (You may not be used to seeing them, but Chicago obviously started recommending them for a reason.) And by removing them from the Zotero items themselves you're potentially compromising your ability to use the data in your Zotero library in useful or interesting ways in the future. Your call, of course.

    Removing useless (e.g., non-full-text) database URLs—the part, as adamsmith explains, that's bad data—is fine, and if you find instances of those you should give us examples so that we can stop those from being saved. Generally speaking it's Zotero that determines what goes in what field, not the site.
  • @Dan

    I guess I'm not being clear in my attempts to make a distinction between online sources and print sources that happen to also be online.

    With respect to the former we have no disagreement. For example, should I wish to cite something from the Perseus website, this would apply.

    With respect to the latter, however, I do not think the CMoS sections cited by you, adamsmith, and Simon are appropriate. For example, Nicholas D. Smith's 1983 article "Aristotle's Theory of Natural Slavery" has now been assigned a DOI: 10.2307/1087451

    It can be accessed digitally a few different ways, but the primary resource for most folks is going to be JSTOR, assuming that their library doesn't have a print run of the journal Phoenix (and most libraries are now pulping journal runs that are available via JSTOR).

    So, when I go to JSTOR to download the article, and save the bibliographic data in Zotero, it grabs that DOI (along with many other things, all of which makes sense).

    But, when publishers in my field (and, I would guess, most humanities fields) call for a CMoS 14 style bibliography (and some do), they *do not* want to see the DOI hanging at the end of each citation. The proper citation format is simply Author-Date. Although I have accessed the article online, and I have a digital copy, it is not considered to be an online resource. Period.

    So, I can go through my bibliography in MS Word and manually remove the DOI for every article I accessed via JSTOR. All things told this isn't the end of the world, but as I said before, this seems like a clunky way to deal with this issue. A more elegant work-around would be to allow me to manually disable whichever field I might choose for any given bibliographic export. This would protect raw data, and prevent me from making my bibliography less useful in the future, as you rightly note. But it would give me enough flexibility to obviate customizing the current CMoS script.

    Is it that such a feature is too complicated to add? Again, this is no big deal in the long run, but based on the forum discussions I am not the only person to be frustrated by this very issue.
  • edited February 16, 2012
    "But, when publishers in my field (and, I would guess, most humanities fields) call for a CMoS 14 style bibliography (and some do), they *do not* want to see the DOI hanging at the end of each citation. The proper citation format is simply Author-Date. Although I have accessed the article online, and I have a digital copy, it is not considered to be an online resource. Period."
    CMoS chapter 14 contains several examples with JSTOR URLs and DOIs, so I think it really does consider articles accessed via JSTOR to be an online resource. However, section 14.4 also notes:
    "Publishers, however, will have their own requirements, which may depend on the type of work and the uses to which it will be put."
    If there are journals that don't want the DOI, we should make styles for those journals that don't include the DOI.
  • If you can point to a publisher that requires CMoS w/o DOIs, we're happy to put that style up under that name.

    CMoS itself is pretty clear that they do want the DOI for print articles accessed via JSTOR:
    an example from 15.46
    Morasse, Sébastien, Helga Guderley, and Julian J. Dodson. 2008. “Paternal Reproductive Strategy Influences Metabolic Capacities and Muscle Development of Atlantic Salmon (Salmo salar L.) Embryos.” Physiological and Biochemical Zoology 81 (4): 402–13. doi:10.1086/589012.

    and more examples from 14.180:
    2. David Meban, “Temple Building, Primus Language, and the Proem to Virgil’s Third Georgic,” Classical Philology 103, no. 2 (2008): 153, doi:10.1086/591611.
    18. Jeanette Kennett, “True and Proper Selves: Velleman on Love,” Ethics 118 (January 2008): 215, doi:10.1086/523747.
    23. Boyan Jovanovic and Peter L. Rousseau, “Specific Capital and Technological Variety,” Journal of Human Capital 2 (Summer 2008): 135, doi:10.1086/590066.

    Jovanovic, Boyan, and Peter L. Rousseau. “Specific Capital and Technological Variety.” Journal of Human Capital 2 (Summer 2008): 129–52. doi:10.1086/590066.
    Kennett, Jeanette. “True and Proper Selves: Velleman on Love.” Ethics 118 (January 2008): 213–27. doi:10.1086/523747.
    Meban, David. “Temple Building, Primus Language, and the Proem to Virgil’s Third Georgic.” Classical Philology 103, no. 2 (2008): 150–74. doi:10.1086/591611.

    These all exist in print, but CMoS suggests they be cited with DOI.

    I don't think there is a great willingness to add features for what are, essentially, hacks. People should just use styles; they shouldn't concern themselves with the specifics of the style requirements.
  • If there's a publisher that doesn't want DOIs in the bibliography, then we can and should create a new style that does just that. If the publisher doesn't want DOIs, then they're not using the CMoS as written. That's fine, and that means we need a new style. It's somewhat unfortunate in light of the utility of DOIs, but you are right that many humanities publishers seem to take this as an unwritten exception to the CMoS. This isn't the first irrational deviation from a standard, nor will it be the last.

    adamsmith: Do you think it would be reasonable to create a no-DOI version of CMoS for this unfortunately large number of publishers?
  • I'll create the style as soon as I have at least one publisher that requires it.
  • Quite reasonably.

    rsobak: Can you provide an author guide or other such materials from a publisher with this as a documented, consistent deviation from the CMoS? We like to tie these variants down to specific institutions that require them, to provide a bit of order in the chaotic world of citation styles.
  • I'm not by nature a betting person, but I would happily lay down a quarter that says the CMoS 14/15 guidelines you cite are deviant with respect to publishers' requirements in the humanities. CMoS may *suggest* that articles be cited with DOI, but most journals aren't following along. This may change in the future, but as of now the "irrational deviants" are running the show. I would be happy to include DOI information in my CMoS format bibliographies, but as of now I am being asked to remove it, even though, as you note, CMoS suggests it be included.

    @adamsmith: I'll give you three: Classical Antiquity, American Journal of Archaeology, and American Journal of Philology.

    To take AJP in particular. They specify CMoS 14, but if you inspect every single article that they have published in the last 5 years you will not find a DOI reference in either notes or bibliography.

    Is this sufficient?
  • edited February 16, 2012
    You're confusing what the CMoS numbers mean.
    Simon and I are citing chapter 14 and 15 of the current (16th) edition.

    AJP requires CMoS 14th edition. (That will actually be hard to do; the 15th edition is at least online. I'm not really motivated to go to the library to get a 20-year-old style manual: the 14th edition was published in 1993.)

    AJA doesn't use CMoS, but its own style roughly based on CMoS (they refer to CMoS, but only in addition to their own guide).

    Classical Antiquity says they're using the AJA style, but what they actually use isn't identical (e.g. they don't cite publisher names).

    We can look at these styles, but they are different from CMoS 16th edition in multiple ways, so it's much more than just removing DOIs.

    They're also all using an author-date-in-notes version of CMoS that we simply don't currently have for Zotero, though it's certainly possible.
    There is a dated note version that looks similar in the text but differs in the bibliography.
  • For all of these styles, I won't do that by myself, so this applies:
  • Ah, the joys of being a classicist, ever trapped in the distant, out-of-print past.

    OK, let's start over, and pretend the previous 22 entries never happened. ;-)

    Dear adamsmith,

    Would you be so kind as to make a new Output Format, named American Journal of Philology, which follows CMoS 16 Author-Date, but differs from it in two minor, but key, respects: neither the DOI nor the URL field is to be included in the bibliographic export?

    I (and thousands of classicists everywhere) would be forever in your debt.

    Multas Gratias,
  • Fair enough. I'll invest the time to work up the style myself and post it.

    Out of curiosity, however, why *not* give users the ability to customize bibliographic exports in an ad-hoc manner? I can imagine a dialogue box that has a list of the fields to be exported, with a yes/no button next to each. I click on the "no" next to the DOI field, problem solved. Is this technically challenging in ways that I don't have the expertise to appreciate?
  • Follow my link: you don't have to code the style yourself, and unless you're pretty good with XML that would likely be quite hard. I just need a list of differences. Please take the time to carefully look at the style guide and the style; I can pretty much guarantee you that URLs and DOIs aren't the only difference (starting with the fact that AJP wants citations in footnotes and not parentheses).

    The field custom export is just something that's a substantial amount of effort to code and design the GUI for with a very limited benefit. The idea of a ref manager is for a user not to have to fiddle with these things.
  • edited February 16, 2012
    Actually, CMoS 14.4 has to be taken in the context of CMoS 14.10:

    "Publications available in more than one medium: In many cases the contents of the print and electronic forms of the same publication are intended to be identical. Moreover, publishers are encouraged to note explicitly any differences between the two (see 1.73). In practice, because there is always the potential for differences, intentional or otherwise, authors should cite the version consulted. Chicago recommends including a URL or DOI to indicate that a work was consulted online; for other nonprint items, the medium should be indicated (e.g., CD-ROM) ... (Though a print source may list a DOI, authors need not record it as part of their research unless their publisher or discipline requires it.)"

    In other words, CMoS is saying that the reference should strictly represent the source cited: "When citing electronic sources consulted online" in 14.4 is meant restrictively; that is, the DOI should not be included when the hard copy was consulted.

    This is reinforced in 14.18, Journal Article [the first two examples do not include DOIs, then]:

    "The DOI in the following example indicates that the article was consulted online;"

    So by having the DOI always indiscriminately included, a distinction that CMoS explicitly intends to underline is being erased.

    The same point is emphasized for books. Note that CMoS does not include URLs or DOIs in any of the examples of book bibliography from 14.74 up to 14.166, where Electronic Books begins – and where they say:

    "The majority of electronically published books offered for download from a library or bookseller will have a printed counterpart. Because of the potential for differences, however, authors must indicate that they have consulted a format other than print. This indication should be the last part of a full citation that follows the recommendations for citing printed books as detailed throughout this section."

    So again, what is quoted by Simon from 14.167 is also meant to distinguish consulting an online version of a book from consulting the hard copy.

    I do not at all wish to suggest eliminating the URL or DOI data from the captured records, but, if the goal is to adhere to CMoS, then clearly these should not be included in the citation or bibliography as a matter of course. Strictly speaking, whether to include them has to be decided case by case.

    If anything, electronic versions are likely to differ more and more from print versions over time, so it will become increasingly important to distinguish them. Perhaps the only way to do this will be with separate records, but that gets messy. So the ideal would be to be able to easily indicate, case by case, whether or not to include the field, but that, I imagine, is too technically challenging.
  • Yes, I'm aware of that distinction in CMoS, but consider actual usage. How many academics do you know who still go to the library and actually consult the print volume of a periodical when it's available online on JSTOR, EBSCO, etc.? As Rob says above, that's something like 500:1.

    And _if_ you're one of those academics, then yes, it would make sense to input the data for those articles without DOI.
    Same for books/ebooks. If you actually download an ebook version, there should be a URL in the URL field. But if you get a physical copy from the library (which isn't yet as rare as physical journal articles) there shouldn't be a URL - hence my statement that books from library catalogs shouldn't have URLs.
  • I think Rob is saying the opposite:

    "The ratio of print to online sources I use is probably along the lines of 500 to 1. I am sure most people working in the humanities would have a similar ratio."

    i.e. 500 print to 1 online.

    Whether his second sentence is correct is another matter. I think many people still actually _subscribe_ to the one or two main journals in their field!

    There is also another distinction being made in 14.166-167: that between an ebook edition (166) and electronic access to the PDF of a print edition (167). The latter is more common for academic books, but this could change.

    As the CMoS examples suggest, the former is common in academia for editions of literary classics. This involves stating the ebook edition rather than providing a URL, and CMoS 14.166 wants this information at the very end:


    This indication should be the last part of a full citation that follows the recommendations for citing printed books as detailed throughout this section.

    Austen, Jane. Pride and Prejudice. New York: Penguin Classics, 2007. Kindle edition.
    Austen, Jane. Pride and Prejudice. New York: Penguin Classics, 2008. PDF e-book.
    Austen, Jane. Pride and Prejudice. New York: Penguin Classics, 2008. Microsoft Reader e-book.
    Austen, Jane. Pride and Prejudice. New York: Penguin Classics, 2008. Palm e-book.

    The printed counterpart to the Penguin Classics e-book offerings would be cited as follows (note the different publication date):

    Austen, Jane. Pride and Prejudice. New York: Penguin Classics, 2003.


    However, entering "Kindle edition" into the "Edition" field in Zotero and then outputting as Chicago full note with Bibliography places the information in the middle:

    Young, Brian J. Respectable Burial Montreal’s Mount Royal Cemetery. Kindle ed. Montreal, Que: McGill-Queen’s University Press, 2003.
  • Right, I misread Rob. I stand by my general point, though.

    We're aware of the problem with e-book edition - there are a bunch of threads on that, let's not go there here.
  • I very much appreciate the time and patience you have shown in dealing with my questions and comments here.

    Zotero should capture as much data as possible, period. If I've implied otherwise then I apologize.

    Zotero should provide output styles according to the guidelines set down by the style guides themselves (e.g. CMoS). You can't chase down every crazy variant for any given publication. That is not your job.

    Most scholars, when working with secondary sources, now tend to get citation data for those sources online (library catalogues, JSTOR, EBSCO, etc.). This is true even for sources that are, strictly speaking, print sources. JSTOR may digitize back issues of a print journal, and for the sake of expediency I may grab a PDF of an article from that journal, but that does not make that journal an online source. More importantly, were I to include the DOI of that digitized article in my bibliography, even for publishers who use CMoS, I would be asked to remove it.

    So we have a problem here between how CMoS *suggests* things be done (de iure) and how things are *actually* done (de facto). The evidence for this difference can be found in the actual bibliographies published in any number of humanities journals out there (I consult journals in Classics, Archaeology, Political Theory, History, and Philosophy, so I think my anecdotal sampling is statistically significant). DOIs are not used. Period. They may/will be in the future, so let's make sure to capture them and keep them in the raw data, but in many cases they aren't yet needed in the output.

    Given this, I would echo what Aurele wrote: "I do not at all wish to suggest eliminating the URL or DOI data from the captured records, but, if the goal is to adhere to CMoS, then clearly these should not be included in the citation or bibliography as a matter of course."

    Reference managers exist to make things easier, not harder. CMoS is a means to an end, not an end in itself. I am plainly speaking for a great number of people, fans of Zotero all of us -- and we all appreciate the work you've put into this, who are frustrated with the time we have to take manually deleting these DOIs out of our work.

    I figured the easiest way to solve this would be on the front end, through Zotero. Adamsmith wrote: "The field custom export is just something that's a substantial amount of effort to code and design the GUI for with a very limited benefit. The idea of a ref manager is for a user not to have to fiddle with these things." Personally, I much prefer having the ability to fiddle with these things, so long as my fiddling can't have negative consequences for other users. I understand that the effort required to code a customizable export system, and design the GUI so as to make it easy to use, is not insignificant. But my guess, and it is just that, is that the number of man-hours users are currently expending "customizing" (=deleting) these records on the back end dwarfs that effort. So the benefit would, I think, actually be considerable. But that is easy for me to assert, as I am not the one tasked with coding a custom export system!

    For me, until the ability exists to tinker with the front end, the best solution is probably to write a macro in MS Word that quickly strips out this unwanted data after it is exported into my documents. Again, I think this is a clunky workaround, and it probably duplicates what dozens of other people have had to do, but it beats manually deleting fields that I can't order Zotero not to output.

    Thanks again for your help and patience, and keep up the great work on this. I am urging all of my students to adopt Zotero, so please don’t mistake this moaning and groaning for a lack of enthusiasm for you guys!
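The post-export cleanup described in the last post can also be done outside Word. The Python below is a hypothetical stand-in for such a macro, not a Word macro itself: it works on a plain-text bibliography with one entry per line and assumes, as in the two Chicago author-date entries quoted at the top of the thread, that any DOI or URL is the final space-separated token of an entry. The names `strip_access_info` and `strip_bibliography` are illustrative. It removes those trailing tokens indiscriminately, so it would also strip URLs that genuinely online sources need; run it only over entries where that is intended.

```python
import re

# Assumption: a DOI ("doi:...") or URL ("http(s)://...") is the last
# whitespace-separated token of an entry, including its closing period.
TRAILING_ACCESS = re.compile(r"\s+(?:doi:\S+|https?://\S+)\s*$", re.IGNORECASE)


def strip_access_info(entry: str) -> str:
    """Remove trailing DOI/URL tokens from a single bibliography entry."""
    previous = None
    # Repeat until nothing changes, in case an entry ends with both a DOI and a URL.
    while previous != entry:
        previous = entry
        entry = TRAILING_ACCESS.sub("", entry)
    return entry


def strip_bibliography(text: str) -> str:
    """Apply strip_access_info to every line of a plain-text bibliography."""
    return "\n".join(strip_access_info(line) for line in text.splitlines())
```

Applied to the Kitcher entry above, this leaves "…The Journal of Philosophy 87 (1): 5-22." and, for the Storey entry, "…Oxford: Oxford University Press." Entries without a trailing DOI or URL pass through unchanged.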
