query regarding PubMed translator

Hello,

A few years ago, I created a style to generate an annotated bibliography with abstracts for each reference. I haven't used it in some time, but a colleague who was using it recently noticed that some of the punctuation within the abstract field (specifically, the colons after the major headings in all caps) is lost when the data are "grabbed" (sorry, I don't know the correct technical term) from the PubMed site and brought into the Zotero library.

For example (taken from PubMed, PMID: 11109029):

BACKGROUND In the majority of people with familial hypercholesterolaemia (FH) the disorder is caused by a mutation of the low-density lipoprotein receptor gene that impairs its proper function, resulting in very high levels of plasma cholesterol. Such levels result in early and severe atherosclerosis, and hence substantial excess mortality from coronary heart disease. Most people with FH are undiagnosed or only diagnosed after their first coronary event, but early detection and treatment with hydroxymethylglutaryl-coenzyme (HMG CoA) reductase inhibitors (statins) can reduce morbidity and mortality. The prevalence of FH in the UK population is estimated to be 1 in 500, which means that approximately 110,000 people are affected. OBJECTIVES To evaluate whether screening for FH is appropriate. To determine which system of screening is most acceptable and cost-effective. To assess the deleterious psychosocial effects of genetic and clinical screening for an asymptomatic treatable inherited condition. To assess whether the risks of screening outweigh potential benefits. METHODS DATA SOURCES Relevant papers were identified through a search of the electronic databases. Additional papers referenced in the search material were identified and collected. Known researchers in the field were contacted and asked to supply information on unpublished or ongoing studies.

Is there any automated way that I can modify the style to get the punctuation back? Or, would I need to manually add the colons?

Thanks for any advice!
  • PubMed doesn't export colons there:
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=11109029
    so that's a question of import, not of export (so no, you can't fix this by modifying your style).
    Currently we use line-breaks in the abstract after those headers - but those are swallowed in the citation. I'd be open to using (going back to?) colons instead.
    Aurimas has done more of the work on the PubMed translator recently - let's see what he says.
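
To see why the colons are not in the data itself, here is a minimal Python sketch (not the Zotero translator, which is written in JavaScript). It rebuilds the efetch URL quoted above and lists the section labels from a small hand-shortened XML fragment. The helper names `efetch_url` and `abstract_sections` and the `SAMPLE` fragment are assumptions made for illustration; only the URL and the `Label` attribute on `AbstractText` come from the thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid):
    """Build the efetch URL for one PubMed record (hypothetical helper)."""
    query = urlencode({"db": "pubmed", "retmode": "xml", "id": str(pmid)})
    return f"{EFETCH_BASE}?{query}"


def abstract_sections(xml_text):
    """Return (label, text) pairs; label is None if the attribute is absent."""
    root = ET.fromstring(xml_text)
    return [
        (node.get("Label"), "".join(node.itertext()).strip())
        for node in root.iter("AbstractText")
    ]


# Illustrative fragment, shortened by hand; NOT the full efetch response.
SAMPLE = """
<Abstract>
  <AbstractText Label="BACKGROUND">The prevalence of FH in the UK population is estimated to be 1 in 500, which means that approximately 110,000 people are affected.</AbstractText>
  <AbstractText Label="OBJECTIVES">To evaluate whether screening for FH is appropriate.</AbstractText>
</Abstract>
"""

if __name__ == "__main__":
    print(efetch_url(11109029))
    for label, text in abstract_sections(SAMPLE):
        # The label is a bare attribute value: no colon, no trailing space.
        print(repr(label), "->", text)
```

The label arrives as an attribute value separate from the text, so whatever separator appears in the stored abstract has to be added at import time.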
  • edited March 7, 2013
    This is made more complicated by the spotty implementation of NLM's "structured abstracts" model. (see http://structuredabstracts.nlm.nih.gov/Implementation.shtml) Often even the major journal publishers provide metadata that does not meet the specifications.

    Thus, we can get abstract section labels without punctuation and without preceding or trailing spaces.
  • "Currently we use line-breaks in the abstract after those headers - but those are swallowed in the citation."
    Either colons or newlines for separating section headings from text are fine by me, but DWL-SDCA's link suggests that colons are the way to go:
    The NLM guideline for reconstructing Structured Abstracts for display as a single paragraph is:

    Reinsert the wording found in the Label= attribute followed by a colon space before the AbstractText data, except when that attribute value is UNLABELLED. In that case, ignore the attribute data and display only the AbstractText data on its own line at the margin. Insert one space before the next label.

    The NLM guideline for reconstructing Structured Abstracts for an easier-to-read, multi-paragraph display is:

    Reinsert the wording found in the Label= attribute followed by a colon space before the AbstractText data, except when that attribute value is UNLABELLED. In that case, ignore the attribute data and display only the AbstractText data on its own line at the margin. Typically, the label followed by the AbstractText displays on a new line with the label and the colon in bold with one or more blank lines between the sections/segments.
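
The two NLM guidelines quoted above can be sketched roughly as follows. This is an illustrative Python version under stated assumptions, not the translator's actual code: the function names are hypothetical, sections are assumed to arrive as (label, text) pairs, bold formatting is ignored because the output is plain text, and in the single-paragraph form an UNLABELLED segment is simply joined inline rather than placed on its own line.

```python
def _segment(label, text):
    """Prefix the text with 'LABEL: ' unless the label is missing or UNLABELLED."""
    if label and label.strip().upper() != "UNLABELLED":
        return f"{label.strip()}: {text.strip()}"
    return text.strip()


def render_single_paragraph(sections):
    """Join all segments with one space, per the single-paragraph guideline."""
    return " ".join(_segment(label, text) for label, text in sections)


def render_multi_paragraph(sections):
    """Put each segment on its own line, separated by one blank line."""
    return "\n\n".join(_segment(label, text) for label, text in sections)


if __name__ == "__main__":
    # Placeholder sections, invented for the example.
    demo = [
        ("BACKGROUND", "First section."),
        ("UNLABELLED", "Text without a heading."),
        ("METHODS", "Last section."),
    ]
    print(render_single_paragraph(demo))
    print()
    print(render_multi_paragraph(demo))
```

Since citation styles swallow newlines, the single-paragraph form is the one that survives into a formatted bibliography.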
  • To make my point better than I did above:
    In-process citations (those that come straight from the publisher) are often malformed and will require some adjustments within the translator before the finished abstract is properly formatted. Once PubMed has finished the early stages of processing, the XML problems will have been corrected.
  • The fix for the PubMed translator is now up, and the translator will insert colon-space instead of newlines after the section headers.

    Your version of Zotero will automatically update within 24 hours, or you can update manually using the "Update Now" button in the "General" tab of the Zotero preferences.

    If you run into any problems, let us know. Thanks to Aurimas for the quick fix.

    @DWL - the translator code is quite robust and should do well with malformed abstracts, but do let us know if you find problems.
  • I'm still not sure what problems you are referring to exactly. Do you perhaps have an example of this?

    Are you referring to the section headings that are not recognized by PubMed and left in the middle of the paragraph, like here?

    Or...
    "Thus, we can get abstract section labels without punctuation and without preceding or trailing spaces."
    ...do you mean that the value of the Label attribute in the AbstractText node from PubMed's XML output (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=11109029) may contain additional spaces and/or punctuation? If so, this is pretty easy for us to clean up.

    If you're talking about the former, then I don't think we need to try to parse the labels out of the plain text. If it is good enough to be displayed on PubMed, it should be decent enough for us to store in Zotero.
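
The kind of label clean-up described above (extra spaces and/or punctuation in the Label value) might look something like this in Python. It is a sketch under assumptions, not the translator's code: `clean_label` is a hypothetical helper, and the set of punctuation it strips is a guess at what a malformed label could carry.

```python
import re


def clean_label(raw):
    """Normalize a Label value: collapse whitespace, trim stray punctuation.

    Returns None when nothing usable is left, so the caller can treat the
    segment as unlabelled.
    """
    if raw is None:
        return None
    label = re.sub(r"\s+", " ", raw)   # collapse runs of whitespace
    label = label.strip(" :;.,-")      # trim spaces and edge punctuation
    return label or None


if __name__ == "__main__":
    for raw in ["  METHODS: ", "DATA  SOURCES", " : ", None]:
        print(repr(raw), "->", repr(clean_label(raw)))
```

With the label normalized first, appending a single colon-space afterwards cannot produce doubled punctuation such as "METHODS:: ".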
  • Many thanks for the rapid turnaround and consideration of my request.