query regarding PubMed translator
Hello,
A few years ago, I created a style to generate an annotated bibliography with abstracts for each reference. I haven't used it in some time, but a colleague who was using it recently noticed that some of the punctuation within the abstract field (specifically the colons after major headings in all caps font) is lost when the data are "grabbed" (sorry, don't know the correct technical term) from the PubMed site and then brought into the Zotero library.
For example (taken from PubMed, PMID: 11109029)
BACKGROUND In the majority of people with familial hypercholesterolaemia (FH) the disorder is caused by a mutation of the low-density lipoprotein receptor gene that impairs its proper function, resulting in very high levels of plasma cholesterol. Such levels result in early and severe atherosclerosis, and hence substantial excess mortality from coronary heart disease. Most people with FH are undiagnosed or only diagnosed after their first coronary event, but early detection and treatment with hydroxymethylglutaryl-coenzyme (HMG CoA) reductase inhibitors (statins) can reduce morbidity and mortality. The prevalence of FH in the UK population is estimated to be 1 in 500, which means that approximately 110,000 people are affected. OBJECTIVES To evaluate whether screening for FH is appropriate. To determine which system of screening is most acceptable and cost-effective. To assess the deleterious psychosocial effects of genetic and clinical screening for an asymptomatic treatable inherited condition. To assess whether the risks of screening outweigh potential benefits. METHODS DATA SOURCES Relevant papers were identified through a search of the electronic databases. Additional papers referenced in the search material were identified and collected. Known researchers in the field were contacted and asked to supply information on unpublished or ongoing studies.
Is there any automated way that I can modify the style to get the punctuation back? Or, would I need to manually add the colons?
Thanks for any advice!
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=11109029
So that's a question of import, not of export, so no, you can't fix it by modifying your style.
Currently we use line-breaks in the abstract after those headers - but those are swallowed in the citation. I'd be open to using (going back to?) colons instead.
Aurimas has done more of the work on the PubMed translator recently - let's see what he says.
Thus, we can get abstract section labels without punctuation and without preceding or trailing spaces.
In-process citations (those that come straight from the publisher) are often malformed and will require some adjustments within the translator before the finished abstract is properly formatted. Once PubMed has finished the early stages of processing, the XML problems will have been corrected.
Your version of Zotero will automatically update within 24 hours, or you can update manually using the "Update Now" button in the "General" tab of the Zotero preferences.
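To illustrate how section labels could be read from the XML and the colons put back on import, here is a minimal Python sketch. It is not the actual translator code (the Zotero translator is written in JavaScript); the function name `build_abstract` and the abbreviated sample XML are assumptions made for illustration, with only the `AbstractText` element and its `Label` attribute taken from the PubMed output discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written sample shaped like the Abstract element in
# PubMed's efetch XML. The text is shortened for illustration only.
SAMPLE_XML = """\
<Abstract>
  <AbstractText Label="BACKGROUND">In the majority of people with familial hypercholesterolaemia (FH) the disorder is caused by a mutation of the low-density lipoprotein receptor gene.</AbstractText>
  <AbstractText Label="OBJECTIVES">To evaluate whether screening for FH is appropriate.</AbstractText>
</Abstract>
"""


def build_abstract(abstract_xml: str) -> str:
    """Join labelled abstract sections as 'LABEL: text', one per line.

    Hypothetical helper: sections without a Label are kept as plain text.
    """
    root = ET.fromstring(abstract_xml)
    parts = []
    for node in root.iter("AbstractText"):
        label = (node.get("Label") or "").strip()
        # itertext() also collects text inside nested inline markup.
        text = "".join(node.itertext()).strip()
        parts.append(f"{label}: {text}" if label else text)
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_abstract(SAMPLE_XML))
```

Whether the sections are joined with a line break, a space, or something else is a design choice; the colon is what survives when line breaks are swallowed in the citation.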
Any problems let us know. Thanks to Aurimas for the quick fix.
@DWL - the translator code is quite robust and should do well with malformed abstracts, but do let us know if you find problems.
Are you referring to the section headings that are not recognized by PubMed and left in the middle of the paragraph, like here?
Or do you mean that the value of the Label attribute in the AbstractText node from PubMed's XML output (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=11109029) may contain additional spaces and/or punctuation? In that case, it is pretty easy for us to clean up.
If you're talking about the former, then I don't think we need to try to parse the labels out of the plain text. If this is good enough to be displayed on PubMed, then it should be decent enough for us to store in Zotero.
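For the second case, cleaning up a Label value that carries extra spaces or punctuation could look roughly like the Python sketch below. This is only an illustration of the idea, not the translator's code; `clean_label` and `format_section` are hypothetical names, and the set of trailing characters to strip is an assumption.

```python
import re


def clean_label(raw: str) -> str:
    """Collapse internal whitespace and drop stray trailing punctuation.

    Hypothetical helper; the stripped characters are an assumed set.
    """
    label = re.sub(r"\s+", " ", raw).strip()
    return label.rstrip(":;.,- ").strip()


def format_section(raw_label: str, text: str) -> str:
    """Prefix the section text with 'LABEL: ' when a label is present."""
    label = clean_label(raw_label)
    return f"{label}: {text.strip()}" if label else text.strip()


if __name__ == "__main__":
    print(format_section("  METHODS : ", "Relevant papers were identified."))
```

Normalizing the label first means exactly one colon is added, whether or not the source value already ended in one.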