Translator for meetinglibrary.asco.org

adamsmith · January 23, 2014

I've looked at clinicaltrials.gov but wouldn't really know what to do with it. If you take a sample entry like:
http://clinicaltrials.gov/ct2/show/record/NCT00001372
and tell me what should go where in Zotero, that would help. The government sites are all very well done in terms of data access, so writing the actual translator isn't that hard.

Graham_MTM · January 23, 2014

@asinclair

Yes, very much! I would love to get something like this working. I spent a very small amount of time on this late last year, but I stopped because (apart from not being able to devote a lot of time to it) I found that the Zotero fields didn't quite fit the data I wanted to scrape from ClinicalTrials.gov. With an ASCO abstracts page, I was interested in capturing the standard citation info plus the abstract. With ClinicalTrials.gov, I'd actually like to scrape all of the available info and export it into an Excel spreadsheet so I could get quantitative info on, for example, all currently recruiting trials in multiple myeloma. I'd want to grab fields like estimated number of patients, phase of trial, institutions, etc. Since this is unlike a standard scraper, I wasn't sure how to get this into Zotero. I began to think that perhaps Zotero wasn't the best tool to use at all. Perhaps one of the devs could weigh in on this?

I will add, though: I am able to get a very usable Excel-style output from Zotero by using a custom CSL format that puts a delimiter (*) between all of the fields. I then use the "export bibliography" Zotero function and paste into Excel, then separate the fields using the "text to columns" function in Excel by splitting on the delimiter. If I could import the fields from ClinicalTrials.gov into dummy citation fields and export this way... it could be workable.
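The "split on the delimiter" step described above can also be done outside Excel. The following is only a minimal Python sketch under stated assumptions: the pasted bibliography has one entry per line, fields are separated by "*", and no field itself contains a "*". The sample entries and the function names are made up for illustration.

```python
import csv
import io


def bibliography_to_rows(text, delimiter="*"):
    """Split each non-empty bibliography line into fields on the delimiter."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        rows.append([field.strip() for field in line.split(delimiter)])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that Excel or LibreOffice can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical pasted output: one entry per line, fields separated by "*".
sample = """Smith J*A made-up trial title*2013
Doe A*Another made-up title*2014"""

rows = bibliography_to_rows(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Writing real CSV rather than relying on "text to columns" means commas or quotes inside a field get escaped by the csv module instead of breaking the columns.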

adamsmith · January 23, 2014

>I began to think that perhaps Zotero wasn't the best tool to use at all. Perhaps one of the devs could weigh in on this?

No, it's not. Probably best to download the XML directly and work with that: http://clinicaltrials.gov/ct2/resources/download#DownloadMultipleRecords

Graham_MTM · January 23, 2014

@adamsmith

I think the quickest approach would be something like I mentioned above, where the obvious fields could go into the citations (title, study author, etc) but then there would have to be some sort of a cheat-sheet to know what the rest of the info was. For example, scrape the "phase" data into "journal name" and then change the field title when you export it? This seems clunky.

Another option would be to create an entirely new type of document "Clinical Trial" and tabulate the info from the page into the correct fields. I don't know how to do this, or how hard it might be. Can this be done within Scaffold, or is this something that needs backend support?

Edit: I just read your last post and checked it out for myself. It seems that there is even a direct XML or CSV download option for search results (not sure if this is new, or if I just missed it before). This would make importing the data into Excel pretty simple.

@asinclair - would this be a good solution for you? Or would you need some citation info in a citation manager for referencing?

@adamsmith
Would it be possible to either 1) get a citation-only scraper working for ClinicalTrials.gov or 2) create Zotero entries from an XML or Excel file? Actually, I can think of lots of ways that being able to do the latter would be potentially helpful. Is this possible?

adamsmith · January 23, 2014

>I think the quickest approach would be something like I mentioned above, where the obvious fields could go into the citations (title, study author, etc) but then there would have to be some sort of a cheat-sheet to know what the rest of the info was. For example, scrape the "phase" data into "journal name" and then change the field title when you export it? This seems clunky.

Not going to happen. If this can be imported in a way that's useful for citations, I'll be happy to help, but we're not going to add a translator that puts information into random fields that require a cheat sheet.

>Another option would be to create an entirely new type of document "Clinical Trial" and tabulate the info from the page into the correct fields. I don't know how to do this, or how hard it might be. Can this be done within Scaffold, or is this something that needs backend support?

The latter. And I doubt it's going to happen: Zotero can't function as a general data analysis tool. This would stretch what it's meant to do too far. See above: you just want to go straight from XML to Excel. There are many tutorials on the web for that. I don't know where Excel is with that, but LibreOffice has a data import wizard for XML. I assume Excel does too.
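For anyone who would rather script the XML-to-spreadsheet step than use an import wizard, here is a rough Python sketch. It is not based on a verified schema: the record below is made up, the tag names for id, title, and sponsor are the ones mentioned elsewhere in this thread, and `overall_status`, `phase`, and `enrollment` are assumptions that should be checked against a real download. The names `COLUMNS`, `record_to_row`, and `records_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up record for illustration only. The overall_status, phase and
# enrollment tag names are assumptions; check them against a real file.
SAMPLE_XML = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <official_title>A Made-Up Study of Drug X in Multiple Myeloma</official_title>
  <sponsors><lead_sponsor><agency>Example Sponsor</agency></lead_sponsor></sponsors>
  <overall_status>Recruiting</overall_status>
  <phase>Phase 2</phase>
  <enrollment>120</enrollment>
</clinical_study>"""

# Spreadsheet column name -> path of the element inside one record.
COLUMNS = {
    "nct_id": "id_info/nct_id",
    "title": "official_title",
    "sponsor": "sponsors/lead_sponsor/agency",
    "status": "overall_status",
    "phase": "phase",
    "enrollment": "enrollment",
}


def record_to_row(xml_text):
    """Pull the configured fields out of one record; missing tags become ''."""
    root = ET.fromstring(xml_text)
    return {name: (root.findtext(path) or "").strip()
            for name, path in COLUMNS.items()}


def records_to_csv(rows):
    """Write the rows, with a header line, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(COLUMNS), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = record_to_row(SAMPLE_XML)
csv_text = records_to_csv([row])
print(csv_text)
```

For a multi-record download, the same `record_to_row` call could be run once per file and the resulting rows passed to `records_to_csv` together.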

adamsmith · January 23, 2014

>Would it be possible to either 1) get a citation-only scraper working for ClinicalTrials.gov or 2) create Zotero entries from an XML or Excel file? Actually, I can think of lots of ways that being able to do the latter would be potentially helpful. Is this possible?

We're overlapping a lot here ;)
1) Definitely, but which citations? And what should go where?
2) Doesn't really make sense: CSV and XML are very generic formats. E.g. Zotero already has at least four different XML importers (MODS, RDF, PubmedXML, MarcXML). Import needs to be defined for each XML standard separately, and I don't see a good match between Zotero and the clinicaltrials XML format.

Graham_MTM · January 23, 2014

@adamsmith

>the latter. And I doubt it's going to happen

I assumed as much. I agree, XML import is much better. I edited my previous post as well, but is there a way to get Excel fields back into Zotero? For example, if I pulled the info I wanted from the raw XML and arranged it into columns that are appropriate for Zotero (author, title, URL, etc), can I somehow export that from Excel into an XML file in the correct format? I'm treading into deep waters here, since my knowledge of XML ends at knowing that it exists :)

Graham_MTM · January 23, 2014

@adamsmith

>we're overlapping a lot here ;)

yes, we are :)

adamsmith · January 23, 2014

For Excel -> Zotero: For reasons specified above, this isn't just going to be an import translator, but you should be able to get this working with relative ease for any given Excel spreadsheet:
https://forums.zotero.org/discussion/25120/importing-from-excel-to-zotero/

Graham_MTM · January 23, 2014

Cool, thanks. I'll leave this link here too in case anyone is interested:

http://office.microsoft.com/en-us/excel-help/export-xml-data-HP010206401.aspx

I'll play with this and let you know how it goes. Cheers, Adam! It's always a pleasure.

asinclair · January 23, 2014

I should have said 'citation data', shouldn't I? (Although the discussion does suggest possibilities for my very newly acquired knowledge of just enough Python to speak to the API.)

But no, I was interested in being able to collect the citation data to produce citations per the recommended style - https://www.nlm.nih.gov/services/ctcite.html ...

(quote)

Author; Author. Title. In: ClinicalTrials.gov [Internet]. Bethesda (MD): National Library of Medicine (US). 2000- [cited date]. Available from: URL of the record NLM Identifier: NCTXXXXXXXX.

[Note: The authors are the sponsors of the study and the title is the name of the study. The "cited date" is the date you accessed the ClinicalTrials.gov record. Dates are in the format of YYYYMMMDD. The NLM Identifier is at the bottom of the record under "More Information."]

(end quote)

So the information that would need to be collected would be (by XML tags)

Sponsors (lead sponsor/agency, collaborator/agency) -> authors
Official title -> title
ClinicalTrials.gov
Bethesda (MD)
(date accessed) -> Accessed
(URL) -> URL
id_info/nct_id -> NCT Identifier (I don't know where that should go - if it goes in an extra field, then any reference of the same type will have to have that field cleared)

Thank you.
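To make the mapping above concrete, here is a minimal Python sketch that reads those tags from one record and assembles a citation in the quoted NLM pattern. Several things are assumptions rather than confirmed behaviour: the sample record is made up, the record URL is built as `https://clinicaltrials.gov/ct2/show/` plus the NCT id, and the cited date is written as "YYYY Mon D". The function name `format_citation` is hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET

# Made-up record; tag names follow the mapping listed in this post.
SAMPLE_XML = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <official_title>A Made-Up Study of Drug X</official_title>
  <sponsors>
    <lead_sponsor><agency>Example Sponsor</agency></lead_sponsor>
    <collaborator><agency>Example Collaborator</agency></collaborator>
  </sponsors>
</clinical_study>"""

# Fixed English abbreviations, so the result does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_citation(xml_text, cited):
    """Build an NLM-style citation string for one record."""
    root = ET.fromstring(xml_text)
    # Lead sponsor first, then collaborators, all treated as authors.
    agency_elements = (root.findall("sponsors/lead_sponsor/agency")
                       + root.findall("sponsors/collaborator/agency"))
    authors = "; ".join(el.text.strip() for el in agency_elements if el.text)
    title = (root.findtext("official_title") or "").strip()
    nct_id = (root.findtext("id_info/nct_id") or "").strip()
    url = "https://clinicaltrials.gov/ct2/show/" + nct_id  # assumed URL pattern
    cited_text = f"{cited.year} {MONTHS[cited.month - 1]} {cited.day}"
    return (f"{authors}. {title}. In: ClinicalTrials.gov [Internet]. "
            f"Bethesda (MD): National Library of Medicine (US). 2000- "
            f"[cited {cited_text}]. Available from: {url} "
            f"NLM Identifier: {nct_id}.")


citation = format_citation(SAMPLE_XML, datetime.date(2017, 8, 12))
print(citation)
```

This only produces a formatted string; it does not answer where the identifier should live inside a Zotero item.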

adamsmith · January 23, 2014

That's relatively easy. It won't happen super quickly, but I'll put it on my list.

dcnorris · August 12, 2017

Taking as my guide https://www.ncbi.nlm.nih.gov/books/NBK7273/#A57896, and following adamsmith's 2013 suggestion that the Book Section format might suffice, I can suggest the following as a temporizing measure:

Item Type: Book Section
Title: ALTA-1L: A Phase 3 Study of Brigatinib Versus Crizotinib in ALK-positive Advanced Non-Small Cell Lung Cancer Patients [last updated: 2017 Aug 10]
(NB: I have manually appended the 'Last updated' field to the title in brackets)
Author: Ariad Pharmaceuticals
...
Series: ClinicalTrials.gov [Internet]
(NB: Entering this as the 'Series' rather than 'Title' avoids italics in citation)
...
Place: Bethesda, MD
Publisher: U.S. National Library of Medicine
Date: August 10, 2017
...
URL: https://clinicaltrials.gov/ct2/show/NCT02737501
Accessed: 8/12/2017

The citation drags-n-drops as follows:
1. Ariad Pharmaceuticals. ALTA-1L: A Phase 3 Study of Brigatinib Versus Crizotinib in ALK-positive Advanced Non-Small Cell Lung Cancer Patients [last updated: 2017 Aug 10]. In: ClinicalTrials.gov [Internet]. Bethesda, MD: U.S. National Library of Medicine; 2017. https://clinicaltrials.gov/ct2/show/NCT02737501. Accessed August 12, 2017.

Admittedly, the citation does not separately list the "ClinicalTrials.gov Identifier: NCT02737501", but this is obvious from the URL anyway.
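If the identifier is ever needed on its own, it can be recovered from the URL. A small Python sketch, assuming identifiers always have the form "NCT" followed by eight digits (the function name is made up for illustration):

```python
import re

# Assumes identifiers are "NCT" followed by exactly eight digits.
NCT_PATTERN = re.compile(r"NCT\d{8}")


def nct_id_from_url(url):
    """Return the first NCT identifier found in the URL, or None."""
    match = NCT_PATTERN.search(url)
    return match.group(0) if match else None


print(nct_id_from_url("https://clinicaltrials.gov/ct2/show/NCT02737501"))
# prints NCT02737501
```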

HTH