Import from Refworks - PMID information lost?

Hello all
I've been using Refworks for many years but recently decided [read: was forced] to transition to something supported by my institution [read: something free] and decided on Zotero. I think I like it, but I am struggling with a few things. I searched the forum but couldn't find a relevant thread.

One of my issues is that it seems like in the process of exporting from Refworks, or importing into Zotero (I am not sure which), the PMID field of my references got lost! The reason I care about this is that I intend to use the style "National Library of Medicine (grant proposals with PMCID/PMID)".

My questions:
1- Since I saved my Refworks references both as text and as .rwb (their own format, I suppose), would there be a way to rescue the missing PMID information from those files and update my Zotero references? I am looking at the .txt backup and I don't see the PMID fields... I am hoping something can be done with the .rwb file!

2- Would it be possible in Zotero to do a "batch PMID search" for all of my ~8,000 references?

Thanks in advance for your help.

Alex
  • Update:
    I figured that I could get the citation for each of my references from the backup I made in Refworks. From there, a bit of text parsing (in Excel) allowed me to generate a query I could use in Pubmed Batch Citation Matcher. This allowed me to retrieve about 7300 different PMIDs.
    I thought I could then ask Zotero to import these as 7300 new references, using the magic wand "Add item(s) by identifier".

    It worked if I input a dozen PMIDs at a time. But I tried with ~100 and ~1000 different PMIDs, and it just seemed to stop after a short while, did not generate any error message or any report on the import job, etc.
    I will post this as a separate thread to see what might be the issue.

    In the meantime, I am still looking for a solution to my original problem.

    Alex
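Since pasting about a dozen PMIDs at a time worked while ~100 or ~1000 stalled, one stopgap is to cut the list into small batches and paste them one after another. A minimal Python sketch of that idea; the function name `batch_pmids` and the default batch size of 12 are illustrative assumptions, not anything Zotero prescribes:

```python
def batch_pmids(pmids, size=12):
    """Yield lists of at most `size` PMIDs, skipping blank entries."""
    cleaned = [p.strip() for p in pmids if p.strip()]
    for start in range(0, len(cleaned), size):
        yield cleaned[start:start + size]


# Illustration with made-up identifiers: print one batch per block,
# one PMID per line, ready to paste into "Add item(s) by identifier".
example = [str(1000 + i) for i in range(30)]
for batch in batch_pmids(example):
    print("\n".join(batch))
    print()
```

With ~7300 PMIDs this still means several hundred paste operations, so it is only practical for moderately sized lists.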
  • If you open whatever format you imported from RW into Zotero in a text editor, could you post the entry corresponding to one item (which did have a PMID in RW) here?
  • Hello, thanks for your reply.
    For the following reference, when I was using Refworks I was able to cite it in a Word document and have the PMID listed in the bibliography... Unfortunately, the PMID number is not in the text export (the DOI and PMCID are there, though).
    Alex

    ### start of entry ###
    TY - JOUR
    ID - 6726
    A1 - Liu,Y.
    A1 - Chu,A.
    A1 - Chakroun,I.
    A1 - Islam,U.
    A1 - Blais,A.
    T1 - Cooperation between myogenic regulatory factors and SIX family transcription factors is important for myoblast differentiation
    Y1 - 2010
    Y2 - Nov 1
    VL - 38
    IS - 20
    SP - 6857
    EP - 6871
    AB - Precise regulation of gene expression is crucial to myogenesis and is thought to require the cooperation of various transcription factors. On the basis of a bioinformatic analysis of gene regulatory sequences, we hypothesized that myogenic regulatory factors (MRFs), key regulators of skeletal myogenesis, cooperate with members of the SIX family of transcription factors, known to play important roles during embryonic skeletal myogenesis. To this day little is known regarding the exact molecular mechanism by which SIX factors regulate muscle development. We have conducted a functional genomic study of the role played by SIX1 and SIX4 during the differentiation of skeletal myoblasts, a model of adult muscle regeneration. We report that SIX factors cooperate with the members of the MRF family to activate gene expression during myogenic differentiation, and that their function is essential to this process. Our findings also support a model where SIX factors function not only 'upstream' of the MRFs during embryogenesis, but also 'in parallel' to them during myoblast differentiation. We have identified new essential nodes that depend on SIX factor function, in the myogenesis regulatory network, and have uncovered a novel way by which MRF function is modulated during differentiation.
    N1 - JID: 0411011; OID: NLM: PMC2978361; 2010/07/02 [aheadofprint]; ppublish
    CY - England
    JF - Nucleic acids research
    JA - Nucleic Acids Res.
    SN - 1362-4962; 0305-1048
    AD - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Faculty of Medicine, 451 Smyth Road, Ottawa, ON K1H 8M5, Canada.
    DO - 10.1093/nar/gkq585
    M1 - Journal Article
    ER -

    ### end of entry ###
  • Do you still have access to RefWorks? If so, can you export as RefWorks Tagged?
  • Unfortunately, I don't...I really should have checked all this before my access got cancelled....mea culpa.

    From PMIDs, I was able to get Pubmed XML format list for all my 7000 references. I then converted to .bib format using pubmed2bibtex.xsl
    The conversion apparently went fine (see one reference, below).
    Importing into Zotero works, but the PMID info is skipped!
    Should I be using a different converter?
    Thanks
    Alex


    ### example of Pubmed.xml converted to .bib

    @article{pmid26691984,
    title = {{Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations.}},
    author = {Carlos L Araya and Can Cenik and Jason A Reuter and Gert Kiss and Vijay S Pande and Michael P Snyder and William J Greenleaf},
    journal = {{Nat. Genet.}},
    volume = {48},
    number = {2},
    year = {2016},
    month = feb,
    pages = {117-25},
    abstract = {Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼ 15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.},
    pii = {ng.3471},
    doi = {10.1038/ng.3471},
    pubmed = {26691984},
    pmc = {PMC4731297},
    mid = {NIHMS740737},
    nlmuniqueid = {9216904}
    }


    #### end of example
  • Zotero should be able to import PubmedXML directly. Have you tried that? (pubmed and pmc aren't official bibtex fields, so we don't currently import them, though it'd be easy to add.)
  • However, the fields pmid and pmcid in BibTeX should already work. Try it out and then you can search & replace in your large BibTeX file before importing it to Zotero.
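For a file with thousands of entries, the search & replace can also be scripted. A minimal Python sketch, assuming the field names appear at the start of a line as in the example entry above (`pubmed = {...}`, `pmc = {...}`); the function name `rename_fields` is hypothetical:

```python
import re

# Match the field name only at the start of a line, so that the word
# "pubmed" inside an abstract or title is left untouched.
FIELD_RE = re.compile(r"^([ \t]*)(pubmed|pmc)([ \t]*=)", re.IGNORECASE | re.MULTILINE)
RENAMES = {"pubmed": "pmid", "pmc": "pmcid"}


def rename_fields(bibtex_text):
    """Return the BibTeX text with pubmed/pmc renamed to pmid/pmcid."""
    def swap(match):
        return match.group(1) + RENAMES[match.group(2).lower()] + match.group(3)
    return FIELD_RE.sub(swap, bibtex_text)


sample = """@article{pmid26691984,
doi = {10.1038/ng.3471},
pubmed = {26691984},
pmc = {PMC4731297},
}"""
print(rename_fields(sample))
```

To process a real file, read it into a string, pass it through `rename_fields`, and write the result to a new file so the original stays intact.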
  • Hello
    Thanks both for your help.
    I tried to import the large pubmed XML file, and also a test file with a single reference. They both fail, Zotero giving me a "The selected file is not in a supported format." message...

    Is this where I am supposed to make an official feature request, for full Pubmed XML support? Alternately, it would be great if someone would develop a platform-independent program to convert pubmed XML to bibtex (all options I found run on Linux/Unix, or don't tolerate large queries...)

    zuphilip: your suggestion worked (at least with a tiny test file with 1 record)! In the .bib file I changed "pubmed" to "PMID" and "pmc" to "PMCID", and the record got imported properly. I'll try modifying the entire set of 7000 refs and import it. I will report back.

    Alex
  • I'm surprised that's not working -- could you walk me through how you got the XML file? There is full pubmed XML support (that's what Zotero uses when you import a pubmed ID or directly from pubmed), the problem is likely just that Zotero doesn't recognize the file as such for some reason (which is a bug, not a feature, so no request needed ;)
  • Hello,
    Here is the process from the beginning:
    1-Refworks exported text file parsed in Excel (Perl would have been more efficient but much longer for me to code) to retrieve the information needed by Pubmed's Batch Citation Matcher:
    journal_title|year|volume|first_page|author_name|your_key|

    2-This returned a similar list, with PMID appended at the end of each line. I extracted the list of ~7,000 PMIDs and entered them directly in the search box of pubmed's search page (was expecting an error message for size of query, but it went fine)

    3-I saved the results for citation manager (XML) to a text file: citations.xml

    4-I moved to our Ubuntu box and downloaded pubmed2bibtex.xsl from the website: http://www.genomearchitecture.com/static/misc/pubmed2bibtex.xsl

    5-I issued the command:
    xsltproc pubmed2bibtex.xsl citations.xml > citations.bib

    6-I opened "citations.bib" in a text editor, and replaced all instances of "pubmed =" with "PMID =", and all instances of "pmc =" with "PMCID ="
    (the number of changes matched the number of PMID that I started with, that's as far as my QC goes...)

    7-I used the "import" function in Zotero.

    I am confirming that this worked and I got my ~7,000 references in!

    Thanks again
    Alex
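Step 1 above was done in Excel; the same parsing could be scripted. A minimal Python sketch, assuming RIS-style records like the entry posted earlier in this thread (tags `JF`, `Y1`, `VL`, `SP`, `A1`, `ID`, with `ER` closing each record). The function names and the author clean-up ("Liu,Y." to "Liu Y") are assumptions for illustration, not a format confirmed by PubMed:

```python
import re

# A tag line looks like "VL - 38": two-character tag, whitespace, dash, value.
TAG_RE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s*(.*)$")


def parse_ris(text):
    """Return a list of records, each a dict mapping tag -> list of values."""
    records, current = [], {}
    for raw in text.splitlines():
        match = TAG_RE.match(raw.strip())
        if not match:
            continue  # separator or continuation line, ignored in this sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


def first(record, tag):
    """First value stored for a tag, or an empty string if absent."""
    return record.get(tag, [""])[0]


def matcher_line(record):
    """Build journal_title|year|volume|first_page|author_name|your_key|"""
    author = first(record, "A1").replace(",", " ").replace(".", "").strip()
    fields = [
        first(record, "JF") or first(record, "JA"),
        first(record, "Y1")[:4],
        first(record, "VL"),
        first(record, "SP"),
        author,
        first(record, "ID"),
    ]
    return "|".join(fields) + "|"
```

Calling `matcher_line` on each record returned by `parse_ris` gives one query line per reference; records with missing fields simply produce empty slots between the pipes.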
  • ...I should have mentioned that usage of the xsl script was explained here, and this really helped me:
    http://www.genomearchitecture.com/2014/06/how-to-convert-pubmed-references-to-bibtex
  • I know you have this working, but this should be much easier, so if you don't mind working with me for a sec to figure out where this fails:
    3-I saved the results for citation manager (XML) to a text file: citations.xml
    I can't get here. When I use "Send to-->Citation Manager" Pubmed exports in .nbib which is a flat format.
    When I use "Send to --> File --> XML" I get the pubmed format that Zotero can read. You seem to get something very similar to the latter, but not quite the same? Can you get me to a simple search on pubmed and tell me exactly what you click to get citations.xml?
  • Hi, good catch, I was not precise enough.

    "Send to-->citation manager" has a cap of 200 articles that can be processed at a time. Since I have >7000 I don't want to repeat the process 35 times and reassemble everything. So instead, I chose to use:
    choose destination --> File && format: XML

    Zotero starts importing that XML file, but crashes repeatedly (I tried again just 2 seconds ago, and it crashed again... Note that the file is about 100 MB in size). I should have been more precise and indicated this fact: Zotero IS able to read my Pubmed XML, but was unable to deal gracefully with my huge file...

    Alex
  • By contrast, the .bib file is only 12 MB in size: no crash during importing.
  • Sorry for another tidbit I should have added before:

    Getting repeated Firefox crashes on importing the huge XML file, I decided to run a test with a tiny XML file made up of 1 entry only. Not being familiar with the format, I created a tiny file that was incomplete (it was missing the first 3 lines of the XML as well as the very last line). This tiny, corrupted XML file prompted Zotero to give me the "unsupported file format" message. My mistake, and a wrong interpretation on my part.

    I suppose that the bottom line is the following: for a single PMID or about a dozen, "Add by identifier" is fine. For somewhat larger lists, "import..." of Pubmed XML files works well. For huge reference lists like mine, the much smaller .bib file makes sense and may be a workaround if crashes keep happening.

    Alex
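Another possible workaround for a very large export is to split the Pubmed XML into smaller files before importing. A minimal Python sketch, assuming the export has a `PubmedArticleSet` root with `PubmedArticle` (or `PubmedBookArticle`) children; the function name, chunk size and output file names are hypothetical. The pieces are written without the original DOCTYPE line, and whether Zotero recognizes them in that form is untested here:

```python
import xml.etree.ElementTree as ET

RECORD_TAGS = ("PubmedArticle", "PubmedBookArticle")


def split_pubmed_xml(source, chunk_size=500, prefix="chunk"):
    """Write the records of `source` into files of at most `chunk_size`
    records each and return the list of paths written."""
    written, batch = [], []

    def flush():
        out_root = ET.Element("PubmedArticleSet")
        out_root.extend(batch)
        path = f"{prefix}_{len(written) + 1:03d}.xml"
        ET.ElementTree(out_root).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
        batch.clear()

    # Stream the file instead of loading all of it into memory at once.
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag in RECORD_TAGS:
            batch.append(elem)
            if len(batch) == chunk_size:
                flush()
                root.clear()  # drop records that were already written out
    if batch:
        flush()
    return written
```

For example, `split_pubmed_xml("citations.xml", chunk_size=500)` would turn a ~7,000-record export into about 14 files, each of which can be tried with "import..." on its own.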
  • OK. I think the next version of Zotero might be able to handle larger files better, but I see that importing a 100MB XML could cause problems.
    Just one last question: did it crash as in shut down Firefox or did it freeze with non-responding?
  • It shut down Firefox.
  • OK, thanks -- that'll be very helpful going forward.