Scanning text for pmid: #######

fjbuch · October 12, 2011

Is it possible to scan a text document where one has sentences and citations such as this?

The protein was found to exist only between the third and fourth toe {pmid: 12975271}.

I would love the standalone (or anything) to be able to scan that text and compile the citations and bibliography, either by comparing to PubMed or by referring to my local Zotero database.

adamsmith · October 12, 2011

Nope. No one is working on this either, AFAIK.

fjbuch · October 12, 2011

What about other avenues or services? At its simplest, one could run the text of a document through a regexp, pull out all the unique PMIDs, feed them as a list to NCBI, and get the list back. I cannot actually do it myself, but if I spent 10 hours on it then maybe I could. Since the only language I know how to program in is R, I think it would be a mighty big task for me. Hopefully it would be a little task for someone else.
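
A rough sketch of the regexp step described above, in Python rather than R. It assumes citations always look like the example sentence, i.e. {pmid: 12975271}; the pattern and the function name extract_pmids are illustrative choices, not anything Zotero provides.

```python
import re

# Assumed citation form, taken from the example sentence: {pmid: 12975271}
# Whitespace inside the braces and the case of "pmid" are tolerated.
PMID_PATTERN = re.compile(r"\{\s*pmid:\s*(\d+)\s*\}", re.IGNORECASE)


def extract_pmids(text):
    """Return the unique PMIDs found in text, in order of first appearance."""
    matches = PMID_PATTERN.findall(text)
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    sample = (
        "The protein was found to exist only between the third and "
        "fourth toe {pmid: 12975271}."
    )
    print(extract_pmids(sample))
```

The PMIDs are kept as strings so they can be joined straight into a query later.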

dstillman · October 12, 2011

Note that Zotero does have an RTF Scan feature that could be extended to support this, but that feature hasn't been worked on in a long time. Outside contributions would certainly be welcome.

ajlyon · October 12, 2011

If you can get a list of PMIDs to query for, Zotero can import the file that you get back from PubMed. As for pulling the unique IDs out of your document, that's something that you could do in R, or that you could ask someone to help with; it's a pretty straightforward task to do on the command line.
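
For the PubMed side, one possible approach is to ask NCBI's E-utilities efetch endpoint for the records in MEDLINE text format and save the response to a file for Zotero to import. This is only a sketch under assumptions: the function names are made up for illustration, the PMIDs are given as a list of strings, and NCBI's usage guidelines (rate limits, API keys for heavy use) are not handled here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build an efetch URL requesting MEDLINE-format text for the given PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),   # comma-separated list of PMIDs
        "rettype": "medline",
        "retmode": "text",
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


def save_medline(pmids, path):
    """Download the records and write them to path (performs a network request)."""
    with urlopen(build_efetch_url(pmids)) as response:
        data = response.read()
    with open(path, "wb") as handle:
        handle.write(data)


if __name__ == "__main__":
    print(build_efetch_url(["12975271"]))
```

The saved file could then be tried with Zotero's import function; whether a given Zotero version accepts it as-is is worth checking on a small sample first.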