Trawling for Drug-Gene Relationships

Database automatically mines literature for drug-gene relationships, and does it as well as manually curated databases.

When a drug saves one person but makes another ill, a bitter lesson in genetic differences often follows. With many such lessons already under our collective belts, researchers are using existing knowledge to predict additional drug-gene relationships as a way to forestall future calamities. A new software program can trawl published papers for gene-drug relationships, plug those relationships into known genetic networks, and predict which genes are likely to affect a patient’s response to a drug.


The text-mining-based version of PGxPipeline automatically dissects journal articles into component sentences and marks where a drug or a gene is mentioned. Reading the sentence syntax and vocabulary, it tracks the interactions between drugs and genes. A network/web of interactions is established (bottom), in which the thickness of each edge corresponds to the number of articles that support the interaction. The web of relationships is later enhanced using a database of gene-gene interactions and other information. Image reprinted from Garten, Y., Tatonetti, N., & Altman, R., Improving the prediction of pharmacogenes using text-derived drug-gene relationships, Pacific Symposium on Biocomputing, Hawaii, January 2010.

“Our contribution is using text mining and taking decades of research and folding that in to inform the prediction,” says Yael Garten, biomedical informatics PhD candidate in the lab of Russ B. Altman, MD, PhD, at Stanford University and a lead author of the work. “We showed that this is as good as and sometimes even better than manual curation,” in which scientists painstakingly enter published drug-gene interactions into a database. Garten will present the team’s research January 2010 at the Pacific Symposium on Biocomputing in Hawaii.
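The text-mining step in the figure caption (split articles into sentences, mark drug and gene mentions, count the articles supporting each drug-gene edge) can be illustrated with a minimal Python sketch. This is not the PGxPipeline code: it substitutes simple same-sentence co-occurrence for the syntax-aware reading the researchers describe, and the lexicons, function names, and sample sentences are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons; a real system would use far larger dictionaries.
DRUGS = {"warfarin", "clopidogrel"}
GENES = {"CYP2C9", "VKORC1", "CYP2C19"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def mentions(sentence):
    """Return (drugs, genes) named in one sentence, by exact token match."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    drugs = {t.lower() for t in tokens if t.lower() in DRUGS}
    genes = {t.upper() for t in tokens if t.upper() in GENES}
    return drugs, genes


def build_network(articles):
    """Map each (drug, gene) edge to the number of articles supporting it.

    Assumption: a drug and a gene in the same sentence count as an
    interaction. The real system reads sentence syntax and vocabulary.
    """
    support = defaultdict(set)
    for article_id, text in articles.items():
        for sentence in split_sentences(text):
            drugs, genes = mentions(sentence)
            for drug in drugs:
                for gene in genes:
                    support[(drug, gene)].add(article_id)
    # Edge "thickness" is the count of distinct supporting articles.
    return {edge: len(ids) for edge, ids in support.items()}


# Invented sample text, for illustration only.
articles = {
    "a1": "Warfarin dose depends on CYP2C9. VKORC1 variants matter too.",
    "a2": "CYP2C9 alters warfarin clearance. "
          "Clopidogrel is activated by CYP2C19.",
}
network = build_network(articles)
```

Counting distinct article IDs rather than raw sentence hits keeps a single paper that repeats a claim from inflating the edge weight.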


The previous version of the algorithm, designed by Altman and others, relied more heavily on manual labor. Called PGxPipeline, it employed a database of gene-drug relationships manually compiled from scientific articles by a team of scientists at Stanford Medical School. PGxPipeline wove these relationships into an orderly web, along with a database of gene-gene interactions and other data, to predict how strongly each of 12,460 genes affects response to a specific drug.
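The article does not say how PGxPipeline scores genes, so the following is only a rough sketch of the general idea of combining drug-gene evidence with a gene-gene interaction network. It uses a simple guilt-by-association rule: a gene's score is its own support for the drug plus a discounted share of its neighbors' support. The function name, the 0.5 neighbor weight, and the toy data are assumptions made for illustration.

```python
def rank_genes(drug, drug_gene_support, gene_gene, neighbor_weight=0.5):
    """Rank genes for one drug by direct plus neighbor evidence.

    drug_gene_support: {(drug, gene): number of supporting articles}
    gene_gene: {gene: set of interacting genes}
    neighbor_weight: hypothetical discount applied to neighbors' evidence
    """
    # Evidence linking each gene directly to the drug of interest.
    direct = {g: n for (d, g), n in drug_gene_support.items() if d == drug}
    scores = {}
    for gene, neighbors in gene_gene.items():
        neighbor_support = sum(direct.get(n, 0) for n in neighbors)
        scores[gene] = direct.get(gene, 0) + neighbor_weight * neighbor_support
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy inputs, invented for illustration only.
support = {("warfarin", "CYP2C9"): 2, ("warfarin", "VKORC1"): 1}
gene_gene = {
    "CYP2C9": {"CYP2C19"},
    "CYP2C19": {"CYP2C9"},
    "VKORC1": {"GGCX"},
    "GGCX": {"VKORC1"},
}
ranking = rank_genes("warfarin", support, gene_gene)
```

In this toy case a gene with no direct literature support still receives a nonzero score if it interacts with a gene that has some, which is the kind of inference that lets a network-based method propose candidate genes not yet linked to the drug.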


The team has now cut PGxPipeline loose from the manually created drug-gene database, automatically mining the information from published papers. This faster, cheaper method will inform the drug-gene rankings with constant updates from new literature. The manually curated and text-mining-based versions of PGxPipeline predicted a test set of 682 drug-gene interactions with similar accuracy, and the text-mining-based version was slightly better at identifying the genes that play the largest roles in response to a specific drug.


Garten hopes to use the revised PGxPipeline to parse all relevant scientific literature for drug-gene relationships. Better predictions will save researchers time in deciding which of the possible interactions to test in the lab and eventually influence how doctors prescribe drugs, she maintains.


“There is an emerging trend in bioinformatics to combine information from curated databases with information extracted from text,” says Tom Rindflesch, PhD, principal investigator for the semantic knowledge representation project at the National Institutes of Health in Bethesda, Maryland. “This is an excellent example.”
