Probabilistic Assessment of Protein-Protein Interaction Confidence by Large-Scale Analysis of Homologous Protein-Protein Interactions
C. Frech - Probabilistic Assessment of Protein-Protein Interaction Confidence by Large-Scale Analysis of Homologous Protein-Protein Interactions - Master/Diploma Thesis, FH OÖ Fakultät Hagenberg, Austria, 2007, pp. 1-109
The study of Protein-Protein Interactions (PPI) promises to reveal fundamental
molecular mechanisms of cell functions and many diseases. The last decade has
seen a tremendous increase in the number of known PPIs, with hundreds of
thousands of them now available in public databases. However, it is estimated
that about 50% of reported PPIs are actually false positives, i.e., experimental
artifacts without biological significance. Reliable verification of PPIs is
therefore indispensable and is currently an important topic in bioinformatics.
Motivated by recent insights into PPI evolution, this work proposes a new
homology-based approach to PPI validation. The underlying idea is that most PPIs
originate from genetic duplications and have not evolved de novo between previously
non-interacting proteins. Such an evolutionary relationship between PPIs implies
that for most true-positive PPIs many homologous PPIs exist, both within the same
species and in other species.
Based on this assumption, a statistical hypothesis test is formulated and applied to
a large, integrated data set of known PPIs. Under the null hypothesis, i.e., the hypothesis
that a given PPI is a false positive, the number of PPIs among homologous
proteins is expected to correspond to the number of PPIs among randomly chosen
proteins. If the former number is significantly higher, the null hypothesis is rejected.
A P-value test statistic, the Interaction P-Value (IPV), detects statistically significant results.
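
The idea behind this test can be illustrated with a minimal Monte Carlo sketch. It is not the IPV computation of the thesis, whose exact formulation is not given in this abstract; it only shows, under simplifying assumptions, how the number of interactions among homologs can be compared with the number among randomly chosen proteins. The function names (count_interactions, empirical_p_value), the uniform random sampling scheme and the toy data are all hypothetical.

```python
import random
from itertools import product


def count_interactions(set_a, set_b, known_ppis):
    """Count known interactions between proteins of set_a and proteins of set_b."""
    return sum(
        1 for a, b in product(set_a, set_b) if frozenset((a, b)) in known_ppis
    )


def empirical_p_value(homologs_a, homologs_b, all_proteins, known_ppis,
                      n_samples=1000, seed=0):
    """Empirical p-value for the null hypothesis that homologs of the two
    partners interact no more often than randomly chosen proteins.

    Assumption: the null distribution is approximated by drawing random
    protein sets of the same sizes as the two homolog sets.
    """
    rng = random.Random(seed)
    observed = count_interactions(homologs_a, homologs_b, known_ppis)
    at_least_as_many = 0
    for _ in range(n_samples):
        rand_a = rng.sample(all_proteins, len(homologs_a))
        rand_b = rng.sample(all_proteins, len(homologs_b))
        if count_interactions(rand_a, rand_b, known_ppis) >= observed:
            at_least_as_many += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_many + 1) / (n_samples + 1)


# Toy data (hypothetical): 50 proteins, 9 known interactions between two families.
proteins = [f"P{i}" for i in range(50)]
family_a = ["P0", "P1", "P2"]
family_b = ["P3", "P4", "P5"]
known = {frozenset((a, b)) for a in family_a for b in family_b}

p_supported = empirical_p_value(family_a, family_b, proteins, known)
p_unsupported = empirical_p_value(["P10", "P11"], ["P20", "P21"], proteins, known)
```

In this toy setting, a candidate PPI whose homologs interact densely receives a small p-value, while a candidate whose homologs show no known interactions receives a p-value of 1 and the null hypothesis is retained.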
The classification performance of the IPV is assessed on three gold standard data
sets and compared with that of two existing homology-based classifiers. At a specificity
of 80%, the achieved sensitivity ranges from 76% to 84%.
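
How a figure such as "sensitivity at 80% specificity" is obtained can be sketched as follows. The sketch assumes that each gold standard set provides p-values for known true PPIs (positives) and for known false PPIs (negatives), and that a PPI is accepted when its p-value is at or below a threshold. The function name and the example values are hypothetical and are not taken from the thesis.

```python
def sensitivity_at_specificity(pos_pvalues, neg_pvalues, specificity=0.80):
    """Best sensitivity reachable while keeping specificity at or above the target.

    A PPI is classified as true when its p-value is <= the threshold.
    """
    best = 0.0
    # Only observed p-values need to be tried as thresholds.
    for threshold in sorted(set(pos_pvalues) | set(neg_pvalues)):
        true_negatives = sum(1 for p in neg_pvalues if p > threshold)
        if true_negatives / len(neg_pvalues) >= specificity:
            true_positives = sum(1 for p in pos_pvalues if p <= threshold)
            best = max(best, true_positives / len(pos_pvalues))
    return best


# Hypothetical p-values for five true and five false PPIs.
positives = [0.01, 0.02, 0.03, 0.2, 0.6]
negatives = [0.04, 0.5, 0.7, 0.8, 0.9]
sensitivity = sensitivity_at_specificity(positives, negatives, specificity=0.80)
```

Fixing the specificity first and then reporting the sensitivity makes classifiers comparable at the same false-positive rate.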
The statistical analysis of homologous PPIs presented here suggests that homology-based
PPI validation on large, integrated PPI data sets has great potential.