In this bioinformatics research project, we therefore propose to develop a set of novel identification algorithms that are specifically designed for the analysis of modern mass spectra and that incorporate multiple sources of information.
Preliminary research results are promising. The project consortium, consisting of the Proteomics Group at IMP Vienna and the Bioinformatics Research Group at FH OÖ (Campus Hagenberg), has already conducted successful joint research on the analysis of MS data: using a first version of a scoring function designed by the consortium, identification rates comparable or even superior to those of Mascot, the current gold standard, have been achieved.
Encouraged by these preliminary results, we are convinced that considering additional sources of information will further improve the identification rates of mass spectra. This project is therefore dedicated to research on a combination of the following novel approaches. We plan to use machine learning techniques to analyze peptide elution times, fragmentation patterns and the mass accuracy characteristics specific to the instrument. In addition, observed m/z values will be recalibrated based on the mass error of highly reliable identifications, and the remaining mass error, evaluated against the learned error distribution, will be incorporated into the scoring function. Sophisticated peak picking strategies will also be designed using machine learning. These improvements will help increase identification rates in challenging situations such as hybrid spectra and exhaustive searches for a wide range of post-translational modifications. Such exhaustive searches lead to exponentially growing search spaces and an accompanying drop in spectrum identification rates, because the information contained in the mass spectra alone is not sufficient to cope with the enlarged search space. Instead of applying brute-force methods, we plan to solve this problem using construction heuristics, i.e., evolutionary algorithms that realize intelligent search strategies for large numbers of unknown post-translational modifications based on a combination of database search and de novo identification.
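The recalibration idea described above can be illustrated with a minimal sketch. It is not the consortium's scoring function: the function names, the use of a single global ppm shift estimated by the median, and the Gaussian model for the remaining error are all assumptions made purely for illustration.

```python
import math
import statistics


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def learn_calibration(reliable_pairs):
    """Estimate systematic shift and spread from reliable identifications.

    reliable_pairs: list of (observed_mz, theoretical_mz) tuples taken from
    highly reliable identifications (hypothetical input format).
    Assumption: one global ppm shift, estimated robustly by the median.
    """
    if len(reliable_pairs) < 2:
        raise ValueError("need at least two reliable identifications")
    errors = [ppm_error(obs, theo) for obs, theo in reliable_pairs]
    shift_ppm = statistics.median(errors)
    residuals = [e - shift_ppm for e in errors]
    # Population standard deviation of the remaining error; a small floor
    # avoids division by zero when all residuals happen to be identical.
    sigma_ppm = max(statistics.pstdev(residuals), 1e-6)
    return shift_ppm, sigma_ppm


def recalibrate(observed_mz, shift_ppm):
    """Remove the learned systematic shift from an observed m/z value."""
    return observed_mz / (1.0 + shift_ppm * 1e-6)


def mass_error_log_score(observed_mz, theoretical_mz, shift_ppm, sigma_ppm):
    """Log-density of the remaining mass error under the learned model.

    Assumption: residual errors follow a zero-mean Gaussian with standard
    deviation sigma_ppm. The value could enter a scoring function as one
    additive term.
    """
    residual = ppm_error(recalibrate(observed_mz, shift_ppm), theoretical_mz)
    return (-0.5 * (residual / sigma_ppm) ** 2
            - math.log(sigma_ppm * math.sqrt(2.0 * math.pi)))


if __name__ == "__main__":
    # Synthetic data: theoretical m/z values measured with roughly +5 ppm bias.
    theoretical = [500.0, 600.0, 700.0, 800.0, 900.0]
    bias_ppm = [4.8, 5.0, 5.2, 4.9, 5.1]
    pairs = [(t * (1.0 + b * 1e-6), t) for t, b in zip(theoretical, bias_ppm)]
    shift, sigma = learn_calibration(pairs)
    print("learned shift (ppm):", shift, "spread (ppm):", sigma)
    candidate = 650.0 * (1.0 + 5.0e-6)  # observed value carrying the same bias
    print("log score:", mass_error_log_score(candidate, 650.0, shift, sigma))
```

The median is used here because it tolerates an occasional false identification among the reliable set. A real instrument may well require a mass-dependent or time-dependent calibration function and a learned, non-Gaussian error distribution, which this sketch does not attempt.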
All research results achieved in this project will be published and made freely available to the bioinformatics and proteomics communities. Improving identification rates of peptides in general and of unknown modifications in particular will permit deeper insight into the proteome; computer science will thus provide a new basis for answering important medical and biological questions.
FWF Translational Research
This project is funded by the FWF Wissenschaftsfonds (Austrian Science Fund) within its Translational Research Programme.