Extensions to Peptide Spectrum Match Validation using Machine Learning

Publication, 2018


G. J. Pirklbauer, S. M. Winkler, K. Mechtler, V. Dorfer - Extensions to Peptide Spectrum Match Validation using Machine Learning - Proceedings of the German Conference on Bioinformatics, Vienna, Austria, 2018


A substantial part of proteomics mass spectrometry experiments aims at the identification of peptide sequences. This is often achieved by sequence database searching, which results in peptide-spectrum matches (PSMs). Validation of PSMs is a crucial topic in the community: attributing confidence to PSMs allows only statistically relevant identifications to be retained. Searching a target and a decoy database emerged as a practical way of estimating confidence and is universally accepted in the literature [1]. Based on the target-decoy approach, Käll et al. described a statistical framework that allows for the assignment of statistically sound confidence scores [2]. Based on this scoring, they developed Percolator, an algorithm that boosts the number of confidently identified peptides at an arbitrary false positive rate cutoff [3]. The algorithm relies on a support vector machine and is widely accepted as a standard post-processing procedure in proteomics mass spectrometry experiments. Since the development of Percolator, alternatives to the support vector machine have been developed and have matured. We believe that an increase in the number of confidently identified PSMs can be achieved by combining the ideas of the Percolator algorithm with newer machine learning techniques. We utilised random forests [4] in an iterative approach similar to the Percolator algorithm. Compared to the standard target-decoy approach, we were able to increase the number of confidently identified PSMs at 1% FDR by 18% on a standard HeLa sample.
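As an illustration of the target-decoy idea the abstract builds on, the following minimal Python sketch estimates the FDR at each score threshold and counts target PSMs that pass a cutoff. It is not the authors' implementation: the function names `target_decoy_qvalues` and `count_confident_targets`, the simple estimator (decoys divided by targets above the threshold), and the toy scores are all assumptions made for this example, and it presumes target and decoy databases of equal size with higher scores meaning better matches.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values from a list of (score, is_decoy) tuples.

    Assumption: higher score means a better match, and the number of
    decoy hits above a threshold approximates the number of false
    target hits above the same threshold.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate when accepting every PSM down to this rank.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: smallest FDR at which this PSM would still be accepted,
    # i.e. the minimum FDR over this rank and all lower-scoring ranks.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def count_confident_targets(psms, fdr_cutoff=0.01):
    """Count target PSMs whose estimated q-value is at or below the cutoff."""
    return sum(
        1
        for _score, is_decoy, q in target_decoy_qvalues(psms)
        if not is_decoy and q <= fdr_cutoff
    )


# Toy data (hypothetical scores, not from the study).
toy_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
confident_at_1_percent = count_confident_targets(toy_psms, fdr_cutoff=0.01)
```

A post-processing method in the spirit of Percolator would replace the raw search score with a learned score before this counting step. The comparison reported in the abstract is then the number of target PSMs passing the 1% cutoff with and without that rescoring.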