Evolutionary Identification of Cancer Predictors Using Clustered Data


S. M. Winkler, M. Affenzeller, H. Stekel - Evolutionary Identification of Cancer Predictors Using Clustered Data - Companion Publication of the 2013 Genetic and Evolutionary Computation Conference, GECCO'13 Companion, Amsterdam, Netherlands, 2013, pp. 1463-1470


In this paper we discuss how pre-clustering the data affects the identification of estimation models for cancer diagnoses. The models are learned from patients’ data records, which include standard blood parameters, tumor markers, and information about the diagnosis of tumors. We apply a hybrid clustering and classification approach that first identifies data clusters (using standard patient data and tumor markers) and then learns prediction models on the basis of these clusters. In the empirical section we analyze the clusters of patient data samples formed by k-means clustering: we identify the optimal number of clusters and investigate the homogeneity of the resulting clusters. Several evolutionary modeling approaches implemented in HeuristicLab are then applied to identify estimators for selected cancer diagnoses: linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms), as well as genetic programming. As shown in the results section, the investigated diagnoses of breast cancer, melanoma, and respiratory system cancer are estimated correctly in up to 84.2%, 80.3%, and 94.1% of the analyzed test cases, respectively; without tumor markers, up to 78.2%, 78%, and 93.3% of the test samples are estimated correctly.
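
The two-stage idea described in the abstract (cluster first, then learn one model per cluster) can be illustrated with a minimal sketch. This is not the authors' HeuristicLab implementation: it assumes plain Lloyd-style k-means and substitutes a simple k-nearest-neighbor majority vote for the evolutionary-optimized models. All function names and the tiny two-feature data set are hypothetical and made up for illustration; they are not patient data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the cluster center closest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: alternate assignment and center update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def fit_per_cluster(points, labels, centers):
    """Stage 2: collect the labeled training samples of each cluster."""
    members = [[] for _ in centers]
    for p, y in zip(points, labels):
        members[nearest(p, centers)].append((p, y))
    return members


def predict(point, centers, members, k_neighbors=3):
    """Route the point to its cluster, then take a k-NN majority vote there."""
    cluster = members[nearest(point, centers)]
    if not cluster:
        # Fallback (assumption): use all samples if the cluster is empty.
        cluster = [s for m in members for s in m]
    closest = sorted(cluster,
                     key=lambda s: squared_distance(point, s[0]))[:k_neighbors]
    votes = [y for _, y in closest]
    return max(set(votes), key=votes.count)


# Hypothetical toy data: two features per sample, binary diagnosis label.
points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3],
          [5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.3]]
labels = [0, 0, 1, 0, 1, 1, 0, 1]

centers = kmeans(points, k=2)
members = fit_per_cluster(points, labels, centers)
print(predict([1.0, 1.05], centers, members))
print(predict([5.0, 5.05], centers, members))
```

In this sketch each cluster only sees its own training samples, which mirrors the intent of learning prediction models on the basis of the identified clusters. The choice of k, the distance measure, and the per-cluster learner are all placeholders.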