On the Performance of Evolutionary Algorithms in Biomedical Keyword Clustering


V. Dorfer, S. M. Winkler, T. Kern, S. Blank, G. Petz, P. Faschang - On the Performance of Evolutionary Algorithms in Biomedical Keyword Clustering - Proceedings of the Genetic and Evolutionary Computation Conference GECCO 2011, Dublin, Ireland, 2011


In the life sciences, quickly finding the desired information is often a challenge because of the huge amount of available data. The research area of information retrieval (IR) addresses this problem and tries to provide suitable solutions. One of the approaches used in IR is query extension based on keyword or document clusters. In this paper we present an in-depth analysis of a keyword clustering approach using four different kinds of evolutionary algorithms: evolution strategy (ES), genetic algorithm (GA), genetic algorithm with strict offspring selection (OSGA), and the multi-objective elitist non-dominated sorting genetic algorithm (NSGA-II). We have identified features that characterize solution candidates for the keyword clustering problem, e.g., the number of documents covered and how well the identified clusters of keywords match the occurrence of keywords in the given set of documents. This paper shows how these features are used and how evolutionary algorithms can be applied to optimize keyword clusters. To test the presented approach we used a real-world data set provided within the TREC-9 conference; this data collection includes information about approximately 36,000 documents collected from the PubMed database. In the results section we compare the performance of the tested evolutionary algorithms and find that ES and NSGA-II in particular produce meaningful results for this document collection. This approach based on evolutionary algorithms is intended to be used in automated query extension for biomedical information retrieval in PubMed.
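To make the two features named in the abstract more concrete, the following is a minimal sketch of how a solution candidate (a set of keyword clusters) might be scored. It is an illustration only and not the fitness function used in the paper: the function names `coverage` and `cohesion`, the use of Jaccard overlap as the "match" measure, and the toy documents are all assumptions introduced here.

```python
from itertools import combinations

# Hypothetical representation: each document is a set of keywords,
# and a solution candidate is a list of keyword clusters (sets).

def coverage(clusters, documents):
    """Fraction of documents containing at least one clustered keyword."""
    if not documents:
        return 0.0
    clustered = set().union(*clusters) if clusters else set()
    covered = sum(1 for doc in documents if doc & clustered)
    return covered / len(documents)

def cohesion(clusters, documents):
    """Mean Jaccard overlap of the document sets of keyword pairs
    that share a cluster (an assumed stand-in for 'match quality')."""
    scores = []
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            docs_a = {i for i, doc in enumerate(documents) if a in doc}
            docs_b = {i for i, doc in enumerate(documents) if b in doc}
            union = docs_a | docs_b
            if union:
                scores.append(len(docs_a & docs_b) / len(union))
    return sum(scores) / len(scores) if scores else 0.0

# Toy data, invented for illustration (not taken from TREC-9).
documents = [
    {"insulin", "diabetes"},
    {"insulin", "diabetes", "glucose"},
    {"tumor", "chemotherapy"},
    {"asthma"},
]
clusters = [{"insulin", "diabetes"}, {"tumor", "chemotherapy"}]

print(coverage(clusters, documents))
print(cohesion(clusters, documents))
```

A single-objective algorithm such as ES or GA would need to combine such features into one fitness value, whereas a multi-objective algorithm such as NSGA-II could treat them as separate objectives; how the paper weights or combines them is not stated in the abstract.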