Data mining is a fertile field for researchers because of its many approaches to knowledge discovery in enormous volumes of data stored in different formats. Data are now widely used all over the world, in areas such as education, industry, medicine, banking, insurance, research laboratories, business and the military. The main benefit of applying data mining techniques is the discovery of previously unknown patterns and relations in data, which can then support decision-making. Two forms of data analysis can be used to extract models that describe important data classes or predict future data trends: classification and prediction. In this paper, the authors present a comparative study of three classification algorithms (Decision Tree, Naïve Bayes and Random Forest) applied to demographic data on death statistics using the KNIME Analytics Platform. The study is based on statistical data provided by the National Bureau of Statistics of the Republic of Moldova for the years 2011 and 2012, covering deaths and several classification attributes, such as cause of death, area, sex, year and age group. The paper also gives a detailed proposal for increasing the models' accuracy. Our findings indicate that the Decision Tree model achieved the highest accuracy (over 90%).
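To make the accuracy comparison concrete, the following minimal Python sketch trains a categorical Naïve Bayes classifier (one of the three algorithms compared) and scores it by accuracy, i.e. the fraction of correctly classified records. It is not the KNIME node used in the study: the records, attribute values and cause-of-death categories below are hypothetical toy values, not data from the National Bureau of Statistics, and the add-one (Laplace) smoothing is an assumption of this sketch.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, the frequency of each attribute value."""
    class_counts = Counter(labels)
    # value_counts[class][attribute_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    # distinct values observed for each attribute (used by Laplace smoothing)
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the highest log-posterior for one record."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        # log P(class) + sum of log P(value | class), with add-one smoothing (assumed)
        score = math.log(n_label / total)
        for i, value in enumerate(row):
            count = value_counts[label][i][value]
            score += math.log((count + 1) / (n_label + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(predicted, actual):
    """Fraction of records whose predicted class equals the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy records: (area, sex, age group) -> cause-of-death category
train_rows = [
    ("urban", "M", "65+"), ("rural", "F", "65+"), ("urban", "F", "65+"),
    ("rural", "M", "15-64"), ("urban", "M", "15-64"), ("rural", "M", "15-64"),
]
train_labels = ["circulatory"] * 3 + ["external"] * 3

test_rows = [("rural", "F", "65+"), ("urban", "M", "15-64"), ("rural", "M", "65+")]
test_labels = ["circulatory", "external", "external"]

model = train_naive_bayes(train_rows, train_labels)
predictions = [predict(model, row) for row in test_rows]
acc = accuracy(predictions, test_labels)
print(predictions, acc)
```

On this toy set the third test record is misclassified, so the sketch reports an accuracy of 2/3; a Decision Tree or Random Forest could be scored with the same `accuracy` function on the same held-out records.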
Irina Ionita, Liviu Ionita