The article by Vera Ignatenko, Anton Surkov, and Sergey Koltsov has been accepted for publication in the journal "PeerJ Computer Science".

The article, titled "Random Forests with Parametric Entropy-Based Information Gains for Classification and Regression Problems," is the second developed within the QTM project "Enhancing the Methodology of Automatic Text Analysis Based on Topic Modeling."

The Random Forest algorithm is one of the most popular and frequently used algorithms for classification and regression tasks. It combines the outputs of several decision trees to obtain a unified result. Random Forest demonstrates the highest accuracy on tabular data compared to other algorithms across various applications. However, the decision trees that make up a Random Forest are typically built using the classical Shannon entropy.
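To illustrate the role Shannon entropy plays here, the sketch below shows how a decision tree typically scores a candidate split: the information gain is the parent node's entropy minus the weighted entropy of the child nodes. This is a minimal illustration of the standard criterion, not the implementation from the article, and the function names are our own.

```python
import numpy as np

def shannon_entropy(labels):
    """Shannon entropy H = -sum(p * log2(p)) of a class-label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Reduction in Shannon entropy achieved by splitting
    `parent` into the two child nodes `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * shannon_entropy(left) \
                      + (len(right) / n) * shannon_entropy(right)
    return shannon_entropy(parent) - weighted_children

# A perfectly separating split of a balanced binary node
# yields the maximal gain of 1 bit.
y = np.array([0, 0, 1, 1])
print(information_gain(y, y[:2], y[2:]))  # 1.0
```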

In this article, we explore the possibilities of deformed entropies, which are successfully utilized in the field of complex systems, to enhance the prediction accuracy of Random Forest algorithms. We develop and introduce information gains based on the Rényi, Tsallis, and Sharma-Mittal entropies for both classification and regression Random Forests. We test the proposed modifications on six benchmark datasets: three for classification tasks and three for regression tasks. For classification, the application of Rényi entropy improves Random Forest prediction accuracy by 19-96% depending on the dataset, Tsallis entropy enhances accuracy by 20-98%, and Sharma-Mittal entropy increases accuracy by 22-111%, compared to the classical algorithm. For regression tasks, the use of deformed entropies improves prediction by 2-23% in terms of R^2, depending on the dataset.
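For reference, the three deformed entropies named above have well-known one- and two-parameter closed forms that reduce to Shannon entropy in the appropriate limits. The sketch below shows these textbook definitions plugged into the same split-gain computation; the exact parametric gains used in the article may be defined differently, and all function names here are illustrative assumptions.

```python
import numpy as np

def renyi_entropy(p, q):
    """Rényi entropy H_q = ln(sum p_i^q) / (1 - q), for q > 0, q != 1."""
    return np.log(np.sum(p ** q)) / (1.0 - q)

def tsallis_entropy(p, q):
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1), for q != 1."""
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def sharma_mittal_entropy(p, q, r):
    """Sharma-Mittal entropy, a two-parameter family (q != 1, r != 1)
    that recovers Rényi as r -> 1 and Tsallis at r = q."""
    return ((np.sum(p ** q)) ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)

def deformed_information_gain(parent, left, right, entropy, **params):
    """Split gain with a chosen deformed entropy:
    H(parent) minus the size-weighted entropies of the children."""
    def probs(labels):
        _, counts = np.unique(labels, return_counts=True)
        return counts / counts.sum()
    n = len(parent)
    return entropy(probs(parent), **params) - (
        (len(left) / n) * entropy(probs(left), **params)
        + (len(right) / n) * entropy(probs(right), **params)
    )

# Example: gain of a pure split under Rényi entropy with q = 2.
y = np.array([0, 0, 1, 1])
print(deformed_information_gain(y, y[:2], y[2:], renyi_entropy, q=2.0))
```

The deformation parameters (q, and r for Sharma-Mittal) act as extra hyperparameters of the forest, which is what allows the gain criterion to be tuned per dataset.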