EthnoHate2 (Ethnic Hate Speech Prediction in Social Media Texts)
Project leader: Olessia Koltsova
Project participants: Anton Surkov
The project "Detecting Ethnic Conflict in Social Media Using Transformers and Data Augmentation Methods" continues a study from 2020 and uses the same Russian-language corpus of user texts on ethnic relations. It aims to automatically identify mentions of, discussions of, and verbal participation in ethnic conflicts by fine-tuning pre-trained transformer encoders (RuBERT, RuROBERTa, etc.) and applying various augmentation techniques. These include generating alternative formulations with large language models and a technique proposed by the authors: random replacement of ethnonyms, which counters overfitting on rare ethnonyms. Combined with a fine-tuned RuROBERTa, this technique yielded the best result (macro F1 = 0.80) and proved resistant to a targeted adversarial attack (an advantage of ≈ 0.05 on the target class over the base model). This indicates the potential of the proposed approach for problems where models tend to rely on spuriously correlated markers. In addition, LLM generation and auto-labeling reduce the need for manual data annotation.
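The idea behind random ethnonym replacement can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ethnonym list, the function names `replace_ethnonyms` and `augment`, and the English example words are assumptions made for illustration only. For Russian texts, a real pipeline would also need to handle inflection and grammatical agreement, which this sketch ignores.

```python
import random
import re

# Hypothetical ethnonym lexicon, for illustration only;
# the lexicon actually used in the project is not given here.
ETHNONYMS = ["armenian", "tatar", "uzbek", "chechen", "ukrainian"]

# Match any listed ethnonym as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(e) for e in ETHNONYMS) + r")\b",
    re.IGNORECASE,
)


def replace_ethnonyms(text, rng):
    """Replace every ethnonym in `text` with a different, randomly chosen one."""

    def _swap(match):
        original = match.group(0)
        candidates = [e for e in ETHNONYMS if e != original.lower()]
        replacement = rng.choice(candidates)
        # Preserve the capitalisation of the first letter.
        return replacement.capitalize() if original[0].isupper() else replacement

    return _PATTERN.sub(_swap, text)


def augment(dataset, n_copies=1, seed=0):
    """Return the original (text, label) pairs plus augmented copies.

    The label is kept unchanged, on the assumption that swapping the
    ethnonym does not change whether the text expresses ethnic conflict.
    """
    rng = random.Random(seed)
    augmented = list(dataset)
    for text, label in dataset:
        for _ in range(n_copies):
            new_text = replace_ethnonyms(text, rng)
            if new_text != text:  # skip texts that contain no ethnonym
                augmented.append((new_text, label))
    return augmented
```

Because each augmented copy keeps its label while the ethnonym varies, a classifier trained on such data has less opportunity to tie the target class to any single rare ethnonym.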
Publications on the project:
Surkov A., Koltsova O. Detecting Ethnic Conflict in Social Media with Transformers and Augmented Data // Procedia Computer Science, 2025, Volume 258, Pages 2382-2390, ISSN 1877-0509.
Keywords: Ethnic conflict detection; social media; LLM; Fine Tuning; Data Augmentation; Russian language