TY - GEN
T1 - Machine Learning Approach to Detection of Offensive Language in Online Communication in Arabic
AU - Alakrot, Azalden
AU - Fraifer, Muftah
AU - Nikolov, Nikola S.
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/5/25
Y1 - 2021/5/25
N2 - This paper presents the results of several machine learning experiments conducted with a dataset of YouTube comments in Arabic. The experiments aim to study the impact of various text-preprocessing, feature-extraction and feature-selection techniques on the accuracy of a document classifier for detecting offensive language in online communication in Arabic. Regarding data preprocessing, our experiments focus on filtering out noisy characters and normalising inconsistencies present in casual online writing in Arabic. The combined effect of these data-preprocessing techniques and a few feature-extraction and feature-selection methods is then evaluated by training document classifiers. Our results give evidence that it is possible to train a classifier for the detection of offensive language on Arabic social media with a reasonable overall accuracy of 0.84, and precision, recall and F1-score of 0.89, 0.76 and 0.81, respectively.
KW - Feature Selection
KW - Logistic Regression
KW - Machine Learning
KW - Offensive Language
KW - SVM
KW - Support Vector Machine
UR - http://www.scopus.com/inward/record.url?scp=85113667658&partnerID=8YFLogxK
DO - 10.1109/MI-STA52233.2021.9464402
M3 - Conference contribution
AN - SCOPUS:85113667658
T3 - 2021 IEEE 1st International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering, MI-STA 2021 - Proceedings
SP - 244
EP - 249
BT - 2021 IEEE 1st International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering, MI-STA 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st IEEE International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering, MI-STA 2021
Y2 - 25 May 2021 through 27 May 2021
ER -