Machine Learning Approach to Detection of Offensive Language in Online Communication in Arabic

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper presents the results of several machine learning experiments conducted on a dataset of YouTube comments in Arabic. The experiments study the impact of various text preprocessing, feature-extraction and feature-selection techniques on the accuracy of a document classifier for detecting offensive language in online communication in Arabic. For data preprocessing, our experiments focus on filtering out noisy characters and normalising the inconsistencies present in casual online writing in Arabic. The combined effect of these preprocessing techniques and a few feature-extraction and feature-selection methods is then evaluated by training document classifiers. Our results provide evidence that a classifier for detecting offensive language on Arabic social media can be trained with a reasonable overall accuracy of 0.84, and precision, recall and F1-score of 0.89, 0.76 and 0.81, respectively.
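The abstract does not list the exact preprocessing rules, so the following is only a minimal sketch of what "filtering out noisy characters and normalising inconsistencies" could look like for Arabic comments. The function name `normalise_comment`, the choice of rules (removing diacritics and tatweel, unifying alef variants, dropping non-Arabic characters, collapsing elongated letters) and the threshold of three repeats are all assumptions for illustration, not the authors' method.

```python
import re

# All rules below are illustrative assumptions; the paper's abstract does not
# specify which characters were filtered or how spelling was normalised.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")    # tashkeel marks and superscript alef
_TATWEEL = "\u0640"                                   # elongation character (kashida)
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")       # anything outside the basic Arabic letters
_REPEATS = re.compile(r"(.)\1{2,}")                   # the same character three or more times in a row
_SPACES = re.compile(r"\s+")


def normalise_comment(text: str) -> str:
    """Return a cleaned, normalised version of one Arabic comment (hypothetical helper)."""
    text = _DIACRITICS.sub("", text)           # drop short-vowel marks
    text = text.replace(_TATWEEL, "")          # drop decorative elongation
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map alef variants to bare alef
    text = _NON_ARABIC.sub(" ", text)          # replace digits, Latin text, punctuation, emoji
    text = _REPEATS.sub(r"\1", text)           # collapse emphatic letter repetition
    return _SPACES.sub(" ", text).strip()      # tidy whitespace
```

Repeats are collapsed only from three characters upward, so that legitimately doubled letters survive. The cleaned text would then be passed to whatever feature-extraction and classification steps are being compared.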

Original language: English
Title of host publication: 2021 IEEE 1st International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering, MI-STA 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 244-249
Number of pages: 6
ISBN (Electronic): 9781665418560
Publication status: Published - 25 May 2021
Event: 1st IEEE International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering, MI-STA 2021 - Tripoli, Libya
Duration: 25 May 2021 to 27 May 2021

Publication series

Name: 2021 IEEE 1st International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering, MI-STA 2021 - Proceedings

Conference

Conference: 1st IEEE International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering, MI-STA 2021
Country/Territory: Libya
City: Tripoli
Period: 25/05/21 to 27/05/21

Keywords

  • Feature Selection
  • Logistic Regression
  • Machine Learning
  • Offensive Language
  • SVM
  • Support Vector Machine
