Abstract

We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments that contain obscene or offensive words and phrases. We collected and labelled a large dataset of Arabic YouTube comments covering a broad range of both offensive and inoffensive content. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which reaches 90.05% accuracy, higher than that of classifiers reported by previous studies on Arabic text.
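As a rough illustration of what pre-processing and word-level and N-gram feature extraction can look like for Arabic comments, the following is a minimal sketch using only the Python standard library. The abstract does not name the specific steps used in the paper, so the normalisation rules here (removing diacritics and tatweel, unifying alef variants, collapsing elongated characters) are assumptions drawn from common practice, and the function names `normalise`, `tokenise` and `ngram_features` are hypothetical.

```python
import re
from collections import Counter

# Assumed normalisation rules; the paper's actual steps are not given in the abstract.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")    # tashkeel marks and superscript alef
TATWEEL = "\u0640"                                   # elongation (kashida) character
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
REPEATS = re.compile(r"(.)\1{2,}")                   # same character three or more times in a row


def normalise(text):
    """Apply a few common Arabic normalisation steps to a comment."""
    text = DIACRITICS.sub("", text)           # drop short-vowel marks
    text = text.replace(TATWEEL, "")          # drop elongation characters
    text = ALEF_VARIANTS.sub("\u0627", text)  # map alef variants to bare alef
    text = REPEATS.sub(r"\1", text)           # collapse long character runs to one
    return text


def tokenise(text):
    """Split a normalised comment into word tokens."""
    return re.findall(r"\w+", normalise(text))


def ngram_features(text, n_max=2):
    """Count word N-grams of length 1..n_max (word-level plus N-gram features)."""
    tokens = tokenise(text)
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats
```

Feature counts of this kind could then be vectorised and passed to a linear SVM implementation such as scikit-learn's `LinearSVC`. Which combination of steps and features yields the reported accuracy is what the paper itself summarises.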

Original language: English
Pages (from-to): 315-320
Number of pages: 6
Journal: Procedia Computer Science
Volume: 142
Publication status: Published - 2018
Event: 4th Arabic Computational Linguistics, ACLing 2018 - Dubai, United Arab Emirates
Duration: 17 Nov 2018 to 19 Nov 2018

Keywords

  • Anti-social behaviour online
  • Arabic dataset
  • SVM for offensive language detection in Arabic
  • harassment detection
  • offensive language detection
  • text mining

Fingerprint

Dive into the research topics of 'Towards Accurate Detection of Offensive Language in Online Communication in Arabic'. Together they form a unique fingerprint.
