Abstract
We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments that contain obscene or offensive words and phrases. We collected and labelled a large dataset of Arabic YouTube comments that covers a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that yield a classifier with 90.05% accuracy, which is higher than the accuracy of classifiers reported in previous studies on Arabic text.
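The abstract does not list the specific pre-processing steps or features used, so the following is only a minimal sketch of what Arabic text normalisation and word-level N-gram extraction could look like. The choice of steps (removing diacritics and tatweel, unifying alef variants, collapsing elongated letters) and the function names `normalise`, `tokenise` and `word_ngrams` are assumptions made for illustration, not the authors' pipeline.

```python
import re
from collections import Counter

# Assumed normalisation rules (not taken from the paper):
_DIACRITICS = re.compile(r"[\u064B-\u0652]")           # tashkeel marks
_TATWEEL = "\u0640"                                    # elongation character
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")   # alef with madda/hamza
_REPEATS = re.compile(r"(.)\1{2,}")                    # 3+ identical characters


def normalise(text):
    """Apply the assumed normalisation steps to a comment."""
    text = _DIACRITICS.sub("", text)            # drop diacritics
    text = text.replace(_TATWEEL, "")           # drop tatweel
    text = _ALEF_VARIANTS.sub("\u0627", text)   # map alef variants to bare alef
    text = _REPEATS.sub(r"\1", text)            # collapse elongated letters
    return text


def tokenise(text):
    """Split a normalised comment into word tokens."""
    return re.findall(r"\w+", normalise(text))


def word_ngrams(tokens, n_max=2):
    """Count word N-grams for every N from 1 to n_max."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats
```

Feature counts of this kind could then be vectorised and passed to an SVM implementation, for example `LinearSVC` in scikit-learn; which features and settings reach the reported accuracy is described in the paper itself, not here.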
| Original language | English |
|---|---|
| Pages (from-to) | 315-320 |
| Number of pages | 6 |
| Journal | Procedia Computer Science |
| Volume | 142 |
| Publication status | Published - 2018 |
| Event | 4th Arabic Computational Linguistics, ACLing 2018 - Dubai, United Arab Emirates. Duration: 17 Nov 2018 → 19 Nov 2018 |
Keywords
- Anti-social behaviour online
- Arabic dataset
- SVM for offensive language detection in Arabic
- harassment detection
- offensive language detection
- text mining
Fingerprint
Dive into the research topics of 'Towards Accurate Detection of Offensive Language in Online Communication in Arabic'. Together they form a unique fingerprint.