Machine learning and emoji prediction: How much accuracy can MARBERT achieve?
Mohammed Q. Shormani, Ibrahim Abdulmalik Hassan Muneef Y. Alshawsh · Apr 22, 2026 · Citations: 0
How to use this page
Low trustUse this as background context only. Do not make protocol decisions from this page alone.
Best use
Background context only
What to verify
Validate the evaluation procedure and quality controls in the full paper before operational use.
Evidence quality
Low
Derived from extracted protocol signals and abstract evidence.
Abstract
This study investigates Machine Learning (ML) in the prediction of emojis in Arabic tweets employing the (state-of-the-art) MARBERT model. A corpus of 11379 CA tweets representing multiple Arabic colloquial dialects was collected from X.com via Python. A net dataset includes 8695 tweets, which were utilized for the analysis. These tweets were then classified into 14 categories, which were numerically encoded and used as labels. A preprocessing pipeline was designed as an interpretable baseline, allowing us to examine the relationship between lexical features and emoji categories. MARBERT was finetuned to predict emoji use from textual input. We evaluated the model performance in terms of precision, recall and F1-scores. Findings reveal that the model performed quite well with an overall accuracy 0.75. The study concludes that although the findings are promising, there is still a need for improving machine learning models including MARBERT, specifically for low-resource and multidialectal languages like Arabic.