UUM Electronic Theses and Dissertation

An expandable Arabic lexicon and valence shifter rules for sentiment analysis on twitter

Ihnaini, Baha' Najim Salman (2019) An expandable Arabic lexicon and valence shifter rules for sentiment analysis on twitter. Doctoral thesis, Universiti Utara Malaysia.

Files:
s900147_01.pdf (Text, 2MB)
s900147_02.pdf (Text, 745kB)
s900147_references.docx (Text, 80kB)

Abstract

Sentiment analysis (SA) refers to the computational and natural language processing techniques used to extract subjective information expressed in a text. This study addresses three main problems in SA: a) the absence of resources for the Palestinian Arabic dialect (PAL); b) the emergence of new sentiment words, which decreases the performance of sentiment analysis models when they are applied to newly collected tweets; and c) the handling of valence shifter words, which has not been thoroughly addressed in Arabic sentiment analysis. Therefore, this study aims to construct a PAL lexicon for Palestinian tweets and to design an Expandable and Up-to-date Lexicon for Arabic (EULA). A new set of valence shifter rules is also constructed to enhance the performance of lexicon-based sentiment analysis on Arabic tweets. The PAL lexicon is built using a phonology matching algorithm, while EULA is constructed by applying a general lexicon to a tweet dataset to find new terms and predict their polarity through linguistic rules. Furthermore, a set of rules is proposed to handle valence shifter words by determining the scope of these words and the shifting value they produce. Palestinian and Arabic tweet datasets from March to May 2018 are used to evaluate the proposed approach. Experimental results indicate that the proposed PAL lexicon produced better results than other lexicons when tested on the Palestinian dataset. Meanwhile, EULA enhanced the performance of the lexicon-based approach, making it competitive with the machine learning approach. Moreover, applying the proposed valence shifter rules increased overall performance by 5% on average. The new PAL sentiment lexicon is able to handle Palestinian dialects, and EULA overcomes the emergence of new slang words in social media. Finally, the constructed valence shifter rules are capable of handling negation, intensifiers and contrasts, enhancing the performance of Arabic sentiment analysis.
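
The abstract does not spell out the rule set, so the following is only a minimal sketch of how lexicon-based scoring with negation, intensifier and contrast rules can work in general. The toy lexicon, the shifter word lists, the negation scope of two tokens, the 1.5 intensifier factor and the 0.5 weight for the clause before a contrast word are all hypothetical choices for illustration. They are not the thesis's PAL or EULA lexicons, nor its actual valence shifter rules.

```python
# Hypothetical toy resources for illustration only (not the PAL/EULA lexicons).
LEXICON = {"جميل": 1.0, "رائع": 1.0, "حلو": 1.0, "سيء": -1.0, "ممل": -1.0}
NEGATORS = {"لا", "ليس", "مش", "ما"}      # negation words, incl. dialectal "مش"
INTENSIFIERS = {"جدا": 1.5, "كثير": 1.5}   # hypothetical boost factors
CONTRASTS = {"لكن", "بس"}                  # "but" (MSA and Levantine)

NEGATION_SCOPE = 2         # hypothetical: a negator reaches up to 2 following tokens
PRE_CONTRAST_WEIGHT = 0.5  # hypothetical: clauses before "but" count half


def tweet_score(tokens: list[str]) -> float:
    """Sum lexicon scores, adjusted by negation, intensifier and contrast rules."""
    clauses: list[list[float]] = [[]]  # per-clause sentiment word scores
    negation_left = 0                  # tokens still inside the negation scope
    last_was_sentiment = False         # intensifiers follow the word they modify
    for tok in tokens:
        if tok in CONTRASTS:
            # Start a new clause; shifters do not cross the contrast word.
            clauses.append([])
            negation_left = 0
            last_was_sentiment = False
        elif tok in NEGATORS:
            negation_left = NEGATION_SCOPE
            last_was_sentiment = False
        elif tok in INTENSIFIERS:
            # Boost the sentiment word immediately before the intensifier.
            if last_was_sentiment:
                clauses[-1][-1] *= INTENSIFIERS[tok]
            last_was_sentiment = False
        elif tok in LEXICON:
            value = LEXICON[tok]
            if negation_left > 0:
                value = -value   # flip polarity inside the negation scope
                negation_left = 0
            clauses[-1].append(value)
            last_was_sentiment = True
        else:
            # Ordinary token: shrink the remaining negation scope.
            negation_left = max(0, negation_left - 1)
            last_was_sentiment = False
    *earlier, last = clauses
    return PRE_CONTRAST_WEIGHT * sum(sum(c) for c in earlier) + sum(last)


def polarity(text: str) -> str:
    """Map the score of a whitespace-tokenised tweet to a polarity label."""
    s = tweet_score(text.split())
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

For example, polarity("الفيلم جميل بس ممل") ("the film is nice but boring") returns "negative", because the clause after the contrast word outweighs the down-weighted clause before it. Real tweets would also need Arabic-aware normalisation and tokenisation rather than a plain whitespace split.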

Item Type: Thesis (Doctoral)
Supervisor: Mahmuddin, Massudi
Item ID: 8699
Uncontrolled Keywords: Arabic sentiment analysis, Palestinian dialect lexicon, Lexicon-based approach, Valence shifter rules, Twitter.
Subjects: T Technology > T Technology (General)
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Date Deposited: 05 Oct 2021 00:17
Last Modified: 16 Feb 2022 02:08
URI: https://etd.uum.edu.my/id/eprint/8699