UUM Electronic Theses and Dissertation

Topic identification using filtering and rule generation algorithm for textual document

Nurul Syafidah, Jamil (2015) Topic identification using filtering and rule generation algorithm for textual document. Masters thesis, Universiti Utara Malaysia.

Information stored digitally in text documents is seldom arranged by topic. Having to read whole documents is time-consuming and discourages searching for information. Most existing topic identification methods depend on the occurrence of terms in the text; however, not all frequently occurring terms are relevant. The term extraction phase of these methods can also yield terms with similar meanings, which is known as the synonymy problem. This study introduces filtering and rule generation algorithms to identify topics in textual documents. The proposed filtering algorithm (PFA) extracts the most relevant terms from the text and resolves the synonymy problem among the extracted terms. The rule generation algorithm (TopId) is proposed to identify the topic of each verse based on the extracted terms. The PFA processes and filters each sentence based on nouns and predefined keywords to produce suitable terms for the topic. Rules are then generated from the extracted terms using a rule-based classifier. An experiment was performed on 224 English-translated Quran verses related to female issues. Topics identified by both TopId and the Rough Set technique were compared and later verified by experts. PFA extracted more relevant terms than other filtering techniques. TopId identified topics that are closer to the topics from experts, with an accuracy of 70%. The proposed algorithms were able to extract relevant terms without losing important terms and to identify the topic of each verse.
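
The two-stage idea described in the abstract (filter terms, then apply rules) can be illustrated with a minimal sketch. This is not the thesis's PFA or TopId implementation: the keyword list, synonym map, rules, and function names below are hypothetical placeholders, and the noun-filtering step (which would need a part-of-speech tagger) is omitted, so only predefined keywords are kept.

```python
import re

# Hypothetical predefined keywords (assumption; the thesis's list is not given here).
KEYWORDS = {"marriage", "divorce", "inheritance"}

# Hypothetical synonym map: variant term -> canonical term (assumption).
SYNONYMS = {"wedlock": "marriage", "heirs": "inheritance"}

# Hypothetical rules, checked in order: if all required terms are present,
# the rule's topic is assigned (assumption; not the rules generated by TopId).
RULES = [
    ({"divorce"}, "divorce"),
    ({"marriage"}, "marriage"),
    ({"inheritance"}, "inheritance"),
]


def filter_terms(sentence):
    """Tokenize, map synonyms to one canonical term, keep predefined keywords."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    kept = []
    for token in tokens:
        term = SYNONYMS.get(token, token)
        # Keep each keyword once, in order of first appearance.
        if term in KEYWORDS and term not in kept:
            kept.append(term)
    return kept


def identify_topic(terms):
    """Return the topic of the first rule whose required terms are all present."""
    present = set(terms)
    for required, topic in RULES:
        if required <= present:
            return topic
    return None


terms = filter_terms("They entered wedlock and later discussed the heirs.")
print(terms, identify_topic(terms))
```

Mapping synonyms to a canonical term before filtering is one simple way to keep terms with the same meaning from being counted separately; the thesis may handle synonymy differently.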

Item Type: Thesis (Masters)
Supervisor: Ku Mahamud, Ku Ruhana and Mohamed Din, Aniza
Item ID: 5379
Uncontrolled Keywords: Topic identification, Filtering algorithm, Rule generation algorithm, Rough Set, Al-Quran verses.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Date Deposited: 03 Jan 2016 06:18
Last Modified: 04 Apr 2021 08:54
URI: https://etd.uum.edu.my/id/eprint/5379
