UUM Electronic Theses and Dissertation

Enhanced ontology-based text classification algorithm for structurally organized documents

Oleiwi, Suha Sahib (2015) Enhanced ontology-based text classification algorithm for structurally organized documents. PhD. thesis, Universiti Utara Malaysia.


Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict the class of a text according to the tag types given in advance. Most TC algorithms represent a document by its terms and do not consider the relations among those terms. These algorithms represent documents in a space where every word is assumed to be a dimension. As a result, such representations generate high dimensionality, which has a negative effect on classification performance. The objectives of this thesis are to formulate algorithms for classifying text by creating suitable feature vectors and reducing the dimension of the data, which will enhance classification accuracy. This research combines ontology and text representation for classification by developing five algorithms. The first and second algorithms, namely Concept Feature Vector (CFV) and Structure Feature Vector (SFV), create feature vectors to represent the document. The third algorithm, Ontology Based Text Classification (OBTC), is designed to reduce the dimensionality of the training sets. The fourth and fifth algorithms, Concept Feature Vector_Text Classification (CFV_TC) and Structure Feature Vector_Text Classification (SFV_TC), classify the document into its related set of classes. The proposed algorithms were tested on five different scientific paper datasets downloaded from different digital libraries and repositories. Experimental results obtained from the proposed algorithms, CFV_TC and SFV_TC, showed better average results in terms of precision, recall, F-measure and accuracy compared with the SVM and RSS approaches. The work in this study contributes to exploring related documents in information retrieval and text mining research by using ontology in TC.
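
The dimensionality problem described in the abstract can be illustrated with a small sketch. This is not the thesis's CFV, SFV or OBTC algorithm, whose details are not given on this page. It only contrasts a term-based vector, where every distinct word is a dimension, with a concept-based vector built from a lookup table. The `TOY_ONTOLOGY` mapping, the function names and the sample sentence are all hypothetical and chosen for illustration only.

```python
from collections import Counter

# Hypothetical toy ontology (an assumption, not taken from the thesis):
# each surface term maps to a shared concept label.
TOY_ONTOLOGY = {
    "svm": "classifier",
    "perceptron": "classifier",
    "precision": "metric",
    "recall": "metric",
    "corpus": "dataset",
    "dataset": "dataset",
}


def term_vector(tokens):
    """Bag-of-words view: every distinct term becomes its own dimension."""
    return Counter(tokens)


def concept_vector(tokens, ontology):
    """Count concepts instead of terms; terms outside the ontology are dropped."""
    return Counter(ontology[t] for t in tokens if t in ontology)


# Illustrative document, already lower-cased and whitespace-tokenised.
doc = "svm and perceptron precision recall on a corpus dataset".split()

terms = term_vector(doc)
concepts = concept_vector(doc, TOY_ONTOLOGY)

# Number of dimensions under each representation.
print(len(terms), len(concepts))
```

In this toy case the nine distinct terms collapse into three concept dimensions, which is the kind of reduction an ontology-based representation aims for. A real system would also need tokenisation, a domain ontology, and a policy for terms the ontology does not cover.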

Item Type: Thesis (PhD.)
Supervisor : Yasin, Azman and Mahat, Nor Idayu
Item ID: 5358
Uncontrolled Keywords: Text classification, ontology, structural, structured documents
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA76 Computer software > QA76.76 Fuzzy System.
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Date Deposited: 29 Dec 2015 10:37
Last Modified: 18 Mar 2021 08:38
URI: https://etd.uum.edu.my/id/eprint/5358
