UUM ETD | Universiti Utara Malaysian Electronic Theses and Dissertation

Enhanced ontology-based text classification algorithm for structurally organized documents

Oleiwi, Suha Sahib (2015) Enhanced ontology-based text classification algorithm for structurally organized documents. PhD. thesis, Universiti Utara Malaysia.



Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict the class of a text according to tag types given in advance. Most TC algorithms represent a document by its terms and do not consider the relations among those terms. These algorithms represent documents in a space where every word is assumed to be a dimension. As a result, such representations generate high dimensionality, which has a negative effect on classification performance. The objectives of this thesis are to formulate algorithms for classifying text by creating suitable feature vectors and reducing the dimension of the data, which will enhance classification accuracy. This research combines ontology and text representation for classification by developing five algorithms. The first and second algorithms, namely Concept Feature Vector (CFV) and Structure Feature Vector (SFV), create feature vectors to represent the document. The third algorithm is the Ontology Based Text Classification (OBTC) and is designed to reduce the dimensionality of training sets. The fourth and fifth algorithms, Concept Feature Vector_Text Classification (CFV_TC) and Structure Feature Vector_Text Classification (SFV_TC), classify the document to its related set of classes. These proposed algorithms were tested on five different scientific paper datasets downloaded from different digital libraries and repositories. Experimental results obtained from the proposed algorithms, CFV_TC and SFV_TC, showed better average results in terms of precision, recall, F-measure and accuracy compared against the SVM and RSS approaches. The work in this study contributes to exploring related documents in information retrieval and text mining research by using ontology in TC.
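
The dimensionality point in the abstract can be illustrated with a small sketch. It is not the thesis's CFV or OBTC algorithm, whose details are not given in this record. It only contrasts a word-per-dimension representation with one in which terms are mapped to shared concepts. The `TOY_ONTOLOGY` mapping and the function names `term_vector` and `concept_vector` are hypothetical, chosen for illustration.

```python
from collections import Counter

# Hypothetical toy ontology: maps surface terms to a shared concept label.
# These entries are illustrative assumptions, not taken from the thesis.
TOY_ONTOLOGY = {
    "svm": "classifier",
    "perceptron": "classifier",
    "classifier": "classifier",
    "corpus": "dataset",
    "dataset": "dataset",
    "repository": "dataset",
}


def term_vector(tokens):
    """Bag-of-words counts: one dimension per distinct term."""
    return Counter(tokens)


def concept_vector(tokens, ontology=TOY_ONTOLOGY):
    """Counts per concept; terms missing from the ontology are dropped."""
    return Counter(ontology[t] for t in tokens if t in ontology)


tokens = "svm perceptron classifier corpus dataset repository svm".split()
terms = term_vector(tokens)
concepts = concept_vector(tokens)

# Number of dimensions under each representation, followed by the counts.
print(len(terms), dict(terms))
print(len(concepts), dict(concepts))
```

In this toy case the six distinct terms collapse into two concept dimensions. A real ontology would also need to handle ambiguous terms and terms it does not cover, which this sketch simply drops.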

Item Type: Thesis (PhD.)
Uncontrolled Keywords: Text classification, ontology, structural, structured documents
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA76 Computer software > QA76.76 Fuzzy System.
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Depositing User: Mr. Badrulsaman Hamid
Date Deposited: 29 Dec 2015 10:37
Last Modified: 24 Apr 2016 04:55
URI: http://etd.uum.edu.my/id/eprint/5358
