UUM ETD | Universiti Utara Malaysian Electronic Theses and Dissertation

Data Mining Classification Techniques and Performances on Medical Data

Benyehmad, Yahyia Mohammed M. Ali (2006) Data Mining Classification Techniques and Performances on Medical Data. Masters thesis, Universiti Utara Malaysia.


Abstract

This study evaluates the performance of classification techniques using several software packages, among them Rosetta, Tanagra, Weka and Orange. The techniques were tested on six medical datasets from the UCI Machine Learning Repository. The study will help researchers select the most suitable classification technique for medical datasets in terms of classification accuracy. In this thesis, sixteen classification techniques are evaluated and compared: Radial Basis Function (RBF) and Multilayer Perceptron (MLP) neural networks, Multi Linear Regression (MLR), Logistic Regression (LR), classification trees (ID3, C4.5, J48, CART), Naive Bayes (NB), Support Vector Machines (SVM), k-Nearest Neighbors (kNN), Linear Discriminant Analysis (LDA), a rule-based classifier, standard voting, voting with object tracking, and standard tuned voting (RSES). The experiments were validated using 10-fold cross-validation. The results show that the most suitable classification technique is NB, with an average classification accuracy of 90.13% and an average error rate of 9.87%. The worst classification technique is SLR, with an average classification accuracy of 50.16% and an average error rate of 49.84%. The techniques are ranked from best to worst based on average classification accuracy and average error rate: NB, LDA, LR, SVM, C4.5, MLP, RBF, kNN, RuleB, ID3, CART, J48, SV, RSES, V, and SLR.
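
The evaluation protocol described in the abstract (10-fold cross-validation, followed by ranking on average accuracy, with the error rate as the complement of accuracy) can be illustrated with a minimal sketch. It does not reproduce the thesis's experiments, which used Rosetta, Tanagra, Weka and Orange on UCI datasets. Everything below is an assumption made for illustration: the data are synthetic, the two classifiers are simple stand-ins, and all function and class names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices once and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class MajorityClass:
    """Stand-in baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class OneNearestNeighbor:
    """Stand-in kNN with k=1 and squared Euclidean distance."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            self.y[min(range(len(self.X)), key=lambda i: sq_dist(self.X[i], row))]
            for row in X
        ]


def cross_val_accuracy(make_model, X, y, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = make_model().fit([X[i] for i in train_idx],
                                 [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return mean(scores)


def rank_classifiers(models, X, y, k=10):
    """Return (name, accuracy %, error rate %) tuples, best accuracy first."""
    results = []
    for name, make_model in models.items():
        acc = 100.0 * cross_val_accuracy(make_model, X, y, k)
        results.append((name, acc, 100.0 - acc))  # error rate = 100 - accuracy
    return sorted(results, key=lambda r: r[1], reverse=True)


def make_toy_data(n=100, seed=1):
    """Synthetic two-class data (NOT medical data): two Gaussian clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 5.0
        X.append([center + rng.gauss(0, 1), center + rng.gauss(0, 1)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    models = {"1-NN": OneNearestNeighbor, "Majority": MajorityClass}
    for name, acc, err in rank_classifiers(models, X, y):
        print(f"{name}: accuracy {acc:.2f}%, error rate {err:.2f}%")
```

Averaging the per-fold accuracies and sorting in descending order gives a ranking of the same form as the one reported in the abstract. The numbers printed by this sketch describe only the toy data and say nothing about the thesis's results.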

Item Type: Thesis (Masters)
Uncontrolled Keywords: Data Mining, Medical Data, Classification
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Faculty and School System > Faculty of Information Technology
Depositing User: Ms Rohaida Rohani
Date Deposited: 10 Jun 2010 08:09
Last Modified: 24 Jul 2013 12:13
URI: http://etd.uum.edu.my/id/eprint/1864
