UUM ETD | Universiti Utara Malaysian Electronic Theses and Dissertation

Principal component and multiple correspondence analysis for handling mixed variables in the smoothed location model

Ngu, Penny Ai Huong (2016) Principal component and multiple correspondence analysis for handling mixed variables in the smoothed location model. Masters thesis, Universiti Utara Malaysia.


The problem of classifying objects into groups when the measured variables are a mixture of continuous and binary variables has attracted the attention of statisticians. Among the discriminant methods for classification, the Smoothed Location Model (SLM) handles data that contain both continuous and binary variables simultaneously. However, this model becomes infeasible when the data contain a large number of binary variables. A large number of binary variables creates numerous multinomial cells, many of which end up empty. Past studies have shown that the occurrence of many empty cells degrades the performance of the constructed smoothed location model. To overcome the problem of many empty cells caused by a large number of measured variables (mainly binary), this study proposes four new SLMs that combine the existing SLM with Principal Component Analysis (PCA) and four types of Multiple Correspondence Analysis (MCA). PCA is used to reduce the large number of continuous variables, whereas MCA is used to reduce the large number of binary variables. The four proposed models, SLM+PCA+Indicator MCA, SLM+PCA+Burt MCA, SLM+PCA+Joint Correspondence Analysis (JCA), and SLM+PCA+Adjusted MCA, are compared on the basis of their misclassification rates. Results of a simulation study show that the SLM+PCA+JCA model performs best in all tested conditions, as it extracted the smallest number of binary components and ran in the shortest computational time. Investigations on the full breast cancer real data set also showed that this model produces the lowest misclassification rate. The next lowest misclassification rate is obtained by SLM+PCA+Adjusted MCA, followed by the SLM+PCA+Burt MCA and SLM+PCA+Indicator MCA models. Although the SLM+PCA+Indicator MCA model gives the poorest performance of the four, it is still better than a few existing classification methods.
Overall, the developed smoothed location models can be considered alternative methods for classification tasks involving a large number of mixed variables, mainly binary ones.
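
The empty-cell problem described in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the function name `count_empty_cells`, the sample size of 100, and the uniformly random binary data are assumptions made for illustration only. It relies on the fact that b binary variables define 2^b multinomial cells, so the number of cells quickly exceeds the number of observations.

```python
import numpy as np


def count_empty_cells(binary_data):
    """Return (total_cells, empty_cells) for an (n, b) array of 0/1 values.

    Hypothetical helper for illustration; not part of the thesis.
    """
    n, b = binary_data.shape
    # Encode each row of b binary values as one integer cell index in [0, 2**b).
    weights = 2 ** np.arange(b)
    cell_index = binary_data @ weights
    # Cells that contain at least one observation.
    occupied = int(np.unique(cell_index).size)
    total = 2 ** b
    return total, total - occupied


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # fixed seed, assumed for repeatability
    n = 100  # assumed sample size
    for b in (2, 5, 10, 15):
        data = rng.integers(0, 2, size=(n, b))  # simulated binary variables
        total, empty = count_empty_cells(data)
        print(f"b={b:2d}  cells={total:6d}  empty={empty:6d}")
```

With 100 observations, at most 100 cells can be occupied, so for b = 10 at least 924 of the 1024 cells must be empty regardless of the data. Reducing the binary variables to a few components, as the proposed models do with MCA, reduces the number of cells.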

Item Type: Thesis (Masters)
Uncontrolled Keywords: Smoothed Location Model, Principal Component Analysis, Multiple Correspondence Analysis, Large binary variables, Mixed variables
Subjects: Q Science > QA Mathematics > QA299.6-433 Analysis
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Depositing User: Mr. Badrulsaman Hamid
Date Deposited: 06 Feb 2017 09:32
Last Modified: 06 Feb 2017 09:32
URI: http://etd.uum.edu.my/id/eprint/6034
