Principal component and multiple correspondence analysis for handling mixed variables in the smoothed location model

Ngu, Penny Ai Huong (2016) Principal component and multiple correspondence analysis for handling mixed variables in the smoothed location model. Masters thesis, Universiti Utara Malaysia.

Abstract

The issue of classifying objects into groups when the measured variables are mixtures of continuous and binary variables has attracted the attention of statisticians. Among the discriminant methods for classification, the Smoothed Location Model (SLM) is used to handle data that contain both continuous and binary variables simultaneously. However, this model becomes infeasible when the data contain a large number of binary variables. A large number of binary variables creates numerous multinomial cells, many of which are then empty. Past studies have shown that the occurrence of many empty cells affects the performance of the constructed smoothed location model. To overcome the problem of many empty cells caused by a large number of measured variables (mainly binary), this study proposes four new SLMs by combining the existing SLM with Principal Component Analysis (PCA) and four types of Multiple Correspondence Analysis (MCA). PCA is used to handle the large number of continuous variables, whereas MCA is used to deal with the large number of binary variables. The four proposed models, SLM+PCA+Indicator MCA, SLM+PCA+Burt MCA, SLM+PCA+Joint Correspondence Analysis (JCA), and SLM+PCA+Adjusted MCA, are compared on the basis of misclassification rate. Results of a simulation study show that the SLM+PCA+JCA model performs best in all tested conditions, since it extracted the smallest number of binary components and ran with the shortest computational time. Investigations on a real data set, the full breast cancer data set, also showed that this model produces the lowest misclassification rate. The next lowest misclassification rate is obtained by SLM+PCA+Adjusted MCA, followed by the SLM+PCA+Burt MCA and SLM+PCA+Indicator MCA models. Although the SLM+PCA+Indicator MCA model gives the poorest performance of the four, it is still better than a few existing classification methods. Overall, the developed smoothed location models can be considered alternative methods for classification tasks involving a large number of mixed variables, mainly binary ones.
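
The empty-cell problem described in the abstract can be illustrated with a minimal sketch. With b binary variables the location model has 2^b multinomial cells, so a sample of n observations can occupy at most n of them. The function name `count_empty_cells`, the sample size of 100, and the use of independent fair-coin binary variables are assumptions made for illustration only; this is not the simulation design used in the thesis.

```python
import random


def count_empty_cells(n_obs, n_binary, seed=0):
    """Return (total_cells, empty_cells) for simulated binary data.

    Hypothetical helper: each of the n_binary variables is drawn as an
    independent fair coin flip, which is an illustrative assumption.
    """
    rng = random.Random(seed)
    total_cells = 2 ** n_binary  # one multinomial cell per binary pattern
    occupied = set()
    for _ in range(n_obs):
        # An observation falls in the cell identified by its binary pattern.
        pattern = tuple(rng.randint(0, 1) for _ in range(n_binary))
        occupied.add(pattern)
    return total_cells, total_cells - len(occupied)


if __name__ == "__main__":
    for b in (2, 5, 10, 15):
        total, empty = count_empty_cells(n_obs=100, n_binary=b)
        print(f"b={b:2d}  cells={total:6d}  empty={empty:6d}")
```

Because the number of cells doubles with every added binary variable while the sample size stays fixed, the count of empty cells is bounded below by 2^b - n, which motivates reducing the binary variables to a small number of components before fitting the model.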

Item Type:	Thesis (Masters)
Supervisor:	Hamid, Hashibah and Aziz, Nazrina
Item ID:	6034
Uncontrolled Keywords:	Smoothed Location Model, Principal Component Analysis, Multiple Correspondence Analysis, Large binary variables, Mixed variables
Subjects:	Q Science > QA Mathematics > QA299.6-433 Analysis
Divisions:	Awang Had Salleh Graduate School of Arts & Sciences
Date Deposited:	06 Feb 2017 09:32
Last Modified:	19 Apr 2021 02:43
URI:	https://etd.uum.edu.my/id/eprint/6034
