UUM Electronic Theses and Dissertation

Designing cross-validation consensus clustering with reference point in determining the optimal number of clusters

Norin Rahayu, Shamsuddin (2022) Designing cross-validation consensus clustering with reference point in determining the optimal number of clusters. Doctoral thesis, Universiti Utara Malaysia.

Full text (restricted to repository staff only until 6 March 2023):
permission to deposit-900985.pdf (274kB)
s900985_01.pdf (26MB)
s900985_02.pdf (18MB)

Abstract

Consensus clustering can overcome the instability that traditional clustering approaches face in estimating the number of clusters, k. It offers a better estimate by consolidating multiple clustering results into an optimal value. However, the consensus clustering approach has three weaknesses: a lack of clear rules for constructing the multiple base partitions, B; a lack of a specific procedure for combining the clustering outcomes from B into a single consolidated value; and excessive computational time and complexity in identifying k. Motivated by these weaknesses, this study designs a cross-validation consensus clustering that uses a reference point at every base partition to obtain the optimal number of clusters, $\breve{k}^{*}$, and so produce more robust and stable results. The proposed design creates base partitions using a 10-fold cross-validation approach. In each base partition, the reference point is imposed by extracting 30% of the objects from the dataset to identify $\breve{k}^{*}$. The value $\breve{k}^{*}$ is then used to cluster the objects and identify their clusters. The design was tested on both simulated and real datasets using a stability index, heatmap visualisation and clustering validation. The findings showed that the proposed design performs better in terms of computational time, clustering the objects in less than one minute once $\breve{k}^{*}$ is obtained. The results also revealed that clustering across base partitions is robust and stable for both simulated and real datasets. The proposed design works well on non-overlapping clusters and on clusters of unequal size, with the least completion time for the clustering process. The design is also competitive with other clustering approaches on problems with highly overlapping clusters and an unclear cluster structure.
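The procedure outlined in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say which base clusterer, which criterion for choosing k on the reference point, or which consolidation rule is used, so the sketch assumes k-means with farthest-point initialisation, the mean silhouette width as the selection criterion, and a majority vote over the ten per-fold estimates. It also assumes that each base partition is the nine folds remaining after one fold is held out, and that candidate values of k run from 2 to 6. All function names (`kmeans`, `silhouette`, `estimate_k_per_fold`, `consensus_k`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-point initialisation.
    ASSUMPTION: the thesis does not name its base clusterer."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next center = the point farthest from every center chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # keep the old center if a cluster ever becomes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def silhouette(X, labels):
    """Mean silhouette width (higher means better separated clusters)."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; rank it last
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton: silhouette taken as 0
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0 is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return float(s.mean())

def estimate_k_per_fold(X, k_range=range(2, 7), n_folds=10, ref_frac=0.30, seed=0):
    """One estimate of k per base partition, computed on its reference point."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    estimates = []
    for f in range(n_folds):
        # ASSUMPTION: base partition = all objects outside held-out fold f
        base = np.concatenate([folds[g] for g in range(n_folds) if g != f])
        # reference point: a random 30% of the objects in this base partition
        ref = rng.choice(base, size=int(ref_frac * len(base)), replace=False)
        Xr = X[ref]
        # ASSUMPTION: pick the k with the highest mean silhouette width
        scores = {k: silhouette(Xr, kmeans(Xr, k, rng)) for k in k_range if k < len(Xr)}
        estimates.append(max(scores, key=scores.get))
    return estimates

def consensus_k(estimates):
    """ASSUMPTION: consolidate per-fold estimates by majority vote (mode)."""
    values, counts = np.unique(estimates, return_counts=True)
    return int(values[counts.argmax()])

# Usage: three well-separated Gaussian groups of 50 objects each (synthetic).
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * data_rng.standard_normal((50, 2)) for c in centers])
ks = estimate_k_per_fold(X)
k_star = consensus_k(ks)
# final step: cluster all objects with the consolidated number of clusters
labels = kmeans(X, k_star, np.random.default_rng(1))
print(ks, k_star)
```

Because k is chosen on a 30% subsample rather than on every object in each base partition, the costly search over candidate k values runs on far fewer points; the full dataset is clustered only once, with the consolidated value.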

Item Type: Thesis (Doctoral)
Supervisors: Mahat, Nor Idayu and Che Dom, Nazri
Item ID: 9744
Uncontrolled Keywords: Consensus clustering, Cross-validation, Optimal number of clusters, Reference point.
Subjects: Q Science > QA Mathematics
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Date Deposited: 14 Aug 2022 01:25
Last Modified: 14 Aug 2022 01:25
URI: https://etd.uum.edu.my/id/eprint/9744
