Umana, Muhammad Khalifa (2025) An improved synthetic data generation for rice yield prediction model. Masters thesis, Universiti Utara Malaysia.
Depositpermission-Embargo 3years_s830636.pdf (390kB): restricted to repository staff only until 20 November 2028.
s830636_01.pdf (2MB): restricted to repository staff only until 20 November 2028.
s830636_02.pdf (575kB)
Abstract
Rice plays a crucial role in ensuring food security and economic stability in Malaysia. However, rice yield prediction faces challenges due to limited and inadequate agricultural data. Synthetic data generation, for example with variants of the generative adversarial network (GAN), offers a possible solution, but such models still struggle when they are trained on limited data. To enrich the data and strengthen the GAN's ability to generate data, its training can be complemented by statistical data generation methods such as SMOTE. Therefore, this study aims to develop an improved rice yield prediction model by proposing SMOTE-CTABGAN Enhanced Generation (SCEG), a novel integration of SMOTE-ENC and CTAB-GAN that addresses data limitation. The study covers designing a new synthetic data generation method, implementing a post-generation processing procedure that keeps variables practical and within agriculturally relevant value ranges, and evaluating the performance of the improved prediction models using the generated synthetic data. Using rice yield data from the Department of Statistics Malaysia (2010-2021) and climate data from the Malaysia Meteorological Department (2010-2021), this study synthesized and enhanced the dataset with the proposed method. The findings demonstrate SCEG's ability to generate a high-quality synthetic dataset that closely resembles the real-world data; for example, SCEG reduces MSE by up to 98.44% for Decision Tree regressors. Furthermore, post-processing methods such as Variable-base Value Limitation and Interquartile Range (IQR) filtering improve SCEG's performance, with the combination of Limitation and IQR delivering the best improvements: reductions of 60.20% in MAE, 86.65% in MSE and 54.86% in RMSE, and an 8.15% increase in R-squared. The research shows that SCEG can create a strong synthetic dataset that enhances real-world data, yielding a more precise and dependable model for predicting rice yield.
The relevance of this research lies in its capacity to transform agricultural projections, enabling more informed decisions that can significantly strengthen the nation's food security efforts.
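The abstract names two post-generation steps, variable-based value limitation and Interquartile Range (IQR) filtering, and reports results as percentage changes in MAE, MSE, RMSE and R-squared. The exact SCEG procedure is not described on this page, so the following is only a minimal sketch under stated assumptions. It clips each variable into a hypothetical agriculturally relevant range, drops rows outside the conventional 1.5 × IQR fences, and computes the standard regression metrics and a relative reduction. The variable names, the values in `BOUNDS`, the fence factor `k = 1.5` and all function names are illustrative assumptions, not taken from the thesis, and the sketch does not reproduce SMOTE-ENC or CTAB-GAN.

```python
import math
import statistics

# Hypothetical "agriculturally relevant" bounds per variable. These names and
# numbers are illustrative assumptions, NOT values used in the thesis.
BOUNDS = {
    "rainfall_mm": (0.0, 5000.0),
    "yield_kg_per_ha": (0.0, 10000.0),
}


def limit_values(rows, bounds):
    """Variable-based value limitation (assumed form): clip each bounded
    variable of every synthetic row into its [lo, hi] range."""
    limited = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for var, (lo, hi) in bounds.items():
            if var in new_row:
                new_row[var] = min(max(new_row[var], lo), hi)
        limited.append(new_row)
    return limited


def iqr_filter(rows, var, k=1.5):
    """Drop rows whose `var` falls outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[var] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if lower <= row[var] <= upper]


def regression_metrics(y_true, y_pred):
    """Standard MAE, MSE, RMSE and R-squared for a regression model."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": 1.0 - ss_res / ss_tot}


def percent_reduction(baseline, improved):
    """Relative reduction of an error metric (e.g. MSE), in percent."""
    return (baseline - improved) * 100.0 / baseline


if __name__ == "__main__":
    # Toy synthetic rows (made up for illustration, not thesis data).
    synthetic = [
        {"rainfall_mm": -20.0, "yield_kg_per_ha": 4000.0},
        {"rainfall_mm": 2500.0, "yield_kg_per_ha": 4100.0},
        {"rainfall_mm": 2600.0, "yield_kg_per_ha": 4200.0},
        {"rainfall_mm": 2400.0, "yield_kg_per_ha": 4300.0},
        {"rainfall_mm": 2550.0, "yield_kg_per_ha": 9800.0},
    ]
    cleaned = iqr_filter(limit_values(synthetic, BOUNDS), "yield_kg_per_ha")
    print(len(cleaned))  # → 4 (negative rainfall clipped to 0, 9800 yield dropped)
    print(percent_reduction(10.0, 1.0))  # → 90.0
```

Applying the limitation before IQR filtering, so that out-of-range values cannot distort the quartiles, is one plausible ordering; the abstract states only that the combination of Limitation and IQR gave the best improvements.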
| Item Type: | Thesis (Masters) |
|---|---|
| Supervisor : | Mohamad Mohsin, Mohamad Farhan and Hassan, Mohamad Ghozali |
| Item ID: | 11995 |
| Uncontrolled Keywords: | Synthetic Data, Agricultural Prediction Model, Generative Adversarial Network |
| Subjects: | H Social Sciences > HB Economic Theory; Q Science > QA Mathematics > QA299.6-433 Analysis |
| Divisions: | Awang Had Salleh Graduate School of Arts & Sciences |
| Date Deposited: | 09 Feb 2026 07:30 |
| Last Modified: | 09 Feb 2026 07:30 |
| Department: | Awang Had Salleh Graduate School of Arts & Sciences |
| URI: | https://etd.uum.edu.my/id/eprint/11995 |

