UUM Electronic Theses and Dissertation
UUM ETD | Universiti Utara Malaysian Electronic Theses and Dissertation
FAQs | Feedback | Search Tips | Sitemap

An improved Levenshtein algorithm for spelling correction word candidate list generation

Abdulkhudhur, Hanan Najm (2016) An improved Levenshtein algorithm for spelling correction word candidate list generation. Masters thesis, Universiti Utara Malaysia.

[thumbnail of s814922_01.pdf] Text
s814922_01.pdf

Download (1MB)
[thumbnail of s814922_02.pdf] Text
s814922_02.pdf

Download (874kB)

Abstract

Candidates’ list generation in spelling correction is a process of finding words from a lexicon that should be close to the incorrect word. The most widely used algorithm for generating candidates’ list for incorrect words is based on Levenshtein distance. However, this algorithm takes too much time when there is a large number of spelling errors. The reason is that calculating Levenshtein algorithm includes operations that create an array and fill the cells of this array by comparing the characters of an incorrect word with the characters of a word from a lexicon. Since most lexicons contain millions of words, then these operations will be repeated millions of times for each incorrect word to generate its candidates list. This dissertation improved Levenshtein algorithm by designing an operational technique that has been included in this algorithm. The proposed operational technique enhances Levenshtein algorithm in terms of the processing time of its executing without affecting its accuracy. It reduces the operations required to measure cells’ values in the first row, first column, second row, second column, third row, and third column in Levenshtein array. The improved Levenshtein algorithm was evaluated against the original algorithm. Experimental results show that the proposed algorithm outperforms Levenshtein algorithm in terms of the processing time by 36.45% while the accuracy of both algorithms is still the same.

Item Type: Thesis (Masters)
Supervisor : Mohd Yusof, Shahrul Azmi and Yusof, Yuhanis
Item ID: 6564
Uncontrolled Keywords: Levenshtein Algorithm, Processing time, Candidates’ list generation, Errors correction.
Subjects: Q Science > QA Mathematics > QA273-280 Probabilities. Mathematical statistics
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Date Deposited: 27 Nov 2017 01:32
Last Modified: 05 Apr 2021 01:34
Department: Awang Had Salleh Graduate School of Arts and Sciences
Name: Mohd Yusof, Shahrul Azmi and Yusof, Yuhanis
URI: https://etd.uum.edu.my/id/eprint/6564

Actions (login required)

View Item
View Item