UUM Electronic Theses and Dissertation

Hybrid model of post-processing techniques for Arabic optical character recognition

Habeeb, Imad Qasim (2016) Hybrid model of post-processing techniques for Arabic optical character recognition. PhD. thesis, Universiti Utara Malaysia.

Optical character recognition (OCR) is used to extract the text contained in an image. One of the stages in OCR is post-processing, which corrects errors in the OCR output text. The OCR multiple outputs approach consists of three processes: differentiation, alignment, and voting. Existing differentiation techniques suffer from the loss of important features because they use N-versions of input images. Alignment techniques in the literature, on the other hand, are based on approximation, while the voting process is not context-aware. These drawbacks lead to a high error rate in OCR. This research proposed three improved techniques for differentiation, alignment, and voting to overcome the identified drawbacks. These techniques were later combined into a hybrid model that can recognize optical characters in the Arabic language. Each of the proposed techniques was evaluated separately against three other relevant existing techniques. The performance measurements used in this study were Word Error Rate (WER), Character Error Rate (CER), and Non-word Error Rate (NWER). Experimental results showed a relative decrease in error rate on all measurements for the evaluated techniques. Similarly, the hybrid model obtained lower WER, CER, and NWER by 30.35%, 52.42%, and 47.86% respectively when compared to the three relevant existing models. This study contributes to the OCR domain, as the proposed hybrid model of post-processing techniques could facilitate the automatic recognition of Arabic text and hence lead to better information retrieval.
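
As an illustration of the measurements named above, the following is a minimal sketch of how WER and CER are commonly computed from the edit distance between a reference text and an OCR output. The abstract does not give the thesis's exact definitions, so the formulas, the function names (`edit_distance`, `cer`, `wer`, `relative_reduction`), and the sample strings are assumptions made for illustration only. NWER is omitted because it requires a lexicon that is not described here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by reference word count (assumed definition)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_reduction(baseline, improved):
    """Relative decrease of an error rate, in percent, against a baseline."""
    return (baseline - improved) / baseline * 100


# Hypothetical sample: one wrong word out of three, one wrong character out of eleven.
reference = "the cat sat"
ocr_output = "the cat sit"
print(wer(reference, ocr_output))
print(cer(reference, ocr_output))
```

Under these assumed definitions, a figure such as "lower WER by 30.35%" would correspond to `relative_reduction(baseline_wer, hybrid_wer)` returning 30.35; the baseline and hybrid values themselves are not stated in the abstract.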

Item Type: Thesis (PhD.)
Supervisor: Mohd Yusof, Shahrul Azmi and Yusof, Yuhanis
Item ID: 6030
Uncontrolled Keywords: Arabic optical character recognition, Post-processing techniques, Multiple outputs of OCR.
Subjects: T Technology > T Technology (General) > T58.5-58.64 Information technology
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Date Deposited: 05 Feb 2017 15:27
Last Modified: 05 Apr 2021 02:28
URI: https://etd.uum.edu.my/id/eprint/6030
