UUM ETD | Universiti Utara Malaysia Electronic Theses and Dissertation

Multi-document text summarization using text clustering for Arabic Language

Waheeb, Samer Abdulateef (2014) Multi-document text summarization using text clustering for Arabic Language. Masters thesis, Universiti Utara Malaysia.

Multi-document summarization is the process of producing a single summary from a collection of related documents. In this work we focus on generic extractive summarizers for Arabic multi-document text, and we describe a clustering approach to multi-document summarization. The problem in multi-document text summarization is sentence redundancy, which must be eliminated to ensure coherence and improve readability. Hence, our main objective is to examine how salient information can be extracted for the Arabic multi-document summarization task when the input contains noisy and redundant information. We used the Essex Arabic Summaries Corpus (EASC) as the dataset for testing this objective and its sub-objectives. First, we tokenized the original text into words, removed all stop words, extracted the root of each word, and represented the text as a bag of words weighted by TF-IDF, without the noisy information. Second, we applied the K-means algorithm with cosine similarity and selected the best cluster by ordering the clusters according to distance. After selecting the best cluster, we applied an SVM to order the sentences and then selected the highest-weighted sentences for the final summary, reducing redundant information. Finally, the summaries for the ten categories of related documents were evaluated using Recall and Precision; the best Recall achieved was 0.6, with a Precision of 0.6.
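The preprocessing steps in the abstract (tokenization, stop-word removal, root extraction, TF-IDF bag of words) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the thesis implementation: the stop-word list and the affix lists are hypothetical and deliberately tiny, `light_stem` is a crude prefix/suffix stripper standing in for a real Arabic root extractor, and each sentence is treated as one "document" when computing TF-IDF.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "أن", "هذا", "هذه", "التي", "الذي"}

# Hypothetical affix lists for a crude light stemmer. This is NOT a real
# root extractor; it strips at most one prefix and one suffix.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")


def tokenize(text):
    """Remove Arabic diacritics, then return runs of Arabic letters."""
    text = re.sub(r"[\u064B-\u0652]", "", text)  # strip harakat (fatha, damma, ...)
    return re.findall(r"[\u0621-\u064A]+", text)


def light_stem(word):
    """Strip one known prefix and one known suffix, keeping >= 3 letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[: -len(s)]
            break
    return word


def preprocess(sentence):
    """Tokenize, drop stop words, and stem each remaining token."""
    return [light_stem(t) for t in tokenize(sentence) if t not in STOP_WORDS]


def tfidf(sentences):
    """One sparse TF-IDF vector (dict: term -> weight) per sentence."""
    bags = [Counter(preprocess(s)) for s in sentences]
    n = len(bags)
    df = Counter(term for bag in bags for term in bag)  # document frequency
    vectors = []
    for bag in bags:
        total = sum(bag.values()) or 1
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in bag.items()})
    return vectors
```

A term that occurs in every sentence receives an IDF of log(1) = 0, so it contributes nothing to a sentence's weight.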
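The clustering step (K-means with cosine similarity) can be sketched as spherical K-means. Vectors are L2-normalized, each one joins the centroid with the highest cosine similarity, and each centroid is recomputed as the re-normalized mean of its members. The abstract says only that the best cluster is chosen by ordering clusters "by distance", so `best_cluster` uses a hypothetical criterion (the highest mean cosine between members and their centroid). Dense lists are used for simplicity, and all function names are illustrative.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [float(x) for x in v]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def kmeans_cosine(vectors, k, iters=20, seed=0):
    """Spherical K-means: assign by highest cosine, re-normalize the means."""
    data = [normalize(v) for v in vectors]
    centroids = random.Random(seed).sample(data, k)
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(x, centroids[j])) for x in data]
        new_centroids = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(normalize([sum(col) for col in zip(*members)]))
            else:  # empty cluster: keep its previous centroid
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # centroids unchanged, so assignments are stable
            break
        centroids = new_centroids
    return labels, centroids, data


def best_cluster(labels, centroids, data):
    """Hypothetical criterion: highest mean member-to-centroid cosine."""
    scores = {}
    for j, c in enumerate(centroids):
        sims = [cosine(x, c) for x, lab in zip(data, labels) if lab == j]
        if sims:
            scores[j] = sum(sims) / len(sims)
    return max(scores, key=scores.get)
```

Normalizing first means that maximizing the dot product is the same as maximizing cosine similarity, which is why plain means can be reused as centroids after re-normalization.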
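The final selection step ranks sentences and keeps the highest-weighted ones. The abstract uses an SVM for the ranking, but training one requires labeled data that is not described here. The sketch therefore swaps in a simpler, plainly different scorer (the total TF-IDF weight of each sentence) and only illustrates the "rank, then keep the top n" logic. `rank_sentences`, `select_summary`, and `n` are hypothetical names.

```python
def rank_sentences(vectors):
    """Sentence indices sorted by total TF-IDF weight, highest first.

    Stand-in scorer: the thesis orders sentences with an SVM instead.
    """
    weights = [sum(v.values()) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: weights[i], reverse=True)


def select_summary(sentences, vectors, n=3):
    """Keep the n highest-ranked sentences as the extractive summary."""
    return [sentences[i] for i in rank_sentences(vectors)[:n]]
```

Any scorer that returns one number per sentence, such as an SVM decision value, could replace the weight sum without changing `select_summary`.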
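Recall and Precision for an extractive summary can be computed by comparing the selected sentences with a reference summary. This sketch assumes sentence-level matching, where a sentence counts as correct only if it appears verbatim in the reference. The thesis may have measured overlap at a different granularity, such as words, and the sentence IDs in the example are placeholders, not EASC data.

```python
def precision_recall(system, reference):
    """Sentence-level Precision and Recall of an extractive summary."""
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)  # sentences found in both summaries
    precision = overlap / len(sys_set) if sys_set else 0.0
    recall = overlap / len(ref_set) if ref_set else 0.0
    return precision, recall


# Toy example with placeholder sentence IDs (not EASC data).
print(precision_recall(["s1", "s2", "s3", "s4"], ["s1", "s3", "s5", "s6", "s7"]))
# (0.5, 0.4)
```

Precision penalizes selected sentences that are absent from the reference, while Recall penalizes reference sentences the summarizer missed.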

Item Type: Thesis (Masters)
Uncontrolled Keywords: Multi-document text summarization, Arabic text summarization, Automatic text summarization, Text clustering.
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Awang Had Salleh Graduate School of Arts & Sciences
Depositing User: Mr. Badrulsaman Hamid
Date Deposited: 01 Mar 2015 02:27
Last Modified: 25 Apr 2016 01:09
URI: http://etd.uum.edu.my/id/eprint/4373
