Research On The Algorithms For Automatic Summarization Of Single Text Documents In Uyghur

Posted on:2015-05-14

Degree:Master

Type:Thesis

Country:China

Candidate:H P R T W L Mai

GTID:2298330431991894

Subject:Computer application technology

Abstract/Summary:

Automatic summarization, which automatically selects the sentences that best represent a document, can greatly improve the efficiency of information use. In recent years, automatic summarization for English and Chinese has made rapid progress and attracted wide attention. However, research on automatic summarization for minority languages such as Uyghur remains insufficient. The development of information technology, represented by the Internet, lets people obtain information more conveniently than ever, but it also poses challenges for using that information effectively. This thesis first builds a corpus of 588 documents downloaded from Uyghur-language websites, covering education, computing, the military, real estate, history, and geography. In preprocessing, taking full account of the characteristics and grammatical rules of Uyghur text, the thesis analyzes how steps such as stop-word filtering, stemming, and conversion from the old Uyghur script to the new Uyghur script affect summary quality. For summary extraction, the thesis studies four methods for Uyghur single-document summarization: summarization based on TF-IDF keywords, summarization based on TextRank keywords, summarization based on the LexRank algorithm, and summarization based on LexRank combined with TextRank weights. First, a keyword-based single-document summarization system is constructed. Keywords are extracted with two algorithms, one based on TF-IDF and one based on TextRank; the sentences containing these keywords are then selected to form the summary, and the quality of the resulting summaries is compared. In the experiments, average ROUGE scores are used to evaluate summary quality.
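The keyword-based pipeline described above (extract keywords, keep the sentences that contain them, score the result with ROUGE) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the function names, the smoothed IDF formula, the sentence-scoring rule (count of keyword hits), and the simplified unigram ROUGE-1 recall are all assumptions, and the Uyghur-specific preprocessing is replaced by a plain regex tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Placeholder tokenizer: lowercased word characters only. The stop-word
    # filtering, stemming, and script conversion from the thesis are omitted.
    return re.findall(r"\w+", text.lower())


def tfidf_keywords(doc_tokens, corpus_tokens, top_k=5):
    """Return the top_k words of one document ranked by TF-IDF."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))
    term_freq = Counter(doc_tokens)
    scores = {}
    for word, count in term_freq.items():
        tf = count / len(doc_tokens)
        # Smoothed IDF; the exact weighting formula is an assumption.
        idf = math.log((1 + n_docs) / (1 + doc_freq[word]))
        scores[word] = tf * idf
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def keyword_summary(sentences, keywords, n_sentences=2):
    """Keep the sentences with the most keyword hits, in document order."""
    keyword_set = set(keywords)
    scored = [
        (sum(1 for tok in tokenize(sent) if tok in keyword_set), idx)
        for idx, sent in enumerate(sentences)
    ]
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:n_sentences]
    return [sentences[idx] for idx in sorted(idx for _, idx in top)]


def rouge_1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference length."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    # Hypothetical English stand-in data; the thesis uses a Uyghur corpus.
    sentences = [
        "Uyghur summarization selects key sentences.",
        "The weather was pleasant today.",
        "Key sentences contain summarization keywords.",
    ]
    doc_tokens = tokenize(" ".join(sentences))
    background = [doc_tokens, tokenize("The weather report for today.")]
    keywords = tfidf_keywords(doc_tokens, background, top_k=3)
    summary = keyword_summary(sentences, keywords, n_sentences=2)
    print(keywords, summary, rouge_1_recall(" ".join(summary), sentences[0]))
```

Swapping `tfidf_keywords` for a TextRank-based keyword extractor while keeping `keyword_summary` and the ROUGE scoring unchanged gives the kind of side-by-side comparison the abstract describes.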
When Uyghur language information is fully taken into account, the keywords extracted by the TextRank-based method are more representative and are therefore more conducive to improving the performance of a Uyghur automatic summarization system. Second, the LexRank algorithm, previously applied to English, is applied to Uyghur to implement a LexRank-based single-document summarization system for Uyghur. The thesis also considers the need to combine LexRank with TextRank, and compares the summarization performance of the LexRank-based algorithm with that of the algorithm combining LexRank and TextRank weights. Experiments show that LexRank considers only sentence-to-sentence information and ignores word-level information; the TextRank weight is therefore added so that word information is also taken into account. Experimental results show that the algorithm combining LexRank and TextRank weights performs significantly better than both the LexRank-based algorithm and the keyword-based methods. This demonstrates that the combined LexRank and TextRank weight method is better suited to Uyghur single-document automatic summarization.
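One possible way to combine a sentence-level LexRank score with word-level TextRank weights is sketched below. The abstract does not state how the two are combined, so everything here is an assumption made for illustration: the linear mix controlled by a hypothetical `alpha`, the cosine threshold, the co-occurrence window, the damping factor, and all function names. Uyghur-specific preprocessing is again replaced by a plain regex tokenizer.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Placeholder tokenizer; stemming and stop-word filtering are omitted.
    return re.findall(r"\w+", text.lower())


def pagerank(weights, n, damping=0.85, iters=50):
    """Power iteration over directed weighted edges {(src, dst): weight}.

    Nodes without outgoing edges simply leak their mass (a simplification).
    """
    if n == 0:
        return []
    out_sum = [0.0] * n
    for (src, _), w in weights.items():
        out_sum[src] += w
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for (src, dst), w in weights.items():
            if out_sum[src] > 0:
                new[dst] += damping * scores[src] * w / out_sum[src]
        scores = new
    return scores


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexrank_scores(token_lists, threshold=0.1):
    """Sentence graph: edges weighted by bag-of-words cosine similarity."""
    bags = [Counter(tokens) for tokens in token_lists]
    weights = {}
    for i in range(len(bags)):
        for j in range(len(bags)):
            if i != j:
                sim = cosine(bags[i], bags[j])
                if sim > threshold:
                    weights[(i, j)] = sim
    return pagerank(weights, len(bags))


def textrank_word_scores(token_lists, window=2):
    """Word graph: edges link words co-occurring within `window` positions."""
    vocab = sorted({w for tokens in token_lists for w in tokens})
    index = {w: k for k, w in enumerate(vocab)}
    weights = defaultdict(float)
    for tokens in token_lists:
        for p, word in enumerate(tokens):
            for q in range(p + 1, min(p + 1 + window, len(tokens))):
                if tokens[q] != word:
                    a, b = index[word], index[tokens[q]]
                    weights[(a, b)] += 1.0
                    weights[(b, a)] += 1.0
    scores = pagerank(weights, len(vocab))
    return {w: scores[index[w]] for w in vocab}


def normalize(values):
    total = sum(values)
    return [v / total if total else 0.0 for v in values]


def combined_summary(sentences, n_sentences=2, alpha=0.5):
    """Mix LexRank sentence scores with mean TextRank word weights.

    The linear combination with `alpha` is an assumed scheme, not the
    thesis's documented formula.
    """
    token_lists = [tokenize(s) for s in sentences]
    lex = normalize(lexrank_scores(token_lists))
    word = textrank_word_scores(token_lists)
    word_part = normalize(
        [sum(word[w] for w in t) / len(t) if t else 0.0 for t in token_lists]
    )
    final = [alpha * l + (1 - alpha) * w for l, w in zip(lex, word_part)]
    top = sorted(range(len(sentences)), key=lambda i: (-final[i], i))[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Setting `alpha=1.0` reduces this sketch to plain LexRank ranking, which makes it straightforward to compare the two variants on the same documents.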

Keywords/Search Tags:

Uyghur, Automatic summarization, TF-IDF algorithm, TextRank, ROUGE

Related items

1	Design And Implementation Of Automatic Summarization System Based On Textrank Algorithm
2	Chinese Multi-document Automatic Summarization Extraction Based On The Combination Of LDA And TextRank
3	Research On Automatic Summarization Of Microblog Events
4	Research On Automatic Summarization Of Chinese Literature Based On TextRank Algorithm
5	Single Document Automatic Summarization Based On TextRank Algorithm
6	Research On Short Text Automatic Summarization Algorithm Based On TextRank And Word2Vec
7	Automatic News Summarization System Based On Event Popularity
8	Research On Automatic Summarization Of News Events
9	Research On Text Automatic Summarization Method
10	The Public Opinion Analysis Of Uyghur Text Automatic Summarization