Research On The Algorithms For Automatic Summarization Of Single Text Documents In Uyghur

Posted on: 2015-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: H P R T W L Mai
GTID: 2298330431991894
Subject: Computer application technology
Abstract/Summary:
Automatic summarization selects the sentences that best represent a document, and can greatly improve the efficiency with which information is used. In recent years, automatic summarization for English and Chinese has progressed rapidly and attracted wide attention. However, research on automatic summarization for minority languages such as Uyghur remains insufficient. The development of information technology, represented by the Internet, lets people obtain information more conveniently than ever, but it also poses the challenge of how to use that information effectively. This thesis first builds a document corpus by downloading 588 documents from Uyghur-language websites, covering education, computing, military affairs, real estate, history, and geography. In preprocessing, the characteristics and grammatical rules of Uyghur text are fully considered, and the effect on summary quality of preprocessing steps such as stop-word filtering, stemming, and conversion of old Uyghur text to new Uyghur text is analyzed. For summary extraction, the thesis studies Uyghur single-document summarization with four methods: summarization based on TF-IDF keywords, summarization based on TextRank keywords, summarization based on the LexRank algorithm, and summarization based on a combination of LexRank and TextRank weights. First, a keyword-based single-document summarization system is constructed. Keywords are extracted with two algorithms, one based on TF-IDF and one based on TextRank; the sentences containing these keywords are then selected to form the summary, and the quality of the resulting summaries is compared. In the experiments, the average ROUGE score is used to evaluate summary performance.
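The keyword-based pipeline described above (extract keywords, then keep the sentences that contain them) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `tfidf_keywords` and `summarize_by_keywords`, the smoothed IDF formula, and the rule of ranking sentences by keyword count are all assumptions made for illustration. Input is assumed to be already tokenized, stemmed, and stop-word filtered.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus_tokens, top_k=5):
    """Return the top_k terms of doc_tokens ranked by TF-IDF.

    doc_tokens: list of preprocessed tokens for one document.
    corpus_tokens: list of token lists, one per corpus document.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus_tokens)
    scores = {}
    for term, count in tf.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for doc in corpus_tokens if term in doc)
        # Smoothed IDF; this particular smoothing is an assumption.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def summarize_by_keywords(sentences, keywords, max_sentences=2):
    """Keep the sentences with the most distinct keywords, in document order."""
    kw = set(keywords)
    hits = [(len(kw & set(s)), i) for i, s in enumerate(sentences)]
    best = sorted(hits, key=lambda h: (-h[0], h[1]))[:max_sentences]
    return [sentences[i] for i in sorted(i for _, i in best)]
```

To try it, pass each sentence as a list of tokens, flatten the sentences into `doc_tokens`, and supply the rest of the corpus as `corpus_tokens`. Replacing `tfidf_keywords` with a TextRank-based keyword extractor leaves the sentence-selection step unchanged, which is what makes the two keyword methods directly comparable.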
With the characteristics of the Uyghur language fully taken into account, the keywords extracted by the TextRank-based method are more representative, and are therefore more conducive to improving the performance of a Uyghur automatic summarization system. Second, the LexRank algorithm, originally developed for English, is applied to Uyghur to implement a LexRank-based single-document summarization system for Uyghur. Considering the need to combine LexRank with TextRank, the thesis also compares the summaries produced by the LexRank-based algorithm with those produced by the algorithm that combines LexRank and TextRank weights. LexRank considers only the relationships between sentences and ignores word-level information; adding the TextRank weight brings word information into the score as well. The experimental results show that the algorithm combining LexRank and TextRank weights performs significantly better than both the LexRank-only algorithm and the keyword-based methods, which indicates that the combined method is better suited to Uyghur single-document summarization.
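One way to picture the combination of sentence-level and word-level evidence is the sketch below. The abstract does not state how the two weights are combined, so the linear interpolation controlled by `alpha` is purely an assumption. So are the function names, the use of plain term-frequency cosine similarity (the original LexRank uses an IDF-modified cosine), and the co-occurrence window of adjacent words for the TextRank word graph.

```python
import math
from collections import Counter, defaultdict


def pagerank(weights, nodes, damping=0.85, iters=50):
    """Power iteration over a weighted undirected graph {(u, v): w}."""
    out = defaultdict(float)   # total edge weight attached to each node
    nbrs = defaultdict(list)   # adjacency list with weights
    for (u, v), w in weights.items():
        out[u] += w
        out[v] += w
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        rank = {
            n: (1 - damping) / len(nodes)
            + damping * sum(rank[m] * w / out[m] for m, w in nbrs[n])
            for n in nodes
        }
    return rank


def cosine(a, b):
    """Plain term-frequency cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def lexrank_scores(sentences):
    """Sentence centrality from a sentence-similarity graph (LexRank-style)."""
    n = len(sentences)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(sentences[i], sentences[j])
            if sim > 0:
                edges[(i, j)] = sim
    return pagerank(edges, list(range(n)))


def textrank_word_scores(sentences, window=2):
    """Word importance from a co-occurrence graph (TextRank-style)."""
    edges = defaultdict(float)
    for sent in sentences:
        for i, w in enumerate(sent):
            for other in sent[i + 1 : i + window]:
                if other != w:
                    edges[tuple(sorted((w, other)))] += 1.0
    words = sorted({w for s in sentences for w in s})
    return pagerank(edges, words)


def combined_scores(sentences, alpha=0.5):
    """Assumed combination: alpha * LexRank + (1 - alpha) * mean word TextRank."""
    lex = lexrank_scores(sentences)
    word = textrank_word_scores(sentences)
    word_part = [sum(word[w] for w in s) / len(s) if s else 0.0 for s in sentences]

    def normalize(xs):
        total = sum(xs)
        return [x / total if total else 0.0 for x in xs]

    lex_n = normalize([lex[i] for i in range(len(sentences))])
    word_n = normalize(word_part)
    return [alpha * l + (1 - alpha) * w for l, w in zip(lex_n, word_n)]
```

Setting `alpha=1.0` reduces the score to the sentence-only LexRank ranking, while lower values give more influence to the word-level TextRank weights. A summary would then be formed from the highest-scoring sentences, kept in document order.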
Keywords/Search Tags: Uyghur, Automatic summarization, TF-IDF algorithm, TextRank, ROUGE