Research On Technology Of Automatic Text Summarization Based On Multiple Word Co-occurrence And Mutual Information

Posted on: 2015-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: X H Liu
GTID: 2298330422989052
Subject: Computer application technology
Abstract/Summary:
Automatic summarization plays an increasingly important role as a basic tool for extracting key information from texts. It is the process of automatically generating a short passage that contains the main points of a text, or the key information that readers are interested in. It can improve the efficiency of information transmission and reduce the time spent on information retrieval. How to obtain a summary from a text quickly and accurately has therefore become an active research topic. This thesis studies automatic text summarization based on multiple word co-occurrence and mutual information. It improves the accuracy of automatic summarization by introducing word co-occurrence into topic word extraction, and by using mutual information to measure text features, compartmentalize topics, and optimize topic sentence extraction. The main research contents are as follows.

Firstly, to improve the accuracy of topic word extraction, we analyze word co-occurrence and identify the correlation between word co-occurrence and the expression of the text topic by computing the degree of word co-occurrence. We then apply this word co-occurrence computation to topic word extraction in order to optimize the extraction method. The experimental results show that the topic word extraction is valid and that the average accuracy improved by 6.5%.

Secondly, to improve the accuracy of topic compartmentalization and make the summary reflect the main idea of the text comprehensively, we introduce mutual information into the computation of the degree of text correlation. We use mutual information to measure the association between words, between sentences, and between paragraphs in a text, and we divide the whole text into smaller units by considering the association among paragraphs. On this basis we implement text topic compartmentalization. The experimental results show that the topic compartmentalization is valid and that the average accuracy improved by 10%.

Thirdly, to improve the accuracy of topic sentence extraction, we take seven key elements into consideration: the importance of the sentence, the importance of the words in the sentence, the importance of the topic words, the location of the sentence in the text, the number of clue words in the sentence, the length of the sentence, and the association among sentences. We compute a weight for each sentence from these seven elements, and we choose the number of sentences to take from each topic by measuring the importance of that topic. The sentences with the higher weights are then output as the topic sentences. The experimental results show that the topic sentence extraction is valid and that the average accuracy improved by 3.5%.

Finally, we develop a system that extracts topic words and generates summaries according to the methods above. It has achieved good results on automatic summarization.
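
The abstract does not give the formulas used to measure association with mutual information. As a rough illustration only, the sketch below estimates pointwise mutual information (PMI) between words from sentence-level co-occurrence counts, then averages it over word pairs to get an association score between two sentences. PMI, the sentence-level co-occurrence window, the averaging step, and the function names `pmi_table` and `sentence_association` are all assumptions made for this example, not the method of the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Return {(w1, w2): PMI} for word pairs that co-occur in a sentence.

    `sentences` is a list of token lists. Probabilities are estimated as
    the fraction of sentences containing a word (or both words of a pair).
    Keys are stored with w1 < w2. This is an assumed formulation.
    """
    n = len(sentences)
    word_count = Counter()
    pair_count = Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))  # count each word once per sentence
        word_count.update(vocab)
        pair_count.update(combinations(vocab, 2))

    table = {}
    for (a, b), c in pair_count.items():
        p_ab = c / n
        p_a = word_count[a] / n
        p_b = word_count[b] / n
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def sentence_association(s1, s2, table):
    """Average PMI over cross-sentence word pairs found in `table`.

    Returns 0.0 when no pair of distinct words has a known PMI value.
    """
    scores = []
    for a in set(s1):
        for b in set(s2):
            if a == b:
                continue
            key = (a, b) if a < b else (b, a)
            if key in table:
                scores.append(table[key])
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    corpus = [
        ["summary", "topic", "sentence"],
        ["topic", "word", "extraction"],
        ["mutual", "information", "topic"],
        ["sentence", "weight", "summary"],
    ]
    table = pmi_table(corpus)
    print(table[("sentence", "summary")])
    print(sentence_association(corpus[0], corpus[3], table))
```

A score of this kind could be computed between adjacent paragraphs as well, with a low value suggesting a topic boundary; how the thesis actually sets such boundaries is not stated in the abstract.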
Keywords/Search Tags:Automatic summarization, Mutual information, Topic word, Degree of association, Topic compartmentalization, Topic sentence