Research on Automatic Summarization and Its Application in Proposal Management

Posted on: 2018-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
Full Text: PDF
GTID: 2348330521950696
Subject: Software engineering
Abstract/Summary:
With the development of information science and technology, computers and the Internet have become an indispensable part of people's lives. A large amount of information now appears in the form of electronic documents, and we are gradually entering an era of information overload. It is therefore desirable to compress this information into relatively concise text that expresses the main content of a document. Automatic summarization technology improves the efficiency of acquiring knowledge and information and reduces the time users spend searching for the information they care about. Similarly, during the annual session of the CPPCC National Committee, thousands of CPPCC members submit more than 5,000 proposals on average. A good summary readily conveys the central theme of a proposal, which can improve the efficiency of CPPCC members' participation in political and government affairs and greatly facilitates the digital management of proposals.

Against this background, this thesis studies automatic summarization in depth. The main work is as follows:

1. In the summary extraction process, statistical and semantic information are combined, and a sememe similarity algorithm based on HowNet is designed. A context-based semantic similarity algorithm is also proposed and applied to computing the semantic similarity of sentences. In addition, the BM25 algorithm for measuring the relevance of two sentences is improved.

2. Topic segmentation of the input article is studied. Mutual information is used to measure the degree of correlation between paragraphs, and summary sentences are then extracted from the resulting sub-topics. This method improves the summary's coverage of the article's topics.

3. The graph-based TextRank algorithm is used to compute the overall weight of each sentence, which is then combined with sentence features to extract sentences. The input article is split into sentences, which form the nodes of the graph, and the similarity between sentences forms the edges. The candidate summary sentence set is then formed from the iterative algorithm, run to convergence, together with the sentences' own features. Finally, the candidate set is smoothed and the sentences are output in the order in which they appear in the original article.

4. An automatic summarization system is designed, implemented, and applied to CPPCC proposal management. Its performance is verified and evaluated through comparative experiments. The experimental results show that the automatic summarization algorithm proposed in this thesis has a degree of practical applicability.

Finally, the thesis summarizes its content and outlines directions for future work.
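
The graph-based extraction step described in item 3 can be illustrated with a minimal sketch. This is not the thesis's implementation: the thesis combines TextRank with an improved BM25 measure, HowNet-based semantic similarity, and sentence features, none of which are specified in the abstract. As a stand-in, the sketch below assumes the plain word-overlap similarity from the original TextRank formulation. The function names (`tokenize`, `overlap_similarity`, `textrank`, `summarize`) and the parameter values (`damping=0.85`, `tol`, `max_iter`) are illustrative assumptions.

```python
import math
import re


def tokenize(sentence):
    """Lowercased word tokens (a naive tokenizer, assumed for illustration)."""
    return re.findall(r"\w+", sentence.lower())


def overlap_similarity(a, b):
    """Shared-word count normalized by log sentence lengths.

    Stand-in similarity; the thesis uses improved BM25 plus semantic similarity.
    """
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, damping=0.85, tol=1e-6, max_iter=100):
    """Score sentences by iterating weighted PageRank until convergence."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Sentences are nodes; pairwise similarity gives the edge weights.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = overlap_similarity(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        delta = max(abs(x - y) for x, y in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def summarize(sentences, k=2):
    """Pick the k top-scoring sentences and return them in original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k)` on a list of sentence strings returns `k` sentences in their original order, matching the output ordering described above. Swapping `overlap_similarity` for a BM25-based or semantic measure would move the sketch closer to the approach the thesis describes, though the details of that measure are not given here.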
Keywords/Search Tags: Automatic Summarization, Semantic Similarity, Sentence Weight Calculation, Topic Segmentation, Proposal Management