Topic-oriented Automatic Text Summarization

Posted on: 2018-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: R Yan
GTID: 2428330512483563
Subject: Computer application technology
Abstract/Summary:
Information overload on the Internet makes it difficult for people to find the information they need. Summaries let readers grasp the main content of a document quickly and accurately, but manual summarization is costly and highly subjective, so it cannot solve the information overload problem that makes retrieval difficult for users. Automatic text summarization technology emerged in response. Current automatic text summarization systems generate only a single summary per query, with the aim of meeting the needs of all users as far as possible, and so cannot fully meet each user's personalized information needs. To address this problem, this thesis presents a query-oriented, multi-dimensional topic automatic summarization method. An LDA topic model is established; the user's query is first resolved into multiple query topics, and then a separate topic summary is generated for each topic item, to serve different users who issue the same query but have different search needs. The main work of this thesis is as follows:

(1) A method of intention fusion and topic identification is proposed. The traditional query-oriented multi-document automatic summarization task targets a single query, which usually contains only a few keywords, and a summary based on those keywords alone usually struggles to meet the user's real information need. In this thesis, a query is decomposed into a number of query-related topics, and the user's query is expanded according to those topics. On the one hand, this refines the query granularity and enriches the search space; on the other hand, it covers the user's real information needs as fully as possible and better meets the user's search goals.

(2) For query-oriented automatic summarization, a multi-dimensional topic summary is proposed for the multiple topic items identified from the user's query. A sentence weight decision algorithm is proposed for sentence weighting, and the NBI algorithm from recommender systems is used to extract summary sentences. The main task of traditional query-oriented multi-document summarization is to generate one query-related summary, whereas this thesis generates multiple summaries, one for each topic item identified from the query. For sentence weight calculation, we combine a variety of sentence weight measures, introduce two similarity measures (the similarity between a document and a topic item, and the coverage of a topic by a sentence), and propose a sentence weight decision algorithm. For summary sentence extraction, the NBI algorithm treats the topic items identified from the query as one category and the sentences in the document set as another. Selecting the summary sentences for a topic is then regarded as recommending sentences to that topic, and a sentence can be recommended to different topics.

(3) A query-oriented multi-dimensional topic automatic summarization system is implemented in Python on Mac OS. With the full text of Wikipedia as the training data set, and search engine query logs and daily retrieval data from Sogou Laboratory as the test data set, experiments show that, compared with current query-oriented automatic summarization techniques, the multi-dimensional topic automatic summarization method proposed in this thesis is better able to meet the differentiated information retrieval needs of different users.
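The NBI-based sentence extraction described in item (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulation, so everything below is an assumption: it uses the standard two-step resource-allocation form of NBI (commonly expanded as network-based inference in the recommender-systems literature), with topics playing the role of users and sentences the role of items. The function names `nbi_scores` and `top_sentences`, and the binary sentence-topic link matrix, are hypothetical and chosen only for illustration.

```python
def nbi_scores(adjacency):
    """Two-step resource diffusion on a sentence-topic bipartite graph.

    adjacency[s][t] is 1 if sentence s is linked to topic t, else 0.
    (How links are decided, e.g. by thresholding a sentence weight,
    is an assumption left outside this sketch.)
    Returns scores[t][s]: the resource sentence s holds for topic t.
    """
    n_sent = len(adjacency)
    n_topic = len(adjacency[0]) if n_sent else 0
    sent_degree = [sum(row) for row in adjacency]
    topic_degree = [sum(adjacency[s][t] for s in range(n_sent))
                    for t in range(n_topic)]

    scores = []
    for target in range(n_topic):
        # Initial state: each sentence linked to the target topic holds 1 unit.
        resource = [float(adjacency[s][target]) for s in range(n_sent)]

        # Step 1: sentences -> topics, each sentence splits its resource
        # equally among the topics it is linked to.
        topic_resource = [0.0] * n_topic
        for s in range(n_sent):
            if sent_degree[s] == 0:
                continue
            share = resource[s] / sent_degree[s]
            for t in range(n_topic):
                if adjacency[s][t]:
                    topic_resource[t] += share

        # Step 2: topics -> sentences, each topic splits its resource
        # equally among the sentences linked to it.
        final = [0.0] * n_sent
        for t in range(n_topic):
            if topic_degree[t] == 0:
                continue
            share = topic_resource[t] / topic_degree[t]
            for s in range(n_sent):
                if adjacency[s][t]:
                    final[s] += share
        scores.append(final)
    return scores


def top_sentences(topic_scores, k):
    """Return indices of the k highest-scoring sentences (ties by index)."""
    order = sorted(range(len(topic_scores)),
                   key=lambda s: (-topic_scores[s], s))
    return order[:k]


if __name__ == "__main__":
    # Toy graph: 3 sentences, 2 topics; sentence 1 is linked to both topics.
    links = [[1, 0],
             [1, 1],
             [0, 1]]
    for topic, topic_scores in enumerate(nbi_scores(links)):
        print(topic, top_sentences(topic_scores, 2))
```

In this sketch a sentence linked to several topics receives resource through each of them, so it can rank highly for more than one topic, which matches the abstract's remark that a sentence can be recommended to different topics. A real system would presumably build the links from the sentence weights rather than from a hand-written matrix.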
Keywords/Search Tags: automatic summarization, intention fusion, topic recognition, LDA topic model, decision algorithm, network-based inference algorithm