Research On Automatic Text Summarization By The Integration Of Contextual Information

Posted on:2014-11-29

Degree:Doctor

Type:Dissertation

Country:China

Candidate:P Hu

Full Text:PDF

GTID:1228330398954873

Subject:Computer software and theory

Abstract/Summary:

A summary is one of the main ways to grasp the key points of a document, and it helps improve the efficiency of reading and decision-making. The explosive growth of information now far exceeds the ability of ordinary people to absorb, understand, and use it, and a large amount of irrelevant and redundant information seriously interferes with information access and digestion. This makes automatic summarization, which aims to extract valuable information from documents, self-evidently important, and it has become a hot topic in natural language processing and information retrieval.

How to evaluate the importance of a document's content is both the key point and the main difficulty of summarization research. Most traditional methods take the sentence as the basic unit of evaluation and score sentences directly from the local information of the document(s). However, they generally ignore that summarization is context-dependent: the process of generating a summary is not determined solely by the document's own information, and a variety of contextual information may also affect the evaluation result or even indirectly determine the quality of the generated summary.

In view of this, to make effective use of contextual information, we carry out research on the following aspects and propose a series of summarization methods based on the integration of different kinds of contextual information. Experimental results on multiple data sets verify the effectiveness of the proposed methods.

1. Automatic Summarization by the Integration of Content Context.

For the integration of content context, we focus on query-oriented multi-document summarization. Relative to the document set to be summarized, the query can be treated as another explicit content context, which is used to select content that fits the user's needs as closely as possible. The thesis proposes two different approaches.

The first approach is based on factors related to the content context. It first selects query-related sentences through a co-training process under multiple views, and then scores these sentences with a Markov random walk model. The approach makes full use of the content of the query and the sentences, as well as the relationships between them, so that the generated summary can keep a good balance among query relevance, content salience, and information diversity. The second approach is based on contextual collaboration. It proposes a sentence scoring algorithm based on Co-HITS-Ranking, which incorporates the influence of contexts of different granularity on the evaluation of sentence importance. Experiments on the DUC and TAC datasets show that both approaches can generate good query-oriented summaries by integrating the query's contextual information into the summarization process.

2. Automatic Summarization by the Integration of Usage Context.

With the rapid rise of social networking websites, many users actively participate in online content feedback. They share and exchange their reading experiences by writing reviews, adding social tags, and so on. Whether the contextual information they provide can help reveal the important content of the target document and help identify their interests and preferences is worthy of further study. In the thesis, we focus on both general and personalized summarization based on social context. To compensate for the shortcomings of existing studies, which rarely consider the influence of feedback in the form of social tags, we treat social tagging information as an auxiliary information source and employ a tripartite clustering algorithm to cluster documents, users, and tags simultaneously to discover the social context of the target document. We then adopt a context-sensitive sentence scoring and fusion algorithm to extract a small number of important sentences in accordance with the interests of user groups or a specific user's preferences. Experimental results on the Delicious dataset demonstrate the effectiveness of the proposed approaches.

3. Automatic Summarization by the Integration of Usage Context and Structure Context.

As a data source with both usage context and structure context, academic literature promotes the dissemination of knowledge. However, the quality of the vast amount of literature varies significantly, which makes it much harder for researchers to obtain valid information. How to quickly identify the aspects of the target literature that have had impact has therefore become an issue of common concern, and it is also the goal of the impact summarization task. Existing methods tend to consider only the external citation sentences, and rarely pay attention to the citation context to which these citation sentences belong. In view of this, we propose an impact summarization approach based on the hybrid citation context. The approach leverages multiple kinds of relationships within the citation context (i.e., the citation relationship between papers, the co-authorship relationship between authors, and the authorship relationship between papers and authors) to jointly infer the impact of the hybrid citation context in a regularization framework. This impact is further integrated into a sentence language smoothing model to measure the relationships between citation sentences more effectively. In this way, the influential aspects of the target literature can be identified. Experiments on open academic literature data sets verify the effectiveness of the proposed approach.
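
The abstract does not give formulas for the Markov random walk scoring step used in the first approach of part 1. As a rough illustration only, the sketch below shows one generic way such a step could look: a query-biased random walk over a sentence similarity graph. The bag-of-words cosine similarity, the damping value of 0.85, the query-weighted restart distribution, and the function names `cosine` and `random_walk_scores` are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def random_walk_scores(sentences, query, damping=0.85, iters=100, tol=1e-8):
    """Score sentences with a query-biased random walk (illustrative only).

    Hypothetical helper: the similarity measure, damping factor, and
    restart distribution are assumptions, not the thesis's actual model.
    """
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(s.lower().split()) for s in sentences]
    q = Counter(query.lower().split())

    # Restart distribution: proportional to query relevance,
    # falling back to uniform when no sentence matches the query.
    rel = [cosine(b, q) for b in bags]
    total = sum(rel)
    prior = [r / total for r in rel] if total else [1.0 / n] * n

    # Row-normalised transition matrix from sentence-to-sentence similarity.
    # Sentences with no similar neighbour jump uniformly.
    trans = []
    for i in range(n):
        row = [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        s = sum(row)
        trans.append([v / s for v in row] if s else [1.0 / n] * n)

    # Power iteration until the scores stop changing (or iters is reached).
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [
            (1 - damping) * prior[j]
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores
```

Calling `random_walk_scores(["the cat sat on the mat", "a cat chased the mouse", "stock prices fell sharply today"], "cat")` returns one score per sentence, summing to 1, with the off-topic third sentence ranked lowest. Balancing salience against diversity, which the abstract also mentions, would need an additional redundancy-removal step that this sketch leaves out.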

Keywords/Search Tags:

automatic text summarization, contextual information integration, social context, citation context, context collaborative sentence scoring

Related items

1	Joint Scoring Automatic Text Summarization Generation Based On TextRank Algorithm
2	Citation-context Based Academic Literature Summarization Method
3	Research On Citation Sentiment Analysis Based On Semantics In Citation Context And Its Application
4	Research On Deep Neural Networks Based Automatic Text Summarization
5	Multi-role Cooperative Web System Anomalies Contextual Analysis And Reproduce
6	Collaborative Filtering Recommendation Algorithm Incorporating Context Information
7	Research On Technologies Of Image Object Detection Based On Contextual Information
8	Research On Citation Context Recognition Based On Pre-trained Language Model
9	Optimization Of Abstract Sentence Combination Based On Pre-training Model
10	Research On The Extraction Method Of Chinese-Vietnamese Pseudo-parallel Sentence Pairs Based On Image-text Information Enhancement