
Research On Automatic Text Summarization By The Integration Of Contextual Information

Posted on: 2014-11-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Hu
GTID: 1228330398954873
Subject: Computer software and theory
Abstract/Summary:
Summaries are one of the main ways to grasp the key points of a document, and they help improve the efficiency of reading and decision-making. The explosive growth of information now far exceeds what ordinary people can absorb, understand, and use, and a large amount of irrelevant and redundant information seriously interferes with information access and digestion. This makes the importance of automatic summarization, which aims to extract the valuable information from a document, self-evident, and it has become a hot topic in natural language processing and information retrieval.

How to evaluate the importance of a document's content is both the key point and the main difficulty of summarization research. Most traditional methods take the sentence as the basic unit of evaluation and score sentences directly from the local information of the document(s). However, they generally ignore the fact that summarization is context-dependent: the process of generating a summary is not determined solely by the document's own information, and various kinds of contextual information may also affect the evaluation result or even indirectly determine the quality of the generated summary.

In view of this, to make effective use of contextual information, we carry out research on the following aspects and propose a series of summarization methods based on the integration of different kinds of contextual information. Experimental results on multiple data sets verify the effectiveness of the proposed methods.

1. Automatic Summarization by the Integration of Content Context

For the integration of content context, we focus on query-oriented multi-document summarization. Relative to the document set to be summarized, the query can be treated as another, explicit content context, which is used to select content that fits the user's needs as closely as possible. In the thesis, two different kinds of approaches are proposed.
The first approach is based on factors related to the content context. It first selects query-related sentences via a co-training process under multiple views, and then scores these sentences with a Markov random walk model. The approach makes full use of the content of the query and the sentences, as well as the relationships between them, so that the generated summary can keep a good balance among query relevance, content salience, and information diversity. The second approach is based on contextual collaboration. A sentence scoring algorithm based on Co-HITS-Ranking is proposed, which incorporates the influence of contexts of different granularity on the evaluation of sentence importance. We performed experiments on the DUC and TAC datasets, and the results show that both proposed approaches can generate good query-oriented summaries by integrating the query's contextual information into the summarization process.

2. Automatic Summarization by the Integration of Usage Context

With the rapid rise of social networking websites, many users actively take part in giving feedback on online content. They share and exchange their reading experiences by writing reviews, adding social tags, and so on. Whether the contextual information they provide can help reveal the important content of the target document and help identify their interests and preferences is worthy of further study. In the thesis, we focus on both general and personalized summarization based on social context. Existing studies rarely consider the influence of feedback in the form of social tags. To address this shortcoming, we treat social tagging information as an auxiliary information source and employ a tripartite clustering algorithm to cluster documents, users, and tags simultaneously, in order to discover the social context of the target document.
We then adopt a context-sensitive sentence scoring and fusion algorithm to extract a small number of important sentences that match the interests of user groups or a specific user's preferences. Experimental results on the Delicious dataset demonstrate the effectiveness of the proposed approaches.

3. Automatic Summarization by the Integration of Usage Context and Structure Context

As a data source with both usage context and structure context, academic literature promotes the dissemination of knowledge. However, the quality of the vast amount of literature varies significantly, which makes it much harder for researchers to obtain valid information. In this setting, how to quickly identify the influential aspects of the target literature becomes an issue of common concern, and it is also the goal of the impact summarization task. Existing methods tend to consider only the information in external citation sentences, and rarely pay attention to the citation context to which these citation sentences belong. In view of this, we propose an impact summarization approach based on hybrid citation context. The approach leverages several kinds of relationships within the citation context (i.e., the citation relationship between papers, the co-authorship relationship between authors, and the authorship relationship between papers and authors) to jointly infer the impact of the hybrid citation context in a regularization framework. The result is further integrated into a sentence language smoothing model to measure relationships between citation sentences more effectively. In this way, the influential aspects of the target literature can be identified. Experiments on open academic literature data sets verify the effectiveness of the proposed approach.
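
For readers unfamiliar with graph-based sentence scoring, the Markov random walk step mentioned in part 1 can be illustrated with a minimal sketch. This is a generic random walk over a sentence similarity graph and not the thesis's actual model, whose details are not given in this abstract. The function names, the bag-of-words cosine similarity, the damping factor of 0.85, and the stopping tolerance are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def random_walk_scores(sentences, damping=0.85, iters=100, tol=1e-8):
    """Score sentences by the stationary distribution of a random walk
    on their similarity graph (illustrative assumption, not the thesis model)."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(s.lower().split()) for s in sentences]

    # Pairwise similarities; self-loops are excluded.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]

    # Row-normalise into transition probabilities.
    # A sentence with no similar neighbour jumps uniformly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    # Power iteration with a uniform restart term.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n
               + damping * sum(scores[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores
```

In a query-oriented setting, one plausible variation would replace the uniform restart term with a distribution proportional to each sentence's similarity to the query; whether the thesis does this is not stated here.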
Keywords/Search Tags: automatic text summarization, contextual information integration, social context, citation context, context collaborative sentence scoring