
Research On Some Key Technologies Of Sentence Ordering For Information Fusion

Posted on: 2011-08-23  Degree: Doctor  Type: Dissertation
Country: China  Candidate: G F Peng  Full Text: PDF
GTID: 1228360305983569  Subject: Computer software and theory
Abstract/Summary:
Text ordering is an important technique for improving the readability and coherence of the output of a multi-document automatic summarization system, and in recent years text ordering for information fusion has become an active research topic. Sentence ordering analyzes the information in the full text from a high level, taking the sentence as its basic unit. The process first analyzes the factors that affect the coherence and readability of a text and then generates a reasonable sequence according to those factors. In this thesis, we identify two such factors on the basis of human experience: the cohesion of adjacent sentences and the global coherence of chapter-based texts. We present several ordering models built on these two factors.
In this thesis we propose four reordering models: 1. a reordering model based on cohesion analysis of adjacent sentences; 2. a reordering model based on linear fitting; 3. a reordering model based on association; 4. an integrated reordering model based on machine learning. Each model has been evaluated and analyzed in a number of experiments. The main research and results are summarized as follows:
1. We propose an ordering model based on the analysis of cohesion between adjacent sentences. Using cosine similarity, a measure widely used in Natural Language Processing (NLP), we quantitatively analyze the correlation between adjacent sentences and assign each adjacent pair a direction coefficient according to the differing intensity of information transfer. The model combines the similarity and direction coefficients to evaluate the cohesion of adjacent sentences. Because cohesion between sentences is effective only locally, it cannot by itself support analysis of the global sequence.
To generate an approximate segmentation of the global sequence, we treat all the source documents as training data for a classifier and divide the summary sentences into two rough groups. We then reorder the sentences within each group by the global cohesion analysis and produce the final sequence of the summary.
2. Taking into account the role of chapters in the coherence analysis of a whole document, we propose a method for acquiring sequence information based on chapter factors. In this method, each article in the source document set is treated as a source of the standard summary sequence, and a classification method is then used to build the sequence information of the summary sentences with respect to each source document. Irrelevant factors in the sequence data are eliminated by unified pre-processing. Because the data are relatively scarce and uncertain, both within a single article and across the whole source document set, we propose a method based on linear fitting that integrates the variables of the data set into nested equations according to the sequence data matrix. The resulting model predicts the order of sentences in the summary from the general information of the whole sequence.
3. We propose an ordering model based on constructing associations among sentence sequences. The method uses the pre-processed sequence information in the data matrix obtained from the source documents. First, we discuss the positional relationship of each summary sentence across the different articles of a source document set. From the fact that the summary sentences as a whole belong to the same text, we show that the row data in the sequence data matrix are linked, and that this linkage among summary sentences has no particular relationship to differences in sentence order. Second, we form an association model for the order of two sentences using adjacent row data in the pre-processed matrix.
Finally, we predict the position of each summary sentence and produce the whole order.
4. We propose a machine-learning-based (ML-based) integrated ordering model built on the existing methods. First, we build a multi-dimensional space based on Kendall coefficients and map each possible sequence to a node in that space. The analysis shows an association between the geometric distribution of nodes in the space and the Kendall coefficient. Second, we define two linearly independent coefficients, which are based on two ordering models, and build an integrated model. We then train the integrated model on the results of the previous models to determine the values of both coefficients. Finally, we process the test data with the model and generate the final order.
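As a rough illustration of two building blocks named in the abstract, the sketch below computes cosine similarity between sentences and Kendall's tau between two orderings. It is not the thesis's implementation: the bag-of-words vectors, the greedy chaining strategy, and the function names `cosine_similarity`, `greedy_order`, and `kendall_tau` are all assumptions made for this example, and the direction coefficient described in item 1 is not modeled.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two sentences.

    Whitespace tokenization is an assumption of this sketch.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def greedy_order(sentences, first=0):
    """Hypothetical local-cohesion ordering.

    Starts from sentences[first] and repeatedly appends the unused
    sentence most similar to the last one placed. Ties go to the
    lowest index.
    """
    order = [first]
    remaining = set(range(len(sentences))) - {first}
    while remaining:
        last = sentences[order[-1]]
        nxt = max(sorted(remaining),
                  key=lambda i: cosine_similarity(last, sentences[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def kendall_tau(order, reference):
    """Kendall's tau between two permutations of the same items.

    tau = 1 - 2 * (discordant pairs) / C(n, 2); 1.0 means identical
    order, -1.0 means fully reversed.
    """
    n = len(reference)
    if n < 2:
        return 1.0
    pos = {item: i for i, item in enumerate(order)}
    discordant = sum(1 for a, b in combinations(reference, 2)
                     if pos[a] > pos[b])
    return 1.0 - 2.0 * discordant / (n * (n - 1) / 2)
```

A candidate order from `greedy_order` can be scored against a human reference order with `kendall_tau(candidate, reference)`. Because the greedy chain looks only at adjacent pairs, it shows the limitation noted in item 1: local cohesion alone does not constrain the global sequence.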
Keywords/Search Tags: Multi-document Summarization, Sentence Ordering, Cohesion of Sentences, Small Sample Size, Machine Learning