Research On Chinese Automatic Summarization And Its Evaluation Method

Posted on: 2008-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Huang
Full Text: PDF
GTID: 2178360215990246
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet technology, the amount of information on the Internet is increasing sharply. How to find the information one needs and grasp its main content has become an urgent problem. As a miniature of an article's content, an abstract is an effective way of extracting useful information because of its simplicity, accuracy and clarity. However, writing abstracts manually is inefficient and can hardly satisfy the demand for rapid information retrieval. Automatic summarization, which uses computers and artificial intelligence technology, makes information easy to retrieve and reuse, and has become a requirement of the era.

This dissertation discusses a Chinese automatic summarization approach and the corresponding evaluation method. The main work covers the following three aspects:

1. A new sentence similarity computing method based on framework dependence is designed. Sentence similarity calculation is widely applied in natural language processing and plays an important role there. After analyzing the popular methods, the dissertation presents a new sentence similarity computing method based on framework dependence. The method compares semantic similarity on the basis of syntactic analysis and takes the influence of negative adverbs into account, so it reflects the semantic similarity between sentences more accurately and is suitable for summarization.

2. A Chinese automatic summarization method based on multiple features is presented, and a system has been implemented. After 50 years of development there are many different automatic summarization methods, but their results are not satisfactory. Therefore, after analyzing the advantages and disadvantages of the popular methods, the dissertation presents a new automatic summarization method based on statistical, semantic and structural features. Twelve features form the feature vector of each sentence, and the summarizer is obtained with machine learning algorithms, so the summarization problem is turned into a classification task. In addition, a series of post-processing steps is applied to reduce redundancy and incoherence, which greatly improves the quality of the summary.

3. An automatic summarization evaluation method based on text similarity is presented. System evaluation is a very important part of automatic summarization. On the one hand, evaluation can validate the effectiveness, availability and intelligibility of a system. On the other hand, evaluation results can be fed back to every preceding stage to improve system performance. The dissertation summarizes the defects of internal evaluation and presents a new evaluation method based on text similarity, in which the text similarity between the automatic summary and the ideal summary is computed to obtain a performance value for the system.

In short, these three aspects of research form a complete system. The automatic summarization method based on multiple features is the core. The evaluation method based on text similarity is used to validate the effectiveness of the system, and the sentence similarity algorithm is an important part of both automatic summarization and its evaluation.
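The abstract does not give the formula used for text similarity, so the following is only a minimal sketch of the general idea behind a similarity-based evaluation, assuming a plain vector space model with term-frequency vectors and cosine similarity. The function names `cosine_similarity` and `evaluate_summary` are hypothetical, and the inputs are assumed to be already segmented into tokens (Chinese word segmentation is outside the scope of the sketch).

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text has no direction in the vector space.
        return 0.0
    return dot / (norm_a * norm_b)


def evaluate_summary(system_tokens, ideal_tokens):
    """Score a system summary against an ideal summary (assumed scoring rule)."""
    return cosine_similarity(system_tokens, ideal_tokens)
```

Under these assumptions, a score near 1 means the system summary uses almost the same terms as the ideal summary, and a score of 0 means they share no terms. The method in the dissertation may weight terms or compare sentences differently.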
Keywords/Search Tags: Automatic Summarization, Sentence Similarity, Evaluation, Vector Space Model, Machine Learning