Chinese Query-Focused Multi-document Summarization Based On Cloud Model

Posted on: 2012-10-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J G Chen
GTID: 1118330368980747
Subject: Education Technology
Abstract/Summary:
Widespread use of the internet has led to the accumulation of a vast amount of information, and this amount keeps growing as the internet becomes more popular. For a simple query, a search engine returns a series of web pages that the user may be interested in. Because a large proportion of the search results are repetitive or irrelevant, the user has to spend a lot of time looking for the information they need. Query-focused multi-document summarization was proposed to solve this problem. Given a set of topic-related documents, a query topic consisting of several complex questions, and a user preference profile, it generates a brief, well-organized, fluent summary that answers an information need. Query-focused multi-document summarization aims to improve the efficiency of obtaining and using information and to increase the utilization of network information, and thereby to benefit the user in today's information world.

The cloud model, first proposed by Academician Li Deyi, is an effective model for transforming qualitative concepts into quantitative expressions and vice versa. It represents the fuzziness and randomness of uncertain concepts and the relationship between the two. It originated in research on artificial intelligence with uncertainty, starting from the quantitative representation of qualitative concepts in natural language. Unfortunately, to the best of our knowledge, the cloud model is rarely applied in Natural Language Processing (NLP).

This paper is concerned with Chinese query-focused multi-document summarization based on the cloud model. First, a large-scale open benchmark corpus, together with reference summaries written by humans, is constructed. Then, in order to generate concise and fluent summaries that satisfy the user's needs, the cloud model is used in the key processes of summarization, such as content unit selection, sentence compression, and sentence ordering.
Lastly, summaries are evaluated with ROUGE-CN, an improved version of ROUGE that can evaluate Chinese summaries automatically.

The main contributions of this thesis are as follows.

First, this paper proposes a summarization unit selection method based on the cloud model. The cloud model is used to account for both randomness and fuzziness in the distribution of summarization units. To obtain the relevance between a summarization unit and the query, the relevance scores between a word and each query word are treated as cloud drops. By measuring the uncertainty of the resulting cloud, summarization units that are more relevant to the query are given higher scores. Next, the importance of a sentence within the document set is computed to evaluate how well it summarizes the content of the document set. The similarities between a sentence and all sentences in the document set are treated as cloud drops, which together form a cloud. This cloud is used to evaluate the sentence's ability to summarize the document set, with the aim of finding sentences that cover the most content and avoid under-representing the document set. To demonstrate the effectiveness of the proposed method, large-scale open benchmark corpora in English are used in the experiments. We also participated in TAC (Text Analysis Conference) 2010 and obtained satisfactory results.

Secondly, this paper describes the construction of a large-scale Chinese query-focused multi-document summarization corpus and of a Chinese query-focused multi-document summarization system. The corpus includes 1000 documents, 100 document sets and queries, and 400 reference summaries. By modifying the source code of ROUGE, an automated evaluation tool for English, this paper realizes automated evaluation of Chinese summaries.
When constructing the Chinese summarization system, we use 50 document sets as training data to train the parameters of the module for selecting summarization units.

Thirdly, this paper proposes a Chinese sentence compression method based on a multi-dimensional cloud and dependency relationships to further improve the quality of summaries. A set of heuristic rules based on the analysis of dependency relationships is proposed and used to trim sentences, producing compressed sentences that serve as multiple candidate sentences. The candidate sentences are then scored by a multi-dimensional cloud model that considers the distribution of words among sentences and documents as well as the relevance between the words and the query. Compared with the single-dimensional cloud model, the multi-dimensional cloud model retains uncertainties when clouds are superposed. The candidate sentence that contains the most information and is shortest in length replaces the original sentence in the summary, leaving more room for the summary to include more useful information.

Lastly, this paper proposes a sentence ordering method based on the cloud model to make the summary easier to comprehend. This method takes every source document in a given document set as a sentence ordering template and combines the results of the different templates into a single ordering. Its advantage is that it does not depend on one single document, as the single-template sentence ordering method does, and it also avoids the complication of pairwise comparison in the majority sentence ordering method. All sentences in the document set are clustered into several sub-topics using an adaptive incremental clustering method based on complex networks. Every document in the document set is then treated as a template, and all these templates together decide the relative positions of sub-topics as well as sentences.
Sub-topics and the sentences within the same topic are sorted in sequence to generate more fluent and more comprehensible automatic summaries.
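
As an illustration of how similarity scores can be treated as cloud drops, the following Python sketch estimates the three numerical characteristics usually associated with a normal cloud (expectation Ex, entropy En, hyper-entropy He) using a commonly cited certainty-free backward cloud generator. The abstract does not give the thesis's formulas, so the function name `backward_cloud`, the sample similarity values, and the choice of estimator are assumptions made here for illustration only, not the author's implementation.

```python
import math
from statistics import fmean, variance


def backward_cloud(drops):
    """Estimate (Ex, En, He) of a normal cloud from a list of cloud drops.

    Assumed estimator (certainty-free backward cloud generator):
      Ex = sample mean
      En = sqrt(pi / 2) * mean absolute deviation from Ex
      He = sqrt(max(S^2 - En^2, 0)), where S^2 is the sample variance
    """
    drops = list(drops)
    if len(drops) < 2:
        raise ValueError("need at least two cloud drops")
    ex = fmean(drops)
    en = math.sqrt(math.pi / 2) * fmean([abs(x - ex) for x in drops])
    s2 = variance(drops, ex)  # unbiased sample variance around ex
    # Clamp at zero: sampling noise can make S^2 slightly smaller than En^2.
    he = math.sqrt(max(s2 - en * en, 0.0))
    return ex, en, he


# Hypothetical similarities between one sentence and the other sentences
# of a document set; each value plays the role of one cloud drop.
even_sentence = [0.5, 0.5, 0.5, 0.5]
uneven_sentence = [0.2, 0.4, 0.6, 0.8]

for name, sims in [("even", even_sentence), ("uneven", uneven_sentence)]:
    ex, en, he = backward_cloud(sims)
    print(f"{name}: Ex={ex:.3f} En={en:.3f} He={he:.3f}")
```

Both hypothetical sentences have the same average similarity, but the second spreads it unevenly, which shows up as a larger entropy. How these characteristics are combined into a final sentence score is not stated in the abstract and is left open here.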
Keywords/Search Tags: Query-focused Multi-document Summarization, Cloud Model, Summarization Unit Selecting, Chinese Query-focused Multi-document Summarization Corpus, Chinese Sentence Compression, Sentence Ordering