A Study Of Chinese Text Summarization Based On Adaptive Clustering Algorithm

Posted on:2006-09-25

Degree:Master

Type:Thesis

Country:China

Candidate:P Hu

GTID:2168360152995233

Subject:Computer software and theory

Abstract/Summary:

Automatic summarization is an important research topic in natural language processing, and a growing number of researchers worldwide are working in this area. On the one hand, automatic summarization can, to some extent, compensate for the shortcomings of traditional information retrieval when dealing with information overload. On the other hand, it can relieve users of the burden of browsing large volumes of text.

Many problems remain in research on Chinese document summarization. For instance, many researchers adopt the traditional summarization method, which scores every sentence and extracts the highest-scoring sentences from the entire text. However, such methods do not take the document's thematic structure into account. As a result, the summaries they generate tend to cover only the dominant themes while neglecting the others, and they are sometimes highly redundant. In addition, when developing a practical automatic summarization system, reducing the dimensionality of the representations of various linguistic units is a fundamental and important step.

In this thesis, we propose a Chinese summarization method based on an adaptive clustering algorithm. The method relies on four key technologies:

Key technology one: feature vector representation of various linguistic units based on unsupervised feature extraction
Key technology two: discovery of latent themes based on an adaptive clustering algorithm
Key technology three: selection of representative sentences from different themes using theme-sentence similarity
Key technology four: quantitative evaluation of summary redundancy based on representation entropy

We chose thirty documents of different genres from the Modern Chinese Corpus of the State Language Commission as experimental samples and applied both the proposed method and a traditional baseline method to them.
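
Key technologies two and three can be illustrated with a minimal sketch. The abstract does not specify the adaptive clustering algorithm, so the sketch substitutes a single-pass threshold ("leader") clustering, in which the number of themes is not fixed in advance but grows whenever a sentence is not similar enough to any existing theme centroid. Each theme is then represented by the member sentence most similar to its centroid. The function names, the cosine similarity measure, the threshold value, and the use of raw term-frequency vectors in place of the thesis's unsupervised feature extraction are all assumptions; Chinese input is assumed to be word-segmented already.

```python
import math
from collections import Counter


def tf_vector(tokens):
    # Raw term-frequency vector. This stands in for the thesis's
    # unsupervised feature extraction step (assumption).
    return Counter(tokens)


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as mappings.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(vectors, threshold):
    """Single-pass threshold clustering (hypothetical stand-in for the
    thesis's adaptive algorithm): a vector joins the most similar theme
    if that similarity reaches `threshold`, otherwise it opens a new theme."""
    clusters = []  # each entry: {"members": [indices], "centroid": Counter}
    for i, v in enumerate(vectors):
        best, best_sim = None, -1.0
        for c in clusters:
            s = cosine(v, c["centroid"])
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Recompute the centroid as the mean of the member vectors.
            total = Counter()
            for j in best["members"]:
                total.update(vectors[j])
            n = len(best["members"])
            best["centroid"] = Counter({t: w / n for t, w in total.items()})
        else:
            clusters.append({"members": [i], "centroid": Counter(v)})
    return clusters


def summarize(sentences, threshold=0.3, max_sentences=None):
    """sentences: list of token lists. Returns the indices of the chosen
    sentences in document order, one representative per theme."""
    vectors = [tf_vector(s) for s in sentences]
    clusters = leader_cluster(vectors, threshold)
    # Larger themes first, so a length limit drops the smallest themes.
    clusters.sort(key=lambda c: len(c["members"]), reverse=True)
    if max_sentences is not None:
        clusters = clusters[:max_sentences]
    chosen = []
    for c in clusters:
        # Theme-sentence similarity: the member closest to the centroid
        # represents the theme.
        rep = max(c["members"], key=lambda j: cosine(vectors[j], c["centroid"]))
        chosen.append(rep)
    return sorted(chosen)


if __name__ == "__main__":
    sents = [
        ["stock", "market", "fell"],
        ["stock", "market", "fell", "sharply"],
        ["rain", "expected", "tomorrow"],
        ["heavy", "rain", "expected", "tomorrow"],
    ]
    print(summarize(sents, threshold=0.3))  # -> [1, 3]
```

With a threshold of 0.3 the four toy sentences form two themes and one sentence is drawn from each, so both themes are covered. Raising the threshold to 0.95 splits them into four single-sentence themes, which is the sense in which the number of clusters adapts to the data rather than being fixed beforehand.
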
The experimental results indicate that the proposed method is more effective and efficient than the baseline on documents of various genres, because it balances the thematic coverage and the redundancy of the generated summary to a certain degree.
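
Key technology four, the redundancy side of this balance, can also be sketched. A commonly used definition of representation entropy takes the eigenvalues λ_j of the covariance matrix of a set of variables, normalizes them as λ̃_j = λ_j / Σ_k λ_k, and computes H_R = -Σ_j λ̃_j log λ̃_j. H_R is 0 when all variance lies along a single direction (maximal redundancy) and reaches log k when the variance is spread evenly over k directions. The sketch below applies this definition to the sentence vectors of a summary, treating each sentence as one variable; that choice, the function name, and the dependency on NumPy are assumptions rather than details given in the abstract.

```python
import numpy as np


def representation_entropy(sentence_vectors):
    """Representation entropy H_R of a summary (hypothetical helper).

    sentence_vectors: array-like of shape (k, d), one row per summary
    sentence over a shared d-dimensional feature space (d >= 2).
    Rows are treated as the variables, so the covariance matrix is k x k.
    Returns 0.0 for fewer than two sentences.
    """
    X = np.atleast_2d(np.asarray(sentence_vectors, dtype=float))
    if X.shape[0] < 2:
        return 0.0
    # np.cov treats each row as a variable by default -> k x k matrix.
    cov = np.cov(X)
    # The covariance matrix is symmetric, so eigvalsh applies; clip tiny
    # negative eigenvalues caused by floating-point rounding.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    total = eigvals.sum()
    if total == 0.0:
        return 0.0
    p = eigvals / total
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log(p)))


if __name__ == "__main__":
    redundant = [[1, 1, 0, 0], [1, 1, 0, 0]]  # two identical sentences
    diverse = [[1, -1, 0, 0], [0, 0, 1, -1]]  # uncorrelated, equal variance
    print(representation_entropy(redundant))  # close to 0
    print(representation_entropy(diverse))  # log(2), about 0.693
```

A lower H_R thus signals a more redundant summary; dividing by log k would make values comparable across summaries of different lengths, if that is needed.
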

Keywords/Search Tags:

automatic summarization, thematic discovery, unsupervised feature extraction, clustering, representation entropy

Related items

1	Research Of Automatic Summarization Oriented To News Text
2	Research On Classification And Automatic Summarization Of Web Information
3	Feature Representation And Image Compression Based On Unsupervised Deep Model
4	Unsupervised Clustering Algorithm Based On Dimension Reduction
5	Study Of Analog Wafer Map Feature Extraction And Clustering Based On Unsupervised Methods
6	Research On Image Feature Extraction Based On Sparse Representation
7	Research And Application Of Web Information Extraction And Webpage Summarization
8	A Study Of Chinese Multi-document Summarization Based On Adaptive Clustering Algorithm
9	Study On Methods And Their Applications Of Text Automatic Summarization And Information Extraction
10	Research On Keyphrase Extraction Based Automatic Summarization Method For Chinese Webpage