The Method Of Fine-Grained Topic Information Extraction And Text Clustering Based On Chinese Phrase

Posted on: 2016-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: W J Lin
Full Text: PDF
GTID: 2308330479993934
Subject: Computer application technology
Abstract/Summary:
Today, people frequently communicate through text messages on smartphones. As a result, a large volume of short text carries people's viewpoints and attitudes on hot topics in society. How to mine valuable information from this volume of text data while filtering out spam has become a hot topic in the field of data mining.

Text mining technology plays an important role in applications such as public opinion analysis and hot topic tracking. Text topic extraction is widely used to condense text information and reflect the main meaning of a text, which greatly reduces manual review work. Chinese text is made up of words, but because a single word is the most fine-grained semantic unit, words alone cannot express the meaning of a text fragment. A phrase, by contrast, is a semantic unit that carries more complete information than a word, so phrases are preferred for expressing the topic of a text. This thesis aims to extract phrases from text to express refined semantic information and to apply phrase features to text clustering.

This thesis takes the phrase as the basic unit of semantic information. The work consists of three parts:

(1) We propose a double linguistic filter method (a lexical category filter and a phrase-extending filter) that weeds out redundant information and extracts topic phrases from text. The extracted phrases are close to a refined semantic expression of the text.

(2) We implement text clustering using phrase features. Clustering is performed on the topic phrases of text fragments, and the clustering results validly express the main topic information of those fragments. To this end, we implement the ROCK clustering method combined with phrase frequency information on the text data. The experimental results indicate that our phrase extraction and clustering methods are workable for handling the text data.

(3) We design and implement a hot spot mining system for a customer complaint service, which incorporates our phrase extraction and text clustering methods. The system is highly extensible and shows practical value.

Finally, the proposed methods may inform other new text mining methods and offer guidance for them.
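
The abstract does not give the filter rules or the clustering details, so the following is only a minimal sketch of how the two steps might fit together, not the thesis's actual implementation. It assumes the text has already been segmented and part-of-speech tagged. The tag set `ALLOWED_TAGS`, the `max_len` limit, the reading of "phrase extending" as merging adjacent surviving tokens, and the use of a frequency-weighted Jaccard similarity are all assumptions made for illustration. The second half computes only the neighbor and link counts that ROCK builds on, not the full agglomerative merge.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag set (n = noun, v = verb, a = adjective); the thesis's real list is not given.
ALLOWED_TAGS = {"n", "v", "a"}


def extract_phrases(tagged_tokens, max_len=3):
    """Return candidate phrases from a list of (word, pos_tag) pairs.

    Filter 1 (lexical category): only tokens whose tag is in ALLOWED_TAGS may join a phrase.
    Filter 2 (phrase extension, assumed): adjacent surviving tokens are merged,
    up to max_len words; single-word runs are dropped.
    """
    phrases, current = [], []
    for word, tag in tagged_tokens:
        if tag in ALLOWED_TAGS and len(current) < max_len:
            current.append(word)
            continue
        # The run ended (disallowed tag) or reached max_len: flush it.
        if len(current) >= 2:
            phrases.append("".join(current))
        current = [word] if tag in ALLOWED_TAGS else []
    if len(current) >= 2:
        phrases.append("".join(current))
    return phrases


def weighted_jaccard(a, b):
    """Frequency-weighted Jaccard similarity between two phrase Counters (an assumed measure)."""
    keys = set(a) | set(b)
    numerator = sum(min(a[k], b[k]) for k in keys)
    denominator = sum(max(a[k], b[k]) for k in keys)
    return numerator / denominator if denominator else 0.0


def rock_links(profiles, theta=0.5):
    """Count common neighbors (ROCK 'links') for every pair of documents.

    Two documents are neighbors when their similarity is at least theta.
    In this sketch a document is not counted as its own neighbor.
    """
    n = len(profiles)
    neighbors = [
        {j for j in range(n)
         if j != i and weighted_jaccard(profiles[i], profiles[j]) >= theta}
        for i in range(n)
    ]
    return {(i, j): len(neighbors[i] & neighbors[j])
            for i, j in combinations(range(n), 2)}


if __name__ == "__main__":
    # Made-up, pre-tagged fragment: "手机/n 信号/n 很/d 差/a"
    tokens = [("手机", "n"), ("信号", "n"), ("很", "d"), ("差", "a")]
    print(extract_phrases(tokens))

    # Made-up phrase-frequency profiles for four text fragments.
    profiles = [
        Counter({"手机信号": 2, "网络故障": 1}),
        Counter({"手机信号": 1, "网络故障": 1}),
        Counter({"手机信号": 1, "网络故障": 2}),
        Counter({"话费扣除": 1}),
    ]
    print(rock_links(profiles, theta=0.5))
```

In a full ROCK run, pairs of clusters with many links would be merged first, so fragments that share neighbors end up together even when their direct similarity is modest. The threshold `theta` controls how strict the neighbor relation is and would need tuning on real data.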
Keywords/Search Tags: phrase extraction, text clustering, rule mining, pattern matching, phrase feature