
Research On Clustering Approach For Text Messages

Posted on: 2012-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: J B Zhang
GTID: 2218330368489913
Subject: Control Engineering
Abstract/Summary:
With the rapid development of the communications industry in recent years, text messaging has become a routine way for mobile phone users to communicate, comment, and make suggestions. To improve their services, communications companies often receive customer feedback (opinions or advice) on their mobile services in the form of text messages. Classification is an important part of information processing, and how to effectively organize, manage, classify, and handle this customer information is a major problem for many communications companies as well as a new challenge for information science and technology. For the problem of clustering text messages, this thesis first presents two feature selection methods. For the clustering algorithm, it then combines information extraction with the K-means algorithm to cluster the messages, and obtains the best clustering by selecting multiple initial points to represent the information of each class. The main contents are as follows.

(1) Study on Data Distribution in High-Dimensional Space
This thesis uses the supervised K-NN classification algorithm and the traditional unsupervised K-means method to study how the selected data are distributed in high-dimensional space. Method one applies information-gain-based supervised K-NN classification with a closed test (the classifier is evaluated on its own training data) and a semi-closed test, to examine whether the data in the high-dimensional space are consistent. Method two applies information-gain-based K-means clustering to examine how the data aggregate.

(2) Study on Feature Selection Methods for Text Messages
Text messages are short and convey their content in only a few words. Given this property, we study two ways of choosing features: extracting terms with high document frequency over the whole collection, and extracting terms with high document frequency within each category.

(3) Text Message Processing Algorithm Based on Rules and K-means Clustering
In this algorithm, after word segmentation, messages that contain the term "integral" are assigned to the integral class by rule. This prevents many messages from the other four classes from being wrongly assigned to this class. Judged both by individual classes and by overall accuracy, the approach improves the clustering results.

(4) Research on Initial Point Selection
For the unsupervised K-means clustering algorithm, selecting multiple initial points lets each class be represented by more than one point, so that when the data are unevenly distributed a whole class is not represented by a single initial point. Comparing different initial-point selections shows that the more initial points are used, the better they represent the information of each category and the better the clustering results.
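Both methods in item (1) weight features by information gain (IG). The abstract does not give the formula, so the sketch below assumes the standard text-categorization definition, in which a term only counts as present or absent in a message: IG(t) = H(C) - P(t)·H(C|t) - P(¬t)·H(C|¬t). The function names and the toy messages and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t)*H(C|t) - P(not t)*H(C|not t), using term presence per message."""
    n = len(docs)
    with_t = [lab for tokens, lab in zip(docs, labels) if term in tokens]
    without_t = [lab for tokens, lab in zip(docs, labels) if term not in tokens]
    ig = entropy(list(Counter(labels).values()))
    for subset in (with_t, without_t):
        if subset:  # an empty side contributes nothing (its probability is 0)
            ig -= len(subset) / n * entropy(list(Counter(subset).values()))
    return ig


def top_ig_terms(docs, labels, k):
    """Rank the vocabulary by IG; ties keep alphabetical order because sorted() is stable."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: -information_gain(docs, labels, t))[:k]


# Hypothetical word-segmented messages and their class labels.
docs = [{"points", "query"}, {"points", "exchange"}, {"signal", "poor"}, {"signal", "query"}]
labels = ["account", "account", "network", "network"]

print(top_ig_terms(docs, labels, 2))  # ['points', 'signal']
```

A higher IG means the term's presence separates the classes better; in the toy data "query" occurs equally often in both classes and scores zero.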
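Item (2) contrasts two document-frequency (DF) strategies. A minimal sketch, assuming messages are already word-segmented into token lists and that a term's DF is the number of messages containing it; the helper names and toy data are hypothetical, not taken from the thesis.

```python
from collections import Counter


def document_frequency(docs):
    """Number of messages that contain each term (repeats inside one message count once)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def top_df_global(docs, k):
    """Strategy 1: the k terms with the highest DF over the whole collection."""
    df = document_frequency(docs)
    # Descending DF, ties broken alphabetically so the result is deterministic.
    return sorted(df, key=lambda t: (-df[t], t))[:k]


def top_df_per_category(docs, labels, k):
    """Strategy 2: the top-k DF terms inside each category, merged into one feature set."""
    by_category = {}
    for tokens, label in zip(docs, labels):
        by_category.setdefault(label, []).append(tokens)
    features = set()
    for category_docs in by_category.values():
        features.update(top_df_global(category_docs, k))
    return sorted(features)


# Hypothetical word-segmented messages: three about an account, two about the network.
docs = [
    ["points", "exchange", "query"],
    ["points", "balance"],
    ["signal", "poor"],
    ["signal", "call", "drop"],
    ["points", "query"],
]
labels = ["account", "account", "network", "network", "account"]

print(top_df_global(docs, 2))                # ['points', 'query']
print(top_df_per_category(docs, labels, 1))  # ['points', 'signal']
```

In this toy collection the global top-2 list is dominated by the larger "account" class and misses "signal", while per-category selection keeps one representative term for each class.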
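Item (3) assigns messages containing "integral" to the integral class by rule before K-means runs. A minimal sketch, assuming a single keyword rule, binary bag-of-words vectors over a hand-picked vocabulary, and hand-chosen initial centroids; the thesis's actual rules, features, and initialization are not given in the abstract, and all function names and toy messages here are hypothetical.

```python
import math

RULE_KEYWORD = "integral"  # term named in the abstract; the real rule set is not given
RULE_CLASS = "integral"


def apply_rule(docs):
    """Split message indices into rule matches and the remainder left for clustering."""
    ruled, rest = [], []
    for i, tokens in enumerate(docs):
        (ruled if RULE_KEYWORD in tokens else rest).append(i)
    return ruled, rest


def vectorize(tokens, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    return [1.0 if term in tokens else 0.0 for term in vocab]


def kmeans(points, init_idx, iters=20):
    """Plain K-means seeded with the points at init_idx; returns a cluster index per point."""
    centroids = [list(points[i]) for i in init_idx]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance (lowest index wins ties).
        new_assign = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignments are stable, so the centroids will not move either
        assign = new_assign
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def rule_then_cluster(docs, vocab, init_idx):
    """Label rule matches first, then run K-means only on the remaining messages."""
    ruled, rest = apply_rule(docs)
    result = {i: RULE_CLASS for i in ruled}
    points = [vectorize(docs[i], vocab) for i in rest]
    # init_idx indexes into `points`, i.e. positions among the non-rule messages.
    for i, cluster in zip(rest, kmeans(points, init_idx)):
        result[i] = f"cluster{cluster}"
    return [result[i] for i in range(len(docs))]


docs = [
    ["integral", "exchange"],
    ["signal", "poor"],
    ["bill", "query"],
    ["integral", "query"],
    ["signal", "drop"],
    ["bill", "overcharge"],
]
vocab = ["signal", "poor", "drop", "bill", "query", "overcharge"]
print(rule_then_cluster(docs, vocab, init_idx=[0, 1]))
# ['integral', 'cluster0', 'cluster1', 'integral', 'cluster0', 'cluster1']
```

Message 3 shares "query" with the billing message, so if it reached K-means it would likely be pulled into the billing cluster; the rule removes it from the clustering input altogether.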
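Item (4) represents each class by several initial points rather than one. The abstract does not say how the seeds are chosen or how clusters map back to classes, so this sketch assumes that seed points are picked per class in advance, that every seed starts its own centroid in a standard K-means run, and that each final cluster inherits the class of the seed it grew from. Toy 2-D points stand in for message vectors; all names and data are hypothetical.

```python
import math


def kmeans(points, init_idx, iters=20):
    """Plain K-means seeded with the points at init_idx; returns a cluster index per point."""
    centroids = [list(points[i]) for i in init_idx]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance (lowest index wins ties).
        new_assign = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignments are stable, so the centroids will not move either
        assign = new_assign
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def multi_seed_cluster(points, seeds_per_class, iters=20):
    """Give each class several initial centroids; every point takes the class of its cluster's seed."""
    owners, init_idx = [], []
    for cls, seed_indices in seeds_per_class.items():
        for i in seed_indices:
            owners.append(cls)   # the class this centroid will stand for
            init_idx.append(i)   # the point that initializes it
    assignment = kmeans(points, init_idx, iters)
    return [owners[c] for c in assignment]


# Toy 2-D points: class A is split into two distant groups, class B is one group.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8), (5, 9)]

print(multi_seed_cluster(points, {"A": [0], "B": [4]}))
# ['A', 'A', 'B', 'B', 'B', 'B']
print(multi_seed_cluster(points, {"A": [0, 2], "B": [4]}))
# ['A', 'A', 'A', 'A', 'B', 'B']
```

Class A's points lie in two separated groups, a simple case of uneven distribution. With one seed per class the second A group is absorbed by class B; adding a second A seed lets that group keep its own centroid.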
Keywords/Search Tags: text message, initial point, feature selection, K-means, K-NN