Design And Implementation Of Algorithms And Applications For Cluster Analysis To Short Text Data

Posted on: 2015-10-16; Degree: Master; Type: Thesis
Country: China; Candidate: X Yang; Full Text: PDF
GTID: 2298330467462194; Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, large amounts of data are continuously generated in people's daily lives. Among these data, short text is becoming increasingly important with the rise of new media such as BBS forums and Twitter. How to cluster short text efficiently, in order to exploit its considerable value, has attracted growing attention in recent years. However, owing to certain characteristics of short text, traditional clustering methods do not perform well on it. This thesis therefore designs and implements a cluster analysis system for short text data, which is expected to be valuable in a number of applications.

First, the thesis describes the characteristics of short text data, reviews existing research on short text, and introduces classic clustering algorithms from data mining, such as K-means and hierarchical clustering. Second, in view of the distinctive features of short text, it summarizes the key problems of applying traditional clustering methods to short text and, drawing on several clustering algorithms, proposes an improved clustering method for short text data. The method consists of word segmentation based on a trie, feature selection based on tf-idf, and clustering based on an improved K-means algorithm.

Third, based on this research and on practical requirements, the thesis designs and implements a clustering system for short text data such as microblog posts, using the proposed method. The system can help optimize the ranking of results in a retrieval system. Finally, experiments show that the clustering analysis is effective.
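The three stages named in the abstract (trie-based segmentation, tf-idf weighting, K-means clustering) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how segmentation uses the trie or how K-means was improved, so the sketch assumes forward maximum matching for segmentation and uses plain K-means with farthest-point seeding as a stand-in. The dictionary, the sample posts, and all function names are hypothetical.

```python
import math
from collections import Counter

_END = object()  # sentinel key marking the end of a dictionary word

def build_trie(words):
    """Build a nested-dict trie from a word list."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root

def segment(text, trie):
    """Forward maximum matching (assumed): take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        node, longest = trie, 0
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break
            if _END in node:
                longest = j - i + 1
        step = longest or 1  # unknown character: emit it as a single token
        tokens.append(text[i:i + step])
        i += step
    return tokens

def tfidf_vectors(docs):
    """Return dense tf-idf vectors (tf = relative frequency, idf = log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([(tf[t] / len(doc)) * math.log(n / df[t]) if tf[t] else 0.0
                        for t in vocab])
    return vectors

def sqdist(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain K-means with deterministic farthest-point seeding (a stand-in, not the thesis's variant)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):
            dists = [sqdist(p, c) for c in centroids]
            labels[i] = dists.index(min(dists))
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical dictionary and unsegmented posts, for illustration only.
dictionary = ["short", "shorttext", "text", "cluster", "clustering",
              "football", "match", "score"]
posts = ["shorttextclustering", "clusteringshorttext",
         "footballmatchscore", "footballscore"]

trie = build_trie(dictionary)
tokens = [segment(post, trie) for post in posts]
labels = kmeans(tfidf_vectors(tokens), k=2)
print(tokens[0], labels)
```

On this toy input the first two posts fall into one cluster and the last two into the other. Real short text is far sparser than this example, which is the difficulty the thesis's improved method is meant to address.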
Keywords/Search Tags: Short Text Data, Clustering Algorithm, Natural Language Processing, K-means