Research Of Weibo Text Clustering Algorithm Based On K-means

Posted on: 2017-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: H J Lin
Full Text: PDF
GTID: 2348330485492586
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, social media platforms such as post bars, QQ, WeChat, and Weibo have emerged and quickly become part of people's social lives. Among them, Weibo has developed the fastest because of its distinctive mechanism, and its user base has grown explosively.

The large volume of Weibo data is a natural result of this growth in users. Because the data is directly associated with users' daily behavior, preferences, and habits, it contains a great deal of latent, valuable information. How to extract directly usable information from such large-scale user data is a problem in urgent need of a solution.

Collecting Weibo data is the prerequisite and foundation for exploring the structure of social networks, the inherent laws of information dissemination, and user behavior in virtual networks. Given the large number of users and the volume of data they generate, how to collect Weibo data efficiently from such an information-saturated site is the most important problem in research on Weibo information.

To address this, the thesis analyzes Weibo information acquisition techniques and proposes a new topical web crawler based on Weibo information that collects Weibo data efficiently. Then, to obtain information about users' habits, preferences, behavior, and social contacts, it performs cluster analysis by representing the Weibo data with the vector space model (VSM) and clustering it with K-means. The main contributions of the thesis are as follows:

1) Construction of a keyword library: the thesis proposes a keyword-library-based crawling strategy for Weibo information and designs an experimental system, KeysLab.
This strategy comprises five stages: first, the sample selection strategy; second, the thesaurus sample module; third, sample preprocessing; fourth, feature word extraction; and finally, keyword construction.

2) Improvement of the topical web crawler: the thesis proposes a crawling strategy for Weibo information that improves the original topical web crawler by using the keyword library. The experimental results show that the strategy effectively improves the accuracy and coverage of the collected information.

3) Improvement of the K-means clustering algorithm: the thesis uses an incremental clustering technique to modify K-means and address the traditional algorithm's sensitivity to its initial state, that is, to the initial choice of cluster centers.
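The abstract does not say which incremental clustering technique is used, so the following is only a minimal sketch of one plausible reading of contribution 3. It represents documents as TF-IDF vectors (the vector space model), runs a single-pass "leader" clustering to obtain initial centers, and then refines them with standard K-means iterations. The function names, the similarity threshold of 0.2, and the toy English documents are all assumptions made for illustration. Real Weibo text would first need Chinese word segmentation, which is assumed to have been done already.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Vector space model: one unit-length TF-IDF vector per tokenized document."""
    vocab = sorted({term for doc in docs for term in doc})
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    # Smoothed IDF keeps a nonzero weight for terms found in every document.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vec = [counts[t] / total * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance, the usual K-means objective."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def incremental_seeds(vectors, threshold):
    """Single-pass clustering (assumed technique): a vector joins the most
    similar existing cluster if the similarity reaches the threshold,
    otherwise it starts a new cluster. Returns the cluster means."""
    clusters = []
    for v in vectors:
        best, best_sim = None, -1.0
        for members in clusters:
            sim = cosine(v, mean(members))
            if sim > best_sim:
                best, best_sim = members, sim
        if best is not None and best_sim >= threshold:
            best.append(v)
        else:
            clusters.append([v])
    return [mean(members) for members in clusters]


def kmeans(vectors, seeds, max_iter=100):
    """Standard K-means iterations started from the given seed centroids."""
    centroids = [list(c) for c in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = mean(members)
    return labels, centroids


# Toy, pre-tokenized documents (hypothetical data, not from the thesis).
docs = [
    ["football", "match", "goal"],
    ["football", "team", "goal"],
    ["stock", "market", "price"],
    ["stock", "price", "rise"],
    ["match", "team", "football"],
]
vectors = tfidf_vectors(docs)
seeds = incremental_seeds(vectors, threshold=0.2)
labels, centroids = kmeans(vectors, seeds)
print(len(seeds), labels)
```

In this sketch the number of clusters is not fixed in advance. It follows from the threshold, and the seeds depend on document order rather than on a random draw, which removes the random-initialization step of plain K-means. Whether the thesis makes the same trade-off cannot be determined from the abstract.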
Keywords/Search Tags: Weibo, Keywords library, Topical Web Crawler, VSM, K-means