
Research Of K-means Clustering Algorithm Based On News Comments

Posted on: 2011-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2178360305971474
Subject: Computer software and theory
Abstract/Summary:
The Internet plays a major role in socioeconomic life. It has become a new public-opinion platform on which people express the popular will and take part in political, economic and social affairs. The emergence of personal websites, forums, blogs and comment sections has greatly sped up the flow of information and lets people express their views more effectively. The Chinese Academy of Social Sciences' social blue book, 2010 China Social Situation Analysis and Prediction, points out that online media are becoming an important part of the new public-opinion landscape. The discussion of hot issues reflects not only the public's participation in major public events but also a variety of public value judgments and ideological trends, and its influence should not be underestimated. The government should therefore establish a mechanism to monitor, respond to and incorporate netizens' views. Accordingly, the views that netizens express online, consciously or unconsciously, and their value orientation on hot social issues are of growing research value.

E-government integrates modern government management concepts with the latest information technology. Through intelligent information processing, it uses artificial intelligence, data mining and management decision-making techniques to implement decision-support systems. This is important for improving efficiency, strengthening the government's responsiveness and decision-making capacity, making decisions more scientific and accurate, and building an open, service-oriented and responsible government.

The K-means clustering algorithm, frequently used in text clustering, has been widely applied in practice. It offers high computing performance and a clear, global objective function. Its clustering process is simple, efficient and robust, and it is suitable for many types of data.
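The objective function mentioned here is usually taken to be the within-cluster sum of squared distances. The following is a minimal sketch of plain K-means under that assumption; it is not the improved variant developed in the thesis. The names `kmeans` and `sq_dist`, the random choice of initial centers and the use of Euclidean distance are all illustrative assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-means; returns (centers, labels, sse)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k points sampled at random.
    # The thesis improves this step, but does not say how in the abstract.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the objective has stopped decreasing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Objective: sum of squared distances from each point to its cluster center.
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, sse

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels, sse = kmeans(pts, k=2)
    print(labels, round(sse, 3))
```

Neither step can increase the objective, which is why the procedure converges; the result still depends on the initial centers.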
However, different applications and different types of data place different demands on the K-means algorithm.

How to extract useful information from comments on hot issues at portal websites, as a reference for decision-making, is a problem worth studying. The key issue is classifying comments automatically, and one effective method is to cluster them. This thesis adopts the K-means clustering algorithm and text clustering technology and carries out exploratory research on the comment clustering problem. The aim is to obtain valuable viewpoints from news comments so that the government can take in and monitor the popular will and make decisions.

In the implementation of news comment clustering, the vector space model is used for text representation. The original data is processed by Chinese word segmentation, feature extraction, weight computation and so on, which turns it into vectors that can be clustered; the data is then clustered. Meanwhile, based on the characteristics of news comments and the main deficiencies of the K-means algorithm, a special stop-word list for news comments is built. In addition, feature extraction, the selection of initial cluster centers and the method of assigning comments to categories are improved in the implementation of the clustering algorithm. Finally, the clustering results and the factors that influence them are analyzed. The final clustering results and the F-measure evaluation show that the K-means clustering method studied here is effective.

Finally, the findings on news comment clustering are applied in a news comment recommender system, in which the functions for crawling and clustering news comments have been implemented. A well-formed set of clustered views is obtained, which provides more meaningful comment points for subsequent comment recommendation.
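The vectorization and evaluation steps described above can be sketched as follows. The abstract does not name the weighting scheme or the exact F-measure definition, so this sketch assumes TF-IDF weights and the common class-weighted, best-match F-measure for clusterings. It also assumes the comments have already been segmented into tokens. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs, stop_words=frozenset()):
    """Turn pre-segmented comments (lists of tokens) into TF-IDF vectors."""
    # Drop stop words first, mirroring the comment-specific stop-word list.
    docs = [[t for t in d if t not in stop_words] for d in docs]
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Document frequency: in how many comments each term appears.
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        total = len(d) or 1  # avoid division by zero for empty comments
        vectors.append([tf[t] / total * idf[t] for t in vocab])
    return vocab, vectors

def clustering_f_measure(labels, classes):
    """Class-weighted best-match F-measure of a clustering against true classes."""
    n = len(labels)
    total = 0.0
    for c in set(classes):
        members = {i for i, x in enumerate(classes) if x == c}
        best = 0.0
        for k in set(labels):
            cluster = {i for i, x in enumerate(labels) if x == k}
            overlap = len(members & cluster)
            if overlap:
                p = overlap / len(cluster)   # precision of cluster k for class c
                r = overlap / len(members)   # recall of cluster k for class c
                best = max(best, 2 * p * r / (p + r))
        total += len(members) / n * best
    return total

if __name__ == "__main__":
    # Toy, already-segmented comments (hypothetical data).
    comments = [["good", "policy"], ["bad", "policy"], ["the", "good"]]
    vocab, vectors = tfidf_vectors(comments, stop_words={"the"})
    print(vocab)
    print(clustering_f_measure([0, 0, 1, 1], ["a", "a", "b", "b"]))
```

The resulting vectors can be passed to any K-means implementation. Text clustering often normalizes them and compares them by cosine similarity, but the abstract does not say which distance measure the thesis uses.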
Keywords/Search Tags: news comments, vector space model (VSM), clustering, K-means clustering algorithm