| With the rapid development of China’s economy and culture, China’s social structure has entered a period of sharp transformation. A huge online community, intertwined social and economic conflicts, and the spread of Internet technology have together made network group events in China in recent years more numerous, larger in scale, and more complex in background and theme. The frequent occurrence of network group events has seriously affected social order and public stability, and has drawn close attention from the relevant government departments. The key to dealing with network group events effectively is to obtain their topic information quickly. Topic clustering is the main technique for discovering topics, and how to use it to extract the topics of network group events from complex network information has become a research hot spot for scholars in China and abroad. The research in this paper covers the following two aspects: (1) First, we discuss and improve the method of extracting keywords from news web pages. Traditional text keyword extraction methods rely mainly on the word frequency features of the vocabulary, but web page text differs in form from general text, so these methods perform poorly when extracting keywords from web page text. 
Building on the keyword extraction method based on word frequency features, we analyze the textual features of web pages and incorporate part-of-speech (POS), position, and word co-occurrence features, assigning an adjustable parameter to each feature. This yields a word weight formula that combines multiple features, which we use to extract the keywords of news web page text. (2) Second, to address the high-dimensional, sparse data and the lack of semantic information in traditional text clustering algorithms based on the vector space model (VSM), this paper presents a text clustering algorithm based on faceted classification and latent semantic analysis (LSA). The idea of faceted classification is introduced first: the feature vocabulary of a text is divided into a thematic facet and a descriptive facet, and only the thematic-facet terms are selected to construct the term-document matrix, which reduces the dimensionality and sparsity of the matrix. LSA is then used to project the high-dimensional feature space onto a low-dimensional latent semantic space, which further reduces the scale of the term-document matrix and better mines the semantic information of web text. Finally, we run topic clustering experiments on web text data sets; the results show that the topic clustering algorithm for network group events based on faceted classification and LSA is more accurate and efficient. |
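
The multi-feature word weighting described in aspect (1) can be illustrated with a minimal sketch. The abstract does not give the actual formula or parameter values, so everything below is an assumption made for illustration: the function names `keyword_weights` and `top_keywords`, the POS weights, the mixing parameters `ALPHA`, `BETA`, and `GAMMA`, and the choice to treat the page title as the privileged position. The input is assumed to be already tokenized and POS-tagged.

```python
from collections import Counter

# Hypothetical parameters: the abstract does not state the real values.
POS_WEIGHTS = {"noun": 1.0, "verb": 0.6, "adj": 0.4}
DEFAULT_POS_WEIGHT = 0.1
ALPHA, BETA, GAMMA = 0.5, 0.3, 0.2  # frequency, position, co-occurrence


def keyword_weights(title, sentences, pos_tags):
    """Score each word by combining four features (an assumed formula).

    title:     list of tokens from the page title
    sentences: list of token lists from the page body
    pos_tags:  dict mapping a token to a coarse POS tag
    """
    units = [title] + sentences
    freq = Counter(word for unit in units for word in unit)
    if not freq:
        return {}

    # Co-occurrence: distinct words sharing a title or sentence with a word.
    neighbours = {word: set() for word in freq}
    for unit in units:
        distinct = set(unit)
        for word in distinct:
            neighbours[word] |= distinct - {word}

    max_freq = max(freq.values())
    max_cooc = max(len(n) for n in neighbours.values()) or 1
    title_words = set(title)

    weights = {}
    for word in freq:
        tf = freq[word] / max_freq                   # word frequency feature
        loc = 1.0 if word in title_words else 0.0    # position feature
        cooc = len(neighbours[word]) / max_cooc      # co-occurrence feature
        pos = POS_WEIGHTS.get(pos_tags.get(word), DEFAULT_POS_WEIGHT)
        weights[word] = pos * (ALPHA * tf + BETA * loc + GAMMA * cooc)
    return weights


def top_keywords(title, sentences, pos_tags, k=3):
    """Return the k highest-weighted words, ties broken alphabetically."""
    weights = keyword_weights(title, sentences, pos_tags)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]
```

In this sketch the POS weight scales a linear mix of the other three features, so a frequent title noun outranks an equally frequent verb in the body. Other ways of combining the features are equally plausible, and the parameters would need to be tuned on real web page data.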