
System Design And Implementation Based On Crawler And Text Clustering For Network Public Opinion Analysis

Posted on:2015-06-26
Degree:Master
Type:Thesis
Country:China
Candidate:Y Li
Full Text:PDF
GTID:2308330473959571
Subject:Software engineering
Abstract/Summary:
With the rapid development of Internet technology, the World Wide Web (WWW) has become one of the core carriers of information, and the volume of information on it is growing rapidly, providing a rich source for people to search and share. However, as the amount of information increases, obtaining effective and accurate information is becoming more and more difficult, and the credibility of that information is hard to guarantee. A variety of search engines, such as Google, Baidu, and HotBot, have therefore emerged, and they meet people's information-search needs to a large extent. Nevertheless, in designing a search engine system it remains very important to identify the user's search intention and to find the information the user really needs, given differing user preferences and backgrounds.

Building on existing search systems, this thesis studies the application of a personalized search system to public opinion analysis. First, the key technologies are introduced: the user interest model, web information crawling, web information mining, and text clustering and classification. Then the system framework and each of its modules are designed. Finally, the application of the personalized search system to online public opinion analysis and the key technologies of public opinion analysis are studied, and the system's performance tests and analysis results are given.

The main work of this thesis is summarized as follows:
1) A framework for mining public opinion hotspot information is designed by integrating web crawling, text classification and clustering, and indexing technology. Effective information is obtained through information filtering and information updating. The user interest model is then built using text clustering and user feedback. The framework provides a good reference for designing personalized search systems.
2) Because the traditional K-Means clustering algorithm is sensitive to initialization and prone to falling into local minima, an improved K-Means clustering algorithm based on the vector space model is presented. The algorithm improves the accuracy of text clustering and classification, effectively addresses the initialization and local-minima problems, and also improves the efficiency of text mining.
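
The abstract does not say how the improved algorithm chooses its initial centers, so the following Python sketch only illustrates the general idea of K-Means over a vector space model with a less initialization-sensitive start. It is not the thesis's method. The assumptions here are whitespace tokenization, a smoothed TF-IDF weighting, and farthest-first seeding (each new seed is the document farthest from the seeds already chosen, starting arbitrarily from the first document), which is one common remedy for poor initial centers. All function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map whitespace-tokenized documents to L2-normalized TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch, not taken from the thesis).
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(vectors, k):
    """Deterministic seeding: start at the first vector, then repeatedly
    add the vector whose nearest existing seed is farthest away."""
    seeds = [vectors[0]]
    while len(seeds) < k:
        seeds.append(max(vectors, key=lambda v: min(sq_dist(v, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(vectors, k, max_iter=100):
    """Standard K-Means iterations started from farthest-first seeds."""
    centroids = farthest_first_seeds(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy corpus: two finance snippets and two sports snippets.
docs = [
    "stock market shares fall",
    "market shares rise stock",
    "football match goal score",
    "goal score football team",
]
labels, _ = kmeans(tfidf_vectors(docs), k=2)
print(labels)  # [0, 0, 1, 1]
```

Because the two seeds are forced far apart, the toy corpus separates into its two topics on the first assignment pass. With purely random seeds, both initial centers can land in the same topic, which is the kind of initialization sensitivity the abstract describes.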
Keywords/Search Tags:web crawler, text mining, clustering analysis, user interest model