System Design And Implementation Based On Crawler And Text Clustering For Network Public Opinion Analysis

Posted on:2015-06-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y Li

Full Text:PDF

GTID:2308330473959571

Subject:Software engineering

Abstract/Summary:

With the rapid development of Internet technology, the World Wide Web has become one of the core carriers of information, and the rapid growth of its content provides a rich source for people to search and share. However, as the amount of information increases, obtaining effective and accurate information becomes more and more difficult, and the reliability of that information is hard to guarantee. A variety of search engines, such as Google, Baidu, and Hotbot, have therefore emerged, and they meet people's search needs to a large extent. Nevertheless, in designing a search engine system it remains very important to identify the user's search intention and to find the information the user really needs, given different user preferences and backgrounds.

This thesis studies the application of public opinion analysis based on a personalized search system built on top of existing search systems. Its content is as follows. First, the key technologies are introduced: the user interest model, web information crawling, web information mining, and text clustering and classification. Then the system framework and each module are designed. Finally, the application of the personalized search system to online public opinion analysis is studied along with the key technologies of public opinion analysis, and the performance test and analysis results of the system are given. The main work of this thesis is summarized as follows:

1) A framework for mining public opinion hotspot information is designed by integrating web crawler, text classification and clustering, and indexing technologies. Effective information is obtained through information filtering and information renewal. The user interest model is then built using text clustering and user feedback. The framework provides a good reference for designing personalized search systems.

2) Because the traditional K-Means clustering algorithm is sensitive to initialization and easily falls into local minima, an improved K-Means clustering algorithm based on the vector space model is presented. The algorithm improves the accuracy of text clustering and classification, effectively addresses the initialization and local-minima problems, and improves the efficiency of text mining.
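
The abstract does not describe how the improved K-Means algorithm works, so the following Python sketch is only an illustration of the problem it names, not the thesis's method. It represents tokenized documents as TF-IDF vectors (a vector space model) and reduces sensitivity to initialization with two widely used remedies that are assumptions here: distance-weighted seeding in the style of k-means++ and several random restarts, keeping the lowest-cost run. All function names, parameters, and sample documents are hypothetical.

```python
import math
import random
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Map tokenized documents to unit-length TF-IDF vectors."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps every weight positive.
        vectors.append(normalize(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}))
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity for unit-length sparse vectors, clamped at 0."""
    return max(0.0, 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items()))


def centroid(members):
    """Unit-length mean direction of a non-empty list of vectors."""
    total = Counter()
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    return normalize(total)


def seed_centers(vectors, k, rng):
    """k-means++-style seeding: prefer points far from the chosen centers."""
    centers = [rng.choice(vectors)]
    while len(centers) < k:
        d2 = [min(cosine_distance(v, c) for c in centers) ** 2 for v in vectors]
        if sum(d2) == 0:  # every point coincides with a center already
            centers.append(rng.choice(vectors))
        else:
            centers.append(rng.choices(vectors, weights=d2, k=1)[0])
    return centers


def kmeans(vectors, k, rng, max_iter=50):
    """One K-Means run; returns (labels, total cosine distance to centers)."""
    centers = seed_centers(vectors, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: cosine_distance(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    cost = sum(cosine_distance(v, centers[lab]) for v, lab in zip(vectors, labels))
    return labels, cost


def cluster(docs, k, restarts=10, seed=0):
    """Run K-Means several times and keep the lowest-cost result."""
    rng = random.Random(seed)
    vectors = tfidf_vectors(docs)
    return min((kmeans(vectors, k, rng) for _ in range(restarts)),
               key=lambda result: result[1])


# Hypothetical sample documents, already tokenized.
sample_docs = [
    ["web", "crawler", "page", "link"],
    ["crawler", "page", "index"],
    ["public", "opinion", "hotspot"],
    ["opinion", "hotspot", "topic"],
]
labels, cost = cluster(sample_docs, k=2)
print(labels, round(cost, 3))
```

Restarts trade extra computation for a lower chance of ending in a poor local minimum; whether the thesis uses seeding, restarts, or a different strategy is not stated in the abstract.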

Keywords/Search Tags:

web crawler, text mining, clustering analysis, user interest model

Related items

1	Clustering Based Net User Interest Mining
2	Research And Implementation Of Mining Implicit User Interest
3	User Demand Analysis Based On Text Mining
4	Research On Topic Web Crawler For Web Text Mining
5	Discovering User Interest On Twitter With A Hierarchical Clustering Model
6	Research Of Microblog User Interest Mining Based On Image-text Co-occurrence Data And Time Effect
7	The Research Of User's Interest Model Based On Web Log Mining
8	Research Of Text Mining Based On Rough Set Theory
9	Mining Users' Interests Based On Search Logs
10	The Research On The Application Of Web Log Mining Based On User Interest And Fuzzy Clustering