Research And Implementation Of Internet Public Opinion Information Mining

Posted on: 2014-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: C H Shang
Full Text: PDF
GTID: 2268330425975927
Subject: Software engineering
Abstract/Summary:
With the development of network technologies, the volume of information on the Internet has multiplied, making it a major concentration of knowledge and information and the place where people look for the information they need. As an important channel for acquiring knowledge and information, the Internet offers convenience to its users and, at the same time, collects their feedback on the information they obtain. This feedback, in all its forms, constitutes Internet public opinion. Because the Internet is virtual and open, Internet public opinion has broad influence and has become a weather vane of social attitudes. Studying Internet public opinion information is therefore essential to analyzing Internet information.

This thesis examines the extraction and classification of Internet public opinion information. It draws on existing research on the crawlers used in public opinion analysis systems and analyzes several key web crawler technologies in depth. Based on the requirements of this work, an optimized web crawler is implemented to fetch Internet public opinion information. The thesis analyzes the important role that hot-issue keywords play in collecting public opinion information and proposes locating public opinion information through those keywords, a method that improves the precision and efficiency of acquisition. An anchor text matching module is added to the typical web crawler architecture so that public opinion information can be acquired effectively.

The main contents of the study are as follows:

First, the characteristics and difficulties of Internet public opinion information mining are analyzed and summarized, along with the role of the web crawler in this technology, its goals, and its implementation methods.

Second, the implementation of general web crawler technology is analyzed, and topic crawler and focused crawler technologies are studied. On this basis, a crawler design suited to this system is proposed. The specific implementation details of the crawler are given, including web page crawling and parsing, web content retrieval and deduplication, the crawl strategy, and URL deduplication.

Third, the relationship between a page's anchor text and its content is analyzed, and a method that matches anchor text against hot-issue keywords is proposed and implemented. The problem of matching Chinese phrases is studied. Text classification technology is studied further, covering four parts: word segmentation, text representation, feature selection, and the classification algorithm. Page content is stored in a database, and information indexing and retrieval technology is studied so that users can conveniently retrieve the information stored in the database.
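The anchor text matching and URL deduplication steps described in the abstract could be sketched roughly as follows. This is not the thesis's implementation: the class `AnchorCollector`, the function `select_links`, and the plain substring match are assumptions made for illustration. The abstract does not say how Chinese phrases are matched, and a real system would probably segment the text into words before matching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class AnchorCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, anchor_text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Resolve relative links and drop "#fragment" so that the same
            # page is not treated as two different URLs.
            url, _fragment = urldefrag(urljoin(self.base_url, self._href))
            self.links.append((url, "".join(self._text).strip()))
            self._href = None


def select_links(html, base_url, keywords, seen):
    """Return unseen URLs whose anchor text contains a hot-issue keyword.

    `seen` is a set of URLs already selected; it is updated in place.
    Matching is a simple substring test (an assumption of this sketch).
    """
    parser = AnchorCollector(base_url)
    parser.feed(html)
    selected = []
    for url, text in parser.links:
        if url in seen:
            continue  # URL deduplication
        if any(keyword in text for keyword in keywords):
            seen.add(url)
            selected.append(url)
    return selected
```

A crawler loop would call `select_links` on each fetched page and enqueue only the returned URLs, so links whose anchor text mentions no hot-issue keyword are never fetched.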
Keywords/Search Tags: Internet Public Opinion, Information Collection, Web Crawler, Anchor Text, Hot Issues