Key Technologies for Internet Information Statistics and Analysis in the Study of Ethnic Minorities

Posted on: 2013-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Wang
Full Text: PDF
GTID: 2248330374958126
Subject: Basic mathematics
Abstract/Summary:
With the rapid development of networks, the internet has become a carrier of massive amounts of information. Search engines make it more convenient for people to browse the internet and have also become an effective tool for studying user behavior. With the rise of the network in recent years, ethnic issues have become a major obstacle to China's development, and their spread on the internet has become increasingly prominent. How to use existing search engines to monitor the spread of ethnic issues has therefore become a major question in online public opinion monitoring. This paper focuses on the key technologies involved in extracting information about ethnic issues from the internet.

First, the paper introduces focused search engines, their underlying principles, and an overview of the development of the key technologies. It then highlights common webpage classification algorithms, key-information extraction from webpages, and crawling strategies. Finally, it provides a theoretical foundation for the design and implementation of a focused crawler built on a search engine.

Search engines have an unmatched advantage over other tools in integrating network resources. However, search engine results may not fully match users' needs, and in some cases the amount of information returned is clearly insufficient. A further search based on the search engine's results therefore has value.

Internet information appears in the form of HTML pages, and HTML has clear classification characteristics. Much of the information in page code is only weakly related to the information being searched for, so optimizing how the page code is searched is extremely important. The purpose of the search is explicit: a search request, such as one for the common characteristics of pages about a specific event, has a clear structure. We therefore use the vector space model to simplify the page code and design the algorithm on that basis.

The model is divided into two modules: a Baidu search module and a focused search module. The Baidu search module fetches the URLs of the results that the Baidu search engine returns for the search term, producing the initial URL queue. The focused search module takes this initial URL queue as its starting point and uses the KNN classification algorithm, based on the vector space model, to carry out focused crawling of the network and obtain the search results.

Finally, the thesis completes an initial implementation of the algorithm and analyzes the results statistically. By analyzing the characteristics of the information in the search results together with the social events that affect communication on the network, we obtain the degree of match between the search results and the sensitive information sources. The results demonstrate the operability and effectiveness of the search results and provide data to support further optimization of the algorithm.
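The two-module design described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: all function names, the labels "relevant" and "other", the toy training set, and the in-memory `PAGES` link graph (which stands in for real page fetching and for the seed URLs a search engine would return) are assumptions made for illustration. Pages are reduced to raw term-frequency vectors over whitespace-separated words, compared by cosine similarity, and labelled by a k-nearest-neighbour vote; only pages labelled relevant have their links followed. A real system would need proper tokenization and, most likely, a different term weighting.

```python
import math
from collections import Counter, deque

def vectorize(text):
    """Vector space model: map a text to a term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_label(text, training, k=3, default="other"):
    """Label a text by majority vote among its k most similar training texts.

    Assumption: training texts that share no term with the query do not vote,
    and a query with no such neighbours receives the default label.
    """
    query = vectorize(text)
    scored = [(cosine(query, vectorize(doc)), label) for doc, label in training]
    neighbours = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: s[0], reverse=True)[:k]
    if not neighbours:
        return default
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def focused_crawl(seed_urls, pages, training, k=3, max_pages=100):
    """Breadth-first crawl that expands only pages classified as relevant.

    `pages` maps url -> (text, outgoing links); it is a hypothetical stand-in
    for downloading and parsing HTML.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    relevant = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if url not in pages:
            continue
        fetched += 1
        text, links = pages[url]
        if knn_label(text, training, k) != "relevant":
            continue  # off-topic page: do not follow its links
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return relevant

# Toy labelled examples and link graph (illustrative data only).
TRAINING = [
    ("ethnic minority policy region news", "relevant"),
    ("minority culture festival ethnic community", "relevant"),
    ("ethnic region development report", "relevant"),
    ("football match score league", "other"),
    ("stock market price shares", "other"),
    ("movie review actor film", "other"),
]
PAGES = {
    "seed1": ("ethnic minority region news today", ["a", "b"]),
    "seed2": ("football league match tonight", ["c"]),
    "a": ("minority culture festival report", ["seed1"]),
    "b": ("stock market shares fall", ["d"]),
    "c": ("ethnic policy news", []),
    "d": ("ethnic minority community", []),
}

if __name__ == "__main__":
    print(focused_crawl(["seed1", "seed2"], PAGES, TRAINING))
```

In this sketch the crawler never reaches pages "c" and "d", because they are linked only from pages classified as off-topic. That is the trade-off of a strict focused-crawling policy: fewer irrelevant fetches, at the cost of missing relevant pages behind irrelevant ones.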
Keywords/Search Tags: focused search, monitoring public opinion, webpage classification, vector space model, KNN classification algorithm