The Internet Information Statistics And Analysis Of Key Technology In The Study Of Ethnic Minorities

Posted on:2013-12-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Wang

Full Text:PDF

GTID:2248330374958126

Subject:Basic mathematics

Abstract/Summary:

With the rapid development of the network, the internet has become a carrier of massive amounts of information. Search engines make it more convenient for people to use the internet, and they have also become an effective tool for studying user behavior. With the rise of the network in recent years, ethnic issues have become a major problem troubling China's development, and their propagation on the internet is increasingly prominent. How to use existing search engines to monitor the spread of ethnic issues has therefore become a major question for online public opinion monitoring. This thesis focuses on the key technologies involved in extracting information about ethnic issues from the internet.

First, the thesis introduces focused search engines, their underlying principles, and the development of the related key technologies. It then describes common webpage classification algorithms, the extraction of key information from webpages, and crawling (fetch) strategies. Finally, it provides a theoretical foundation for the algorithm and implementation of a focused crawler built on top of a search engine.

Search engines have an unmatched advantage over other tools in integrating network resources. However, their results may not fully match users' needs, and in some cases the amount of information returned is clearly insufficient. A further search that starts from the search engine's results is therefore of value.

Internet information appears in the form of HTML pages, and HTML pages have distinct characteristics that lend themselves to classification. Much of the information in the page code is only weakly related to what is being searched for, so optimizing how the page code is searched is extremely important. The purpose of the search is explicit: a search request, such as one targeting the common characteristics of pages about a specific event, has a clear structure.
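The abstract does not say how page code is reduced before classification. As a minimal sketch, assuming Python's standard `html.parser` and simple word tokenization (both illustrative choices, not the thesis's stated method), stripping markup, scripts, and styles from a page and keeping a bag of term frequencies might look like this. The names `VisibleTextParser` and `page_terms` are hypothetical. Chinese pages would additionally need a word segmenter, which is omitted here.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that lies outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def page_terms(html):
    """Reduce raw page code to lowercase term frequencies (a bag of words)."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Assumption: split on non-word characters; Chinese text needs a real segmenter.
    return Counter(re.findall(r"\w+", text.lower()))
```

For example, `page_terms("<title>Focused crawler</title><p>Crawler test, crawler!</p>")` counts `crawler` three times and `focused` and `test` once each; everything inside `<script>` or `<style>` is discarded before counting.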
To simplify the page code, we adopt the vector space model and design the algorithm on top of it. The model is divided into two modules: a Baidu search module and a focused search module. The Baidu search module submits the search terms to the Baidu search engine and collects the URLs of the returned results, forming the initial URL queue. The focused search module takes this initial URL queue as its starting point and performs focused crawling over the network, applying the KNN classification algorithm to vector-space representations of pages to obtain the search results.

Finally, the thesis presents an initial implementation of the algorithm and analyzes its results statistically. By analyzing the characteristics of the information contained in the search results together with the social events that affected online communication, we assess how well the search results match the sensitive information sources. This demonstrates the operability and effectiveness of the search results and provides data for further optimizing the algorithm.
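The two-module design can be sketched as follows, under several assumptions the abstract does not settle: pages are represented as tf-idf weighted term vectors compared by cosine similarity, the KNN vote is similarity-weighted (a common KNN variant), and the crawl is breadth-first, following links only from pages classified as on-topic. The Baidu search module is not reproduced; the sketch takes the initial URL queue as `seed_urls`, and `fetch` is a hypothetical caller-supplied function returning a page's term counts and out-links. All names here are illustrative.

```python
import math
from collections import Counter, deque


def tfidf(terms, idf):
    """Weight each term by tf * idf; terms never seen in training are dropped."""
    return {t: tf * idf[t] for t, tf in terms.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class KNNClassifier:
    """Similarity-weighted k-nearest-neighbour classifier (label True = on-topic)."""

    def __init__(self, docs, labels, k=3):
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        # Smoothed idf keeps every training term's weight positive.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [tfidf(d, self.idf) for d in docs]
        self.labels = list(labels)
        self.k = k

    def is_relevant(self, terms):
        v = tfidf(terms, self.idf)
        ranked = sorted(
            ((cosine(v, vec), label) for vec, label in zip(self.vectors, self.labels)),
            key=lambda p: p[0],
            reverse=True,
        )
        top = ranked[: self.k]
        # Each of the k neighbours votes with weight equal to its similarity,
        # so neighbours with zero similarity have no influence.
        on_topic = sum(s for s, label in top if label)
        off_topic = sum(s for s, label in top if not label)
        return on_topic > off_topic


def focused_crawl(seed_urls, fetch, classifier, max_pages=100):
    """Breadth-first crawl from the initial URL queue, expanding only on-topic pages."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    relevant = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        terms, links = fetch(url)  # hypothetical: (Counter of terms, list of URLs)
        fetched += 1
        if not classifier.is_relevant(terms):
            continue  # prune: links of off-topic pages are not followed
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return relevant
```

Pruning the links of off-topic pages is what distinguishes a focused crawler from a general one: the crawl frontier grows only around pages the classifier accepts, at the cost of missing on-topic pages that are reachable only through off-topic ones.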

Keywords/Search Tags:

focused search, monitoring public opinion, webpage classification, vector space model, KNN classification algorithm

Related items

1	Design And Realization Of An Internet Public Opinion Monitoring System
2	Research On Search Strategy And Key Techniques Of Focused Crawler
3	Research On Pivotal Technology Of Focused Search Engine
4	Research Of Focused Search Engine About Petroleum Subject
5	Research On Focused Crawler Based On SVM Classification Algorithm
6	Research Of The Discovery Algorithm About The Cyberspace Public Opinion's Hotspot
7	Research On Topic Detection And Tracking In Internet Public Opinion
8	Intelligent Search Technology Of Network Information Based On Military Application
9	Design And Implementation Of Public Opinion Monitoring And Analysis System For Special Equipment Accident And Fault Event
10	Research On The Topic Crawler Algorithm Based On Vector Space Model