Research and Implementation of a Bayesian Classifier-Based Theme Crawler

Posted on: 2016-11-22, Degree: Master, Type: Thesis
Country: China, Candidate: L Han
GTID: 2308330503950652, Subject: Computer technology
Abstract/Summary:
As the search engine market develops, users place higher demands on both the functionality of search engines and the content they return, so search engines need to provide more specialized information services. Web document classification driven by a user-defined target theme helps to screen and manage Web resources effectively and improves the efficiency of information retrieval. It has become one of the active research topics for crawlers. Based on the Bayes classifier, this thesis studies the theme (focused) crawler. The main work is as follows.

Firstly, by analyzing the working principle of the theme crawler, its functional structure, and the organizational structure of HTML Web pages, this thesis designs the Web page link extraction scheme and the topic similarity calculation model.

Secondly, through research on the naïve Bayes principle and its classification algorithm, the system sets an appropriate smoothing factor and constructs classifiers for the themes of finance, sports, and cars.

Thirdly, this thesis studies the Web page processing techniques needed to build the focused crawler system: Chinese word segmentation, word frequency statistics, feature selection, and link extraction.

Fourthly, the system implements the Bayesian theme crawler by combining HTTP requests, text extraction, Chinese word segmentation, feature selection, and the classifier, and by integrating thematic analysis with multithreading.

This thesis presents a method that uses naïve Bayes to design a theme crawler and satisfy the requirements of the subject. Using the constructed finance, sports, and cars classifiers and a large number of Web pages collected from several Web portals, the system tests show good results and meet the project requirements.
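The classification step described in the abstract can be illustrated with a minimal sketch of a multinomial naïve Bayes classifier with an additive smoothing factor. This is not the thesis's implementation: the class name `NaiveBayesTopicClassifier`, the parameter `alpha`, the choice to skip words never seen in training, and the toy English training tokens are all assumptions made for illustration. The sketch also assumes that pages have already been fetched, stripped of markup, and segmented into word tokens.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopicClassifier:
    """Multinomial naive Bayes over pre-segmented tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        # alpha is the additive smoothing factor (alpha=1.0 is Laplace smoothing).
        self.alpha = alpha
        self.class_doc_counts = Counter()        # documents seen per theme
        self.word_counts = defaultdict(Counter)  # word frequencies per theme
        self.vocabulary = set()

    def fit(self, documents, labels):
        for tokens, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        """Return log P(theme) + sum of log P(word | theme) for each theme."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)
            for tok in tokens:
                # Assumption: words absent from the training vocabulary are skipped.
                if tok in self.vocabulary:
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy data; a real crawler would train on segmented portal pages.
train_docs = [
    ["stock", "market", "bank"],
    ["bank", "interest", "fund"],
    ["football", "match", "goal"],
    ["basketball", "match", "score"],
    ["engine", "sedan", "brake"],
    ["sedan", "tire", "engine"],
]
train_labels = ["finance", "finance", "sports", "sports", "cars", "cars"]

clf = NaiveBayesTopicClassifier(alpha=1.0).fit(train_docs, train_labels)
print(clf.predict(["match", "goal", "team"]))  # sports
```

Without smoothing, a single word that never occurred in a theme's training pages would drive that theme's probability to zero, which is why the smoothing factor matters. Working in log space avoids numeric underflow on long pages.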
Keywords/Search Tags: word segmentation, feature vector, focused crawler, Bayes classifier