The Research Of Web Technology Classification In Search Engine

Posted on: 2012-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2178330338992285
Subject: Computer application technology
Abstract/Summary:
With the development of Internet technology, people have entered the information era. In this era information is wealth, and obtaining accurate and valuable information quickly has become a key problem. Over time, a large number of information resources with different structures have appeared, most of them in the form of Web text that contains a great deal of valuable information, so extracting useful information from massive Web resources has become a problem that needs to be solved. Web text classification technology has been developed on the basis of existing text classification theory and technology. It replaces manual classification and saves considerable manpower and material resources; it can also effectively improve retrieval speed for users and classify retrieval results accurately. It has become a hot research topic in the field of information processing.

This thesis introduces the research background and the state of research in China and abroad, and expounds the related theory and technology of text classification. The approach to the problem is based on a summary of the relevant theoretical knowledge and an analysis of the structural features of Web pages. First, a robot (crawler) collects Web pages from the Internet; next, the text information is extracted from each page, preprocessed, and converted to text format; finally, a classifier is constructed and the Web text is classified by a classification algorithm. The thesis proposes a denoising method based on information blocks, combines text frequency with the CHI statistic to select feature items, classifies Web text with a multi-class decision SVM, and proposes a design for a classification search engine.

The proposed methods are verified by experiment; the results show that the information extraction and Web classification are more accurate.
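The abstract does not say how text frequency and CHI are combined, so the following is only a minimal sketch under stated assumptions: "text frequency" is read as document frequency, terms below a hypothetical `min_df` threshold are discarded first, and the remaining terms are ranked by their maximum chi-square score over all classes. The function names, parameters, and toy data are illustrative and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, min_df=2, top_k=3):
    """Filter terms by document frequency, then rank them by CHI.

    Assumption (not from the thesis): a term's score is its maximum
    chi-square value over all classes.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()                      # document frequency per term
    df_by_class = defaultdict(Counter)  # document frequency per class and term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_class[label][term] += 1

    scores = {}
    for term, term_df in df.items():
        if term_df < min_df:
            continue  # frequency filter: drop rare terms before scoring
        best = 0.0
        for label, size in class_sizes.items():
            n11 = df_by_class[label][term]
            n10 = term_df - n11
            n01 = size - n11
            n00 = n_docs - size - n10
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k], scores


# Toy corpus of already tokenized page texts (hypothetical data).
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "score"],
    ["stock", "market", "team"],
    ["stock", "market", "price"],
]
labels = ["sports", "sports", "finance", "finance"]
selected, scores = select_features(docs, labels)
```

In this toy corpus the term "team" appears in both classes and therefore scores lower than class-specific terms such as "goal" or "stock". The selected terms would then serve as the feature dimensions for the SVM classifier mentioned in the abstract.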
Keywords/Search Tags: Information Extraction, Characteristic Selection, Text Categorization, SVM