Improved K-Nearest Neighbor Arithmetics On The Web Page Classification

Posted on: 2011-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: F Bai
GTID: 2178360305472737
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of modern communications technology and the spread of the Internet, the World Wide Web has, since 2001, become the most significant and most widely distributed information service center. More than 3 billion pages are currently online, and millions of new pages are added every day. Given this explosive growth of online information, how to get effective access to useful, interesting information is an important problem for modern information research. Applying traditional data mining techniques to the heterogeneous, unstructured nature of Web information gives rise to a new and challenging field, namely Web mining, whose purpose is to find and analyse useful information on the WWW.

In this paper, the main work is to study classification algorithms in Web mining. The work includes the following:
1. The paper briefly introduces the background and current state of text classification research, then explains the ideas behind algorithms commonly used in text classification, including Naive Bayes, Support Vector Machines, Decision Trees, and K-Nearest Neighbor (KNN).
2. It briefly introduces the key technologies needed for Web text classification: web content extraction, word segmentation, stop-word processing, and techniques for obtaining features automatically.
3. It analyses the advantages and disadvantages of the KNN algorithm. To address the shortcomings that stem from KNN being a lazy learner, we propose two modified algorithms. Because the traditional similarity formula has defects when applied to web page classification, we also put forward an improved similarity measure.

In experiments, we implement both the improved and the traditional KNN algorithms and find that each improvement yields gains in the aspect it targets. The improved algorithm can be effectively applied to Web data mining, information retrieval, and other application areas.
Keywords/Search Tags: KNN algorithm, similarity, web page classification
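
For readers unfamiliar with the baseline, the traditional KNN text classifier that the abstract takes as its starting point can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's improved algorithm or its improved similarity measure, neither of which is described in the abstract. It assumes raw term-frequency vectors, cosine similarity, and similarity-weighted voting; the function names (`tf_vector`, `cosine_similarity`, `knn_classify`) and the toy training data are hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Build a sparse term-frequency vector (term -> count) from tokens."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def knn_classify(query_tokens, training_set, k=3):
    """Label a document by similarity-weighted vote of its k nearest neighbors.

    training_set is a list of (tokens, label) pairs. Being a lazy learner,
    KNN builds no model in advance: every query is compared against every
    training document, which is the cost the abstract refers to.
    """
    if not training_set:
        raise ValueError("training_set must not be empty")
    query = tf_vector(query_tokens)
    scored = [
        (cosine_similarity(query, tf_vector(tokens)), label)
        for tokens, label in training_set
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Toy example with made-up, already-tokenized pages (assumption: word
# segmentation and stop-word removal have been done beforehand).
training = [
    ("football match goal team".split(), "sports"),
    ("team wins league match".split(), "sports"),
    ("stock market shares price".split(), "finance"),
    ("bank interest rate market".split(), "finance"),
]
print(knn_classify("team scores goal in match".split(), training, k=3))
```

Because every query is scored against the full training set, classification time grows with the number of stored pages, which is presumably the "lazy" shortcoming the modified algorithms target.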