
The Research Of Chinese Web Page Classification Based On Formal Concept Analysis

Posted on: 2012-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: W F Chen
GTID: 2178330335953080
Subject: Computer application technology
Abstract/Summary:
With the growth of the Internet and the ever-increasing amount of information on the web, the Internet has become a huge database from which people obtain information. However, because information on the Internet is so complex, a search easily returns near-duplicate and irrelevant results. This irrelevant information seriously hampers the search for accurate information. How to obtain the necessary information from the Internet quickly and accurately has therefore become an inevitable research direction.

To make information on the Internet easier to reach, researchers proposed the search engine. Search engines do make knowledge easier to obtain, but the result sets they return are often very large and do not match the user's search intention. To solve this problem, researchers proposed classification methods: the large volume of results is sorted into appropriate fields, so that when a user searches within the relevant field, the search engine can return correct results quickly and accurately. Classification has therefore become an important part of search engine and data mining technology.

Before the advent of the World Wide Web, classification technology was applied mainly to document classification, and many related techniques for document classification, such as ATC, appeared. With the development of networks, web pages emerged, and the web, as an information carrier, is now closely tied to daily life. Web page classification, a major technology related to search engines, is widely used in information retrieval, subject search, keyword search, digital libraries and other fields. Several classification methods exist so far, but the efficiency and accuracy of many Chinese web page classification methods are not satisfactory.
To improve Chinese web page classification, we draw on the basics of formal concept analysis and propose a KNN classification method based on the concept lattice. The main idea is to cluster the web pages first and then classify them, which can improve accuracy. In this method, category concepts are defined as the concepts selected from the concept lattice for use in classification. The clustering based on the concept lattice can be regarded as a first classification. The second classification proceeds as follows: the selected category concepts are classified, and a vector space model is built in which each category concept corresponds to one column and each attribute of the category concepts corresponds to one row. The web pages to be classified are likewise represented as vectors, and the KNN classification algorithm is then used to classify the Chinese web pages. Combining the two techniques raises two problems: (1) the selection of feature items and (2) the extraction of category concepts.

With this concept-based KNN classification method, the paper reduces the dimension of the vector space, which increases classification efficiency, and also improves the precision and recall of web page classification.
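
The two-stage idea described above can be illustrated with a minimal Python sketch. It is only one possible reading of the abstract, not the thesis's actual algorithm: the toy context, the labels, the brute-force concept enumeration, the rule for picking category concepts (non-empty intent, all pages in the extent sharing one label), and the binary term weights with cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math

# Toy formal context (hypothetical data): page id -> set of feature terms.
CONTEXT = {
    "p1": {"football", "match"},
    "p2": {"football", "match", "goal"},
    "p3": {"stock", "market"},
    "p4": {"stock", "market", "fund"},
}
# Hypothetical class labels for the training pages.
LABELS = {"p1": "sports", "p2": "sports", "p3": "finance", "p4": "finance"}


def intent_of(objects):
    """Attributes shared by every page in `objects`."""
    sets = [CONTEXT[o] for o in objects]
    if not sets:  # the empty object set shares all attributes
        return set().union(*CONTEXT.values())
    return set.intersection(*sets)


def extent_of(attributes):
    """Pages that carry every attribute in `attributes`."""
    return {o for o, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every object subset.

    Exponential in the number of pages, so suitable for toy data only.
    """
    concepts = set()
    objects = sorted(CONTEXT)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = intent_of(subset)
            concepts.add((frozenset(extent_of(intent)), frozenset(intent)))
    return concepts


def category_concepts():
    """Assumed selection rule: non-empty intent and a single label in the extent."""
    selected = []
    for extent, intent in formal_concepts():
        labels = {LABELS[o] for o in extent}
        if intent and len(labels) == 1:
            selected.append((intent, labels.pop()))
    return selected


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(page_terms, k=3):
    """KNN vote over category concepts (columns) in attribute space (rows)."""
    cats = category_concepts()
    rows = sorted(set().union(*(intent for intent, _ in cats)))

    def to_vector(terms):
        return [1.0 if t in terms else 0.0 for t in rows]

    page_vec = to_vector(page_terms)
    ranked = sorted(cats, key=lambda c: cosine(page_vec, to_vector(c[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With this toy context, `classify({"football", "goal", "news"})` returns `"sports"`. The page is compared against a handful of category concepts rather than against every training page, which is the kind of reduction the abstract attributes to the method; a real system would replace the brute-force enumeration with a proper lattice construction algorithm and use weighted features.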
Keywords/Search Tags: Page Classification, Category Concept, Concept Lattice, Formal Concept Analysis