
Data Mining Research In Web Information Retrieval And Classification

Posted on: 2002-09-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Li
Full Text: PDF
GTID: 1118360185995614
Subject: Computer software and theory
Abstract/Summary:
With the phenomenal growth of the Web, it is increasingly becoming the single most important source of information. Retrieving and classifying useful information from the Web has been an important application since the Web began, and it has become a research focus in recent years. The goal of this dissertation is to use data mining (text mining and web mining) techniques to improve the performance of retrieval and classification systems. To this end, the thesis explores and presents research results in knowledge representation of web pages, similarity measures, data mining and utilization of large-scale data, and retrieval and classification algorithms.

The contributions of this dissertation are as follows:

(1) A text mining method for acquiring part-of-speech rules in Chinese text
A text mining method for acquiring part-of-speech rules in Chinese text is proposed. Given the two-level structure of words and parts of speech in text, the algorithm extracts a set of production rules. Starting from the original association rule set, we apply inductive learning to obtain rules that hold in more general situations. Experiments show that a system combining statistical and rule-based methods performs better at word tagging.

(2) User interest mining and discovery
A learning algorithm for identifying a user's interests is put forward. The method classifies keywords and calculates the interest degree of each keyword with only a small amount of user interaction, which yields an initial user interest profile. This profile is the basis for identification: through an identification algorithm, we can judge the interest degree of any title and provide better personalized services. To track changes in the user's interests, we adopt agent techniques to sense user behavior, such as stay time, number of visits, saving, editing, and modifying. All of these actions, together with the user's queries, are treated as factors for updating the user interest profile.

(3) Web Search Based on Page Segmentation...
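
The interest profile described in contribution (2) can be illustrated with a minimal sketch. The abstract does not give the actual algorithm, so everything here is an assumption made for illustration: the class name `InterestProfile`, the `ACTION_WEIGHTS` values, the exponential decay, and the averaging used to score a title are hypothetical and are not the dissertation's method.

```python
from collections import defaultdict

# Hypothetical weights for observed behaviors. The abstract names the
# behaviors but gives no values, so these numbers are assumptions.
ACTION_WEIGHTS = {"visit": 0.1, "save": 0.5, "edit": 0.3, "modify": 0.3, "query": 0.4}


class InterestProfile:
    """Keyword -> interest degree map (illustrative stand-in for the user interest file)."""

    def __init__(self, decay=0.9):
        self.decay = decay                 # assumed forgetting factor for old interests
        self.weights = defaultdict(float)  # keyword -> interest degree

    def update(self, keywords, action, stay_seconds=0.0):
        """Decay existing interests, then reinforce the keywords touched by this action."""
        for k in self.weights:
            self.weights[k] *= self.decay
        # Stay time contributes at most 0.2, saturating after one minute (assumed).
        gain = ACTION_WEIGHTS.get(action, 0.0) + min(stay_seconds / 60.0, 1.0) * 0.2
        for k in keywords:
            self.weights[k.lower()] += gain

    def score(self, title):
        """Average interest degree of the words in a title (0.0 for an empty title)."""
        words = title.lower().split()
        if not words:
            return 0.0
        return sum(self.weights.get(w, 0.0) for w in words) / len(words)
```

In this sketch, each observed action calls `update` with the keywords of the page or query involved, and `score` ranks candidate titles against the current profile. A real system would need proper word segmentation for Chinese titles rather than whitespace splitting.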
Keywords/Search Tags: data mining, text mining, Web mining, information retrieval, text classification, Web page classification, clustering, part-of-speech tagging, user interest mining, Web page segmentation, concept semantic space, support vector machines