Data Mining Research In Web Information Retrieval And Classification

Posted on:2002-09-11

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X L Li

GTID:1118360185995614

Subject:Computer software and theory

Abstract/Summary:

With the phenomenal growth of the Web, it is increasingly becoming the single most important source of information. Retrieving and classifying useful information from the Web has been an important application since the Web began, and it has become a research focus in recent years. The goal of this dissertation is to use data mining (text mining and web mining) techniques to improve the performance of retrieval and classification systems. In addition, the thesis explores and presents research results in the knowledge representation of web pages, similarity measures, the mining and utilization of large-scale data, and retrieval and classification algorithms. The contributions of this dissertation are as follows:

(1) A text mining method for acquiring part-of-speech rules in Chinese text. Given the two-level structure of words and parts of speech in text, the algorithm extracts a set of production rules. Inductive learning is then applied to the original association rule set to obtain rules that hold in more general situations. Experiments show that a system combining the statistical method with the rule-based method performs better at tagging words.

(2) User interest mining and discovery. A learning algorithm for identifying a user's interests is put forward. The method classifies keywords and calculates the interest degree of each keyword with only a little user interaction, which yields an initial file describing the user's interests. This file is the basis for identification: through an identification algorithm, the interest degree of any title can be judged and better personalized services provided. To track changes in the user's interests, agent techniques are adopted to sense the user's behavior, such as stay time, number of visits, saving, editing, and modifying. All of these actions, together with the user's queries, are treated as factors for updating the user's interest file.

(3) Web Search Based on Page Segmentation...
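The interest-tracking idea in contribution (2) can be illustrated with a small sketch. It is not the dissertation's algorithm: the abstract names the sensed actions (stay time, visits, save, edit, modify, queries) but gives no formulas, so the action weights, the decay factor, the additive update, and the names InterestProfile, update, and score_title are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable

# Hypothetical weights per sensed action; the abstract lists the actions
# but does not give numeric values.
ACTION_WEIGHTS = {"visit": 1.0, "save": 3.0, "edit": 2.0, "modify": 2.0, "query": 2.5}
STAY_WEIGHT_PER_MINUTE = 0.5  # assumed contribution of stay time
DECAY = 0.9                   # assumed forgetting factor for older interests


@dataclass
class InterestProfile:
    """Keyword -> interest degree; stands in for the user's interest file."""
    degrees: Dict[str, float] = field(default_factory=dict)

    def update(self, keywords: Iterable[str], action: str, stay_minutes: float = 0.0) -> None:
        """Decay every stored degree, then reinforce the observed keywords."""
        for kw in self.degrees:
            self.degrees[kw] *= DECAY
        gain = ACTION_WEIGHTS[action] + STAY_WEIGHT_PER_MINUTE * stay_minutes
        for kw in keywords:
            self.degrees[kw] = self.degrees.get(kw, 0.0) + gain

    def score_title(self, title: str) -> float:
        """Average interest degree over the whitespace-separated words of a title."""
        words = title.lower().split()
        if not words:
            return 0.0
        return sum(self.degrees.get(w, 0.0) for w in words) / len(words)


profile = InterestProfile()
profile.update(["svm", "classification"], "save", stay_minutes=2.0)
profile.update(["svm"], "visit")
print(profile.score_title("SVM text classification"))
print(profile.score_title("cooking recipes"))
```

In this sketch a title the user has engaged with scores higher than an unrelated one, and the decay step lets old interests fade as behavior changes. A real system would also need word segmentation for Chinese titles, which the whitespace split here does not provide.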

Keywords/Search Tags:

data mining, text mining, Web mining, information retrieval, text classification, Web page classification, clustering, part-of-speech tagging, user interest mining, Web page segmentation, concept semantic space, support vector machines

Related items

1	Study On Key Techniques Of Web Mining For Intelligent Information Retrieval
2	Mining Users' Interests Based On Search Logs
3	Research On The Key Techniques Of Web Information Intelligent Acquisition
4	Research And Realization Of Page Clustering System Based On Web
5	Text Data Mining For Applied Research In Information Monitoring
6	Web-based Text Mining Svm Page Text Classification Research
7	WEB Mining System
8	Research Of Data Mining Based On Web Log
9	Text Mining And Its Application In Text Retrieval
10	Research Of Automatic Web Page Categorization And Cluster Based On Web Mining Technology