Research And Realization Of Web Information Retrieval Based On K-Means Clustering

Posted on:2013-01-13

Degree:Master

Type:Thesis

Country:China

Candidate:H M Huang

GTID:2248330395451096

Subject:Computer technology

Abstract/Summary:

Applying traditional Information Retrieval (IR) technology to Web search raises new challenges and difficulties. Web search has absorbed some of the strengths of traditional IR, applied methods of its own, and opened new research areas and approaches for IR.

The approach of this thesis combines web page parsing with cluster analysis. The content of each target web page is pre-processed: the tag node information in the page is acquired, stop words are removed from the node information, word frequencies are counted for each node, and the word frequency statistics are applied to the web page's vector. The K-means algorithm from data mining is then used to cluster the results retrieved from the web, and the clustered results are returned to the user. After the cluster analysis, some redundant information has been filtered out of the original retrieval results, and users can more conveniently find the information that interests them in the clustered results.

The ideas in this thesis are implemented using the Eclipse IDE and the Tomcat web server, with Struts as the development framework. The completed system includes key modules such as a web information extraction module, an information feature extraction and transformation module, a module for cluster analysis of the feature information, and a cluster results presentation module. The experimental results show that the approach is feasible in application.
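The pipeline described in the abstract (stop-word removal, word frequency counting, vector construction, K-means clustering) can be illustrated with a minimal Java sketch. This is not the thesis implementation: the class name `KMeansSketch`, its method names, the toy stop-word list, the choice of the first k vectors as initial centroids, and the use of squared Euclidean distance are all assumptions made for illustration. HTML tag parsing is omitted, so the input is assumed to be text already extracted from the tag nodes.

```java
import java.util.*;

// Illustrative sketch only; all names here are hypothetical, not from the thesis.
final class KMeansSketch {
    // Assumed toy stop-word list; the thesis does not give its list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "and", "to", "in", "is", "with"));

    // Count word frequencies in already-extracted node text, skipping stop words.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Project the counts onto a fixed vocabulary to obtain the page's vector.
    static double[] toVector(Map<String, Integer> tf, List<String> vocabulary) {
        double[] v = new double[vocabulary.size()];
        for (int i = 0; i < v.length; i++) v[i] = tf.getOrDefault(vocabulary.get(i), 0);
        return v;
    }

    // Plain K-means; returns the cluster index assigned to each input vector.
    // Assumption: the first k vectors serve as the initial centroids.
    static int[] cluster(double[][] vectors, int k, int maxIterations) {
        if (k < 1 || k > vectors.length) throw new IllegalArgumentException("bad k");
        int dim = vectors[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = vectors[c].clone();
        int[] assignment = new int[vectors.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each vector to its nearest centroid.
            for (int i = 0; i < vectors.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++) {
                        double diff = vectors[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // converged
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < vectors.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += vectors[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep the old centroid for an empty cluster
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Made-up page texts standing in for retrieved results.
        String[] pages = {
            "K-means clustering of web pages",
            "Tomcat server configuration",
            "Clustering web search results with K-means",
            "Configuration of the Tomcat web server"
        };
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Set<String> vocab = new TreeSet<>();
        for (String page : pages) {
            Map<String, Integer> tf = termFrequencies(page);
            tfs.add(tf);
            vocab.addAll(tf.keySet());
        }
        List<String> vocabulary = new ArrayList<>(vocab);
        double[][] vectors = new double[pages.length][];
        for (int i = 0; i < pages.length; i++) vectors[i] = toVector(tfs.get(i), vocabulary);
        System.out.println(Arrays.toString(cluster(vectors, 2, 20)));
    }
}
```

Taking the first k vectors as initial centroids keeps the sketch deterministic, but K-means is sensitive to initialization, so a real system would more likely use random restarts or a seeding scheme. Raw counts are also used here for simplicity, where weighting or normalizing the vectors might give better clusters on pages of very different lengths.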

Keywords/Search Tags:

K-means, Cluster Analysis, Web Information Extraction

Related items

1	The Research On Fuzzy C-Means Cluster Analysis And Its Applications
2	Class Equality Cluster Validity Index And Cluster Filter K-Means Algorithm
3	Research And Application Of Improved K-means Algorithm In Multivariate Analysis System
4	Research And Application Of K-means Clustering Algorithm
5	Differentially Private K-means Clustering
6	Research Of Improved K-means Algorithm And New Cluster Validity Index In Cluster Analysis
7	Cluster Analysis And Its Application In Student Information Management System
8	The Research And Application Of Cluster Analysis On The Public Security Specialist Examination Analyze System
9	The Application Of Data Mining In Comprehensive Assessment Of National Area
10	Research On The Evolution Path Of Technology Based On Unstructured Patent Data