Design And Implementation Of Suffix Tree Based Uyghur Web Page Clustering Algorithm

Posted on: 2012-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: M T Y M H S M Mai
GTID: 2178330335486110
Subject: Computer technology and applications
Abstract/Summary:
Clustering web search results gives users a way to navigate the results and helps them quickly locate the pages they want to view. To improve the efficiency of searching Uyghur web pages, this thesis adopts suffix tree clustering (STC) as its framework, examines the advantages and disadvantages of STC, and proposes a new method for clustering web search results: a query-word-centered interactive suffix tree clustering algorithm (QISTC).

The algorithm first traverses and analyzes the children of the key node, that is, the node labeled with the query word, which lie in the second layer of the suffix tree, and selects and merges base classes from them. It then traverses and analyzes the second-layer children of the other nodes, again selecting and merging base classes. Once the second layer has been processed, the results are returned to the user. When the user selects a particular class, its content is displayed and, at the same time, the class is clustered further to create its subclasses. Through this interaction the algorithm produces more specific classes with longer labels.

Because stemming is applied during preprocessing, the words in the class labels extracted during clustering lose their semantic connection to one another. The thesis therefore proposes a method to restore the semantic connection between the words in class labels.

QISTC has the following advantages. It extracts more classes that are related to the query words. The shortest class label has a length of 2. During interaction it extracts more specific subclasses. It also exploits a key property of the suffix tree: web pages that share an internal node also share the common path from that internal node to the root. Base classes are therefore selected from nodes in a parent-child relationship rather than from single nodes as in STC, so the class labels carry more semantic information than those produced by STC.
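The abstract describes base-class selection and merging only in prose, so the following is a minimal sketch of one possible reading, not the thesis's implementation. It rests on several assumptions that the abstract does not state. Snippets are assumed to be already tokenized and stemmed into word lists. An uncompressed word-level suffix trie stands in for a proper suffix tree. A base class is taken to be a second-layer node (a first-layer word plus one child word) that covers at least `min_docs` documents. Base classes are merged with the overlap rule of the original STC algorithm, under which both overlap ratios must exceed 0.5. All function names (`build_suffix_trie`, `select_base_classes`, `merge_base_classes`, `refine`) and the `min_docs` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    """Node of a word-level suffix trie (simplified stand-in for a suffix tree)."""

    def __init__(self):
        self.children = {}  # next word -> Node
        self.docs = set()   # ids of documents whose suffixes pass through here


def build_suffix_trie(docs):
    """Insert every suffix of every tokenized, stemmed document into one trie."""
    root = Node()
    for doc_id, tokens in enumerate(docs):
        for start in range(len(tokens)):
            node = root
            for word in tokens[start:]:
                node = node.children.setdefault(word, Node())
                node.docs.add(doc_id)
    return root


def select_base_classes(root, query, min_docs=2):
    """Collect base classes from second-layer nodes, query-word node first.

    Each label is a first-layer word plus one child word, so every label
    has at least two words. The min_docs threshold is an assumption.
    """
    order = [query] + [w for w in root.children if w != query]
    classes = []
    for first in order:
        parent = root.children.get(first)
        if parent is None:
            continue
        for second, child in parent.children.items():
            if len(child.docs) >= min_docs:
                classes.append(((first, second), frozenset(child.docs)))
    return classes


def merge_base_classes(classes, threshold=0.5):
    """Merge overlapping base classes (overlap rule borrowed from STC)."""
    parent = list(range(len(classes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(classes)), 2):
        a, b = classes[i][1], classes[j][1]
        overlap = len(a & b)
        if overlap / len(a) > threshold and overlap / len(b) > threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(classes)):
        groups[find(i)].append(i)
    clusters = []
    for members in groups.values():
        labels = [" ".join(classes[i][0]) for i in members]
        docs = frozenset().union(*(classes[i][1] for i in members))
        clusters.append((labels, docs))
    # Larger clusters first (stable sort keeps discovery order on ties).
    return sorted(clusters, key=lambda c: -len(c[1]))


def refine(root, label, min_docs=2):
    """Interactive step: descend one layer below a selected class's label."""
    node = root
    for word in label:
        node = node.children.get(word)
        if node is None:
            return []
    return [
        (tuple(label) + (word,), frozenset(child.docs))
        for word, child in node.children.items()
        if len(child.docs) >= min_docs
    ]
```

As a toy usage example, English words stand in for stemmed Uyghur tokens. Take the five snippets `apple pie recipe`, `apple pie baking`, `apple iphone price`, `apple iphone review` and `apple pie recipe easy` with the query `apple`. The sketch yields two clusters, one labeled `apple pie` / `pie recipe` (documents 0, 1, 4) and one labeled `apple iphone` (documents 2, 3). Calling `refine` on `("apple", "pie")` descends one more layer to the three-word label `apple pie recipe`, which mirrors the interactive step in which labels grow longer. Because each label comes from a parent-child pair of nodes, no label is shorter than two words.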
Keywords/Search Tags: Text Clustering, Suffix Tree Clustering (STC), Select Base Class, Merge Base Class, Cluster Label