
Research On Semantics-Based Search Results Clustering Methods

Posted on: 2015-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Dang
Full Text: PDF
GTID: 2298330467462085
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of the Internet, more and more people obtain information online. Search engines handle information access and retrieval, acting as a transit point between users and the Internet, and bring great convenience. However, as the amount of information on the Internet grows, the results returned by search engines become increasingly complex and contain a lot of irrelevant, repetitive, and mixed content. As a result, users often spend a long time browsing through all the search results to find a satisfactory one. Some researchers have therefore applied clustering in the information retrieval area to organize search results and present them to users in categories. This approach is known as search results clustering. It uses clustering, an unsupervised machine learning method, to give users categorical navigation, following the principle "maximize the intra-class similarity and minimize inter-class similarity". Unlike general text clustering, search results clustering operates on short snippets. Current search results clustering methods treat a snippet as a set of separate words, ignoring the semantic associations between words and other semantic information. This leads to a serious loss of semantics.

This thesis focuses on the semantic loss in search results clustering. The main research contents include search results preprocessing and modeling methods, the classic search results clustering methods, and semantics-based search results clustering methods. On this basis, the thesis proposes two improved search results clustering methods: an OPTICS-based search results clustering algorithm and a WordNet-based suffix tree clustering algorithm. Both algorithms focus on mining and using the semantic information in search results, and make corresponding improvements in order to raise the accuracy of search results clustering. Finally, clustering experiments are carried out on a data set to analyze the performance of the two improved methods. The experimental results show that the two improved methods achieve higher accuracy and shorter running time, which helps improve the effectiveness and timeliness of search results clustering.
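The semantic loss described in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the snippets, the `cosine` helper, and the hand-written `SYNONYMS` table (a hypothetical stand-in for a lexical resource such as WordNet) are all assumptions made for illustration. It only shows that a bag-of-words model scores two snippets with the same meaning as mostly dissimilar, and that mapping synonyms to a shared form raises the score.

```python
import math
from collections import Counter
from typing import Dict, Optional


def cosine(a: str, b: str, synonyms: Optional[Dict[str, str]] = None) -> float:
    """Cosine similarity of two snippets under a bag-of-words model.

    If a synonym table is given, each token is first replaced by its
    canonical form, so that synonyms count as the same word.
    """
    synonyms = synonyms or {}
    va = Counter(synonyms.get(t, t) for t in a.lower().split())
    vb = Counter(synonyms.get(t, t) for t in b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two hypothetical snippets that mean the same thing but share one word.
s1 = "cheap car rental"
s2 = "cheap automobile hire"

# Hypothetical synonym table; a real system might derive this from WordNet.
SYNONYMS = {"automobile": "car", "hire": "rental"}

plain = cosine(s1, s2)               # words treated as unrelated symbols
semantic = cosine(s1, s2, SYNONYMS)  # synonyms mapped to a shared form

print(f"bag-of-words similarity: {plain:.3f}")
print(f"with synonym mapping:    {semantic:.3f}")
```

A clustering method that relies only on the first score would tend to place these two snippets in different clusters, which is the kind of error the semantics-based methods in the thesis aim to reduce.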
Keywords/Search Tags: search results, clustering, semantic clustering, OPTICS, suffix tree