
Chinese Search Results Clustering Research Based On Improved STC

Posted on: 2014-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Rong
Full Text: PDF
GTID: 2248330398957033
Subject: Computer software and theory
Abstract/Summary:
With the development of science and technology, people depend more and more on the network, which makes communicating and sharing with others convenient. The rapid growth of information on the network, however, forces users to sift carefully through search engine result lists. If the query is ambiguous, the user is likely to read many pages before finding a satisfactory answer. For example, a user who searches for "Jaguar" may be looking for a weapon, a car, or an animal, but the result list presents all of these kinds of information mixed together, so a user who needs detailed information on one of them has to page through many results. To address this, this thesis designs a search result clustering system built on top of a traditional search engine. The system works in three steps. First, an HTML analyzer extracts the title and abstract of each result item returned by the search engine; the text is segmented with a word segmentation tool and POS-tagged, the position and frequency of every word are recorded, and stop words are removed, so that the remaining words of each result item form its keyword set. Second, a suffix tree is constructed from the keyword sets with words as the tree nodes, and a score is calculated for each node word under several constraints: position, frequency, part of speech, and word length. Finally, the base clusters are merged and the high-scoring nodes are taken as cluster labels. The experimental results show that the method produces clusters of high purity and that the extracted labels are accurate, strongly discriminative, and user-friendly.
Keywords/Search Tags:Search results clustering, Suffix tree, Cluster label, Chinese search, Clustering
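
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions made for illustration: the input is taken to be already segmented and POS-tagged, the stop-word list and POS weights are invented, the score formula is a hypothetical way of combining position, frequency, part of speech and word length, a dictionary of word-level phrases stands in for an explicit suffix tree, and the merge rule is the overlap criterion of classical STC (two base clusters merge when they share more than half of each other's documents).

```python
from collections import defaultdict

# Hypothetical values for illustration only; the abstract does not give them.
STOP_WORDS = {"的", "了", "是"}
POS_WEIGHT = {"n": 1.0, "v": 0.5}   # assumption: nouns make better labels
DEFAULT_POS_WEIGHT = 0.2


def keywords(tagged_words):
    """Step 1: drop stop words, keep (word, pos, original position)."""
    return [(w, pos, i) for i, (w, pos) in enumerate(tagged_words)
            if w not in STOP_WORDS]


def base_clusters(docs, max_len=3):
    """Step 2: map each phrase (a prefix of some word-level suffix) to the
    documents containing it, and score it. A flat dict replaces the tree."""
    members = defaultdict(set)
    best = defaultdict(float)
    for doc_id, kws in enumerate(docs):
        for start in range(len(kws)):
            for end in range(start + 1, min(start + max_len, len(kws)) + 1):
                span = kws[start:end]
                phrase = tuple(w for w, _, _ in span)
                members[phrase].add(doc_id)
                pos_w = sum(POS_WEIGHT.get(p, DEFAULT_POS_WEIGHT)
                            for _, p, _ in span) / len(span)
                length = sum(len(w) for w, _, _ in span)
                position_w = 1.0 / (1 + span[0][2])   # earlier words score higher
                best[phrase] = max(best[phrase], pos_w * length * position_w)
    # Keep phrases shared by at least two documents; frequency scales the score.
    return {p: (ids, len(ids) * best[p])
            for p, ids in members.items() if len(ids) >= 2}


def merge(clusters, threshold=0.5):
    """Step 3: merge overlapping base clusters, label each by its best phrase."""
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(phrases):
        for b in phrases[i + 1:]:
            ids_a, ids_b = clusters[a][0], clusters[b][0]
            common = len(ids_a & ids_b)
            if common > threshold * len(ids_a) and common > threshold * len(ids_b):
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in phrases:
        groups[find(p)].append(p)
    result = []
    for group in groups.values():
        label = max(group, key=lambda p: clusters[p][1])
        doc_ids = set().union(*(clusters[p][0] for p in group))
        result.append(("".join(label), sorted(doc_ids)))
    return result


def cluster_results(tagged_docs):
    return merge(base_clusters([keywords(d) for d in tagged_docs]))


if __name__ == "__main__":
    sample = [
        [("捷豹", "n"), ("汽车", "n"), ("价格", "n")],
        [("捷豹", "n"), ("汽车", "n"), ("的", "u"), ("图片", "n")],
        [("美洲豹", "n"), ("是", "v"), ("动物", "n")],
        [("美洲豹", "n"), ("动物", "n"), ("栖息地", "n")],
    ]
    for label, doc_ids in cluster_results(sample):
        print(label, doc_ids)
```

On the four toy results in the sketch, the car-related items and the animal-related items end up in separate clusters, each labelled by its highest-scoring shared phrase. A real system would replace the hard-coded sample with output from an HTML analyzer and a Chinese segmentation tool, as the abstract describes.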