Research On Clustering Systems Of Search Engine Results

Posted on: 2012-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: T Xi
Full Text: PDF
GTID: 2218330338467970
Subject: Computer application technology
Abstract/Summary:
With the growth of information on the Internet, search engines have become the most widely used network tool. However, the result list returned by a search engine is long and linear, which makes it inconvenient to use. Users usually browse only the first few documents and so often fail to find the information they need. Researchers have therefore introduced clustering into search engines to remedy this defect. Because clustering is an unsupervised and very flexible technique, it has become an important tool for improving search engines.

This thesis builds a clustering search engine system. Its main algorithm is an advanced algorithm named ALingo, which aims to shorten the time needed to find useful information. The main research work is as follows:

In text clustering, accurate text preprocessing is a precondition for good clustering results. This thesis implements preprocessing for Chinese and English texts, including symbol removal, Chinese word segmentation, English stemming, and stop-word removal. After features are extracted from the documents, the vector space model turns the text into a matrix, which provides a sound foundation for document clustering.

Comparative experiments between Lingo and STC are carried out in this thesis. The results show that Lingo is better than STC. One defect of Lingo, however, is that its class labels cannot span two sentences. The Apriori algorithm is therefore introduced to discover frequent itemsets in the documents, which serve as suggested search keywords. Because a suggested search keyword can span two sentences, it can reflect the fact that one text may cover several topics. This helps users understand the topics of the documents better and reduces search time.

The clustering search engine system is built on Lucene search engine results. Experimental results show that the proposed algorithm reduces search time.
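
The abstract does not give implementation details, so the following is only a minimal Python sketch of two steps it names: turning preprocessed text into a term-document matrix, and mining frequent itemsets with Apriori as candidate suggested keywords. Everything here is an assumption made for illustration: the function names, the tiny stop-word list, the choice to treat each document's token set as one transaction, and the support threshold. It omits Chinese word segmentation and English stemming, and it is not the ALingo algorithm itself.

```python
from itertools import combinations

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to", "for"}

def preprocess(text):
    """Lowercase, replace symbols with spaces, and drop stop words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def term_document_matrix(docs_tokens):
    """Return (vocab, matrix); matrix[i][j] counts term i in document j."""
    vocab = sorted({tok for toks in docs_tokens for tok in toks})
    matrix = [[toks.count(term) for toks in docs_tokens] for term in vocab]
    return vocab, matrix

def apriori(transactions, min_support):
    """Return {frozenset itemset: support count} for itemsets that occur
    in at least min_support transactions."""
    transactions = [set(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

if __name__ == "__main__":
    # Made-up snippets standing in for search results.
    docs = [
        "Lucene search engine clustering",
        "Clustering of search results",
        "Search engine results",
    ]
    tokens = [preprocess(d) for d in docs]
    vocab, matrix = term_document_matrix(tokens)
    print(vocab)
    for itemset, support in apriori(tokens, min_support=2).items():
        print(sorted(itemset), support)
```

Because an itemset only requires its terms to co-occur somewhere in the same document, the terms need not come from one sentence, which matches the motivation the abstract gives for adding Apriori alongside Lingo's labels.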
Keywords/Search Tags: Search engine, Clustering, Lingo, ALingo, Lucene