
Research And Design Of Personalized Search Engine System Based On Automatic Abstracting And User Feedback

Posted on: 2013-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: M Shen
Full Text: PDF
GTID: 2268330392470609
Subject: Computer Science and Technology
Abstract/Summary:
In an era of information explosion, the search engine has become an effective means of discovering information and deriving knowledge from large amounts of data. However, a traditional search engine has the drawback that the same query returns the same results for different users, while users increasingly want the system to return more accurate results. Therefore, automatic abstracting and user feedback technology are introduced into the traditional search engine to improve its precision.

This paper proposes a relatively complete personalized search engine system, based on the model of the traditional search engine MG, to solve the above problem. Following system analysis and requirements analysis, the system is divided into nine modules. First, the system extracts a model of user interest through clustering analysis of users. Second, the system calculates the similarity between the query vector and the document vector. Third, the system adjusts personalization parameters based on user feedback to make the results more accurate. The main improvement is to the algorithm for reducing document features. The document is first reduced to a digest by automatic abstracting technology; the features of the digest are then extracted and sorted according to their contribution to the document categories. In effect, this trades completeness for rapid convergence of the feature set. The system also reduces the storage space of the inverted file dictionary and increases the read speed of the inverted file index by combining a minimal perfect hash function with large-memory storage technology. In addition, it reduces the space needed to sort massive document sets by building a min-heap.

Compared with the MG system, the time efficiency of feature reduction is increased, the storage space of the inverted file index dictionary is reduced by nearly half, and the space complexity of the document sorting algorithm is improved. Most importantly, query accuracy is increased to a certain extent by the improved similarity calculation based on personal interests.
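
The similarity and sorting steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes sparse term-weight dictionaries as vectors, uses plain cosine similarity in place of the improved personalized similarity (whose formula the abstract does not give), and keeps the best k documents in a min-heap so that only k entries are held in memory at once. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Assumed representation: a sparse vector maps each term to its weight.
Vector = Dict[str, float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either has zero norm."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector when computing the dot product
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Vector, docs: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # min-heap: lowest retained score sits at heap[0]
    for doc_id, vec in docs.items():
        score = cosine_similarity(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new document beats the weakest retained one, so swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
```

With n documents, the heap never grows beyond k entries, so the extra space is O(k) rather than the O(n) needed to sort every score, at a time cost of O(n log k).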
Keywords/Search Tags: Personalized Search Engine, Automatic Abstracting Technology, User Feedback, Document Vector, Inverted File Index