Design and Implementation of a Local Search Engine Based on the Vector Space Model

Posted on: 2017-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X G Mao
Full Text: PDF
GTID: 2308330482990124
Subject: Software engineering
Abstract/Summary:
Over the past century, human knowledge has grown at an unprecedented speed. As a result, the volume of stored information is expanding rapidly and file formats are becoming more varied. An ordinary PC often holds hundreds of gigabytes or even several terabytes of data, and users commonly spend a lot of time searching for the files they are interested in. Fast and accurate information retrieval over large, heterogeneous data has become an urgent need.

At present, files and folders in a file system are nested and interrelated. Apart from the file system explorer, which requires users to search manually, most general-purpose operating systems provide only a query tool that matches file names. Because this string-matching approach ignores the text inside files, which contains a great deal of useful information, many relevant and valuable documents are hard to find. It works especially poorly for users who do not have good file management habits.

The system described here is based on a Multilayer Vector Space Model with an improved weight calculation method. It combines this model with a feedback technique based on user click behavior and a semantic extension technique based on a synonym dictionary. These techniques form the core of the system. On top of this core, the system provides both online and offline working modes through a custom protocol built on the WebSocket protocol. After extensive testing, the system has been put into use and performs well.

This paper mainly introduces the design and implementation of the information retrieval system. In particular, we improve the multilayer weight calculation method of the Vector Space Model (VSM), and we use a vote-ranking algorithm based on Bayesian estimation together with a relevance feedback technique based on user click behavior. In addition, we study and implement query semantic extension and the use of the VSM in heterogeneous environments and on dynamic data. Finally, this paper analyzes and illustrates the effectiveness, performance, and directions for improvement of the Multilayer Vector Space Model.
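
The abstract does not give the multilayer weight formula or the contents of the synonym dictionary, so the following is only a minimal sketch of the general technique: a single-layer Vector Space Model with a smoothed TF-IDF weight, cosine similarity, and dictionary-based query extension. The names `SYNONYMS`, `expand_query`, `build_index`, and `search`, as well as the particular TF-IDF variant, are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical synonym dictionary; the thesis's actual dictionary is not given.
SYNONYMS = {"car": ["automobile"], "picture": ["image", "photo"]}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def expand_query(terms, synonyms=SYNONYMS):
    """Append dictionary synonyms to the query terms, without duplicates."""
    expanded = list(terms)
    for term in terms:
        for syn in synonyms.get(term, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded


def build_index(docs):
    """Return per-document TF-IDF vectors and the IDF table.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, as an assumed
    stand-in for the thesis's improved multilayer weighting.
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = {t: count * idf[t] for t, count in tf.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf):
    """Rank documents by cosine similarity to the synonym-expanded query."""
    terms = expand_query(tokenize(query))
    tf = Counter(t for t in terms if t in idf)  # drop terms unseen in the corpus
    q_vec = {t: count * idf[t] for t, count in tf.items()}
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0.0]


docs = {
    "a.txt": "red automobile for sale",
    "b.txt": "holiday photo album",
    "c.txt": "meeting notes",
}
vectors, idf = build_index(docs)
results = search("car", vectors, idf)
```

In this sketch the query "car" matches `a.txt` only through its synonym "automobile", which is the situation a file-name or literal string match would miss. Click feedback and the Bayesian vote ranking mentioned above would adjust these scores further and are not modeled here.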
Keywords/Search Tags: Information retrieval, Real-time information retrieval, Heterogeneous information retrieval, Vector Space Model, Semantic extension