
Study Of Some Issues About Information Retrieval System On The Internet

Posted on: 2004-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Tian
Full Text: PDF
GTID: 2168360092993706
Subject: Management Science and Engineering
Abstract/Summary:
With the amount of information on the Internet growing exponentially, Internet information retrieval systems, that is, search engines, have become the main tools that help people obtain information from the WWW. However, search engines have many problems, and two of them are urgent. One is efficiency: a search engine must handle hundreds of millions of web pages and communicate with many thousands of clients. The other is precision: it is important to give users what they want out of so much information. This paper therefore focuses on the efficiency of search engines and on how to provide users with what they want faster and more accurately.

This paper analyzes information retrieval systems from the perspectives of system structure, document feature representation, and user profiles. It includes the following work:

1. Analyzing the information retrieval systems on the Internet
Chapter 1 and Chapter 2 analyze the development and trends of information retrieval systems on the Internet and introduce the key technologies and relevant background. On this basis, some deficiencies of search engines are analyzed, such as the inefficiency derived from the Client/Server structure and the low accuracy caused by document feature representation in the VSM.

2. Proposing a distributed system structure based on CORBA
To address the inefficiency caused by the traditional Client/Server structure of search engines, this paper proposes introducing distributed object technology to suit the current state of the Internet. CORBA is a mature distributed technology, so this paper designs a CORBA-based system structure for search engines and analyzes its functions and characteristics. With CORBA introduced, the Client/Server structure of the search engine becomes a three-layer distributed structure.
This system structure emphasizes distributed computation on the Application Server, and therefore offers good scalability, openness, and high computation speed; it also helps reduce server load and network delay. In addition, it can be merged with the traditional distributed system structure of search engines to form an integrated system structure that supports not only distributed download and query at the Data Server end but also distributed computation at the Application Server end.

3. Proposing a method to calculate the weight of document feature items based on a BP neural net
The key technologies of an information retrieval system are the following: representation of documents and user queries; query matching strategy; and relevance calculation for matching results. Among them, document representation is the foundation of information retrieval technologies. It includes the extraction of document feature items and the calculation of their weights. To address the shortcomings of current methods for calculating feature item weights, and drawing on the VSM, this paper proposes a method for calculating the weight of a document feature item based on a BP neural net. BP neural nets are widely applied because of their simple structure and stable operation, and are often used in pattern recognition and function approximation. In Chapter 4, a suitable BP neural net is designed and trained. Given the frequency of a document feature item as input, the trained net outputs its weight, so the net can represent the document features. Tests show that this method is practical, simple to apply, and highly precise.

4. Proposing a new representation of user profiles and a corresponding filtering algorithm based on the Huffman tree
How to provide users with what they really want faster and more accurately is the development focus of information retrieval systems. The solution lies in the acquisition and representation of user profiles. In Chapter 5, common representations of user profiles are introduced first, and then the representation based on the Huffman tree is discussed. On the basis of this representation, docum...
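
The abstract does not spell out how the Huffman tree encodes a user profile, so the following is only a rough illustration of the underlying data structure, not the thesis's method. It builds a standard Huffman tree over a hypothetical set of interest terms, using made-up weights, and reports each term's depth. The names `build_huffman_tree`, `term_depths`, and `profile`, along with the terms and weights themselves, are assumptions introduced for this sketch.

```python
import heapq
from itertools import count


def build_huffman_tree(weights):
    """Build a Huffman tree from a {term: weight} mapping.

    Leaves are term strings; internal nodes are (left, right) tuples.
    Returns None for an empty mapping.
    """
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    heap = [(w, next(tiebreak), term) for term, w in weights.items()]
    heapq.heapify(heap)
    if not heap:
        return None
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]


def term_depths(tree, depth=0, out=None):
    """Map each leaf term to its depth below the root."""
    if out is None:
        out = {}
    if tree is None:
        return out
    if isinstance(tree, tuple):
        term_depths(tree[0], depth + 1, out)
        term_depths(tree[1], depth + 1, out)
    else:
        out[tree] = depth
    return out


# Hypothetical user profile: interest terms with illustrative weights.
profile = {"search engine": 8, "CORBA": 3, "neural net": 2, "Huffman tree": 1}
depths = term_depths(build_huffman_tree(profile))
weighted_path_length = sum(profile[t] * d for t, d in depths.items())
```

In this toy profile the heaviest term ends up closest to the root, which is the general property that makes a Huffman tree attractive when the most important profile terms should be reached first during filtering.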
Keywords/Search Tags:Internet, Intelligent Information Retrieval, CORBA, User Profiles, BP Neural Net, Document Feature, Huffman Tree, Vector Space Model (VSM), Search Engine