Study Of Some Issues About Information Retrieval System On The Internet

Posted on:2004-03-12

Degree:Master

Type:Thesis

Country:China

Candidate:X Tian

Full Text:PDF

GTID:2168360092993706

Subject:Management Science and Engineering

Abstract/Summary:

With the exponential growth of information on the Internet, Internet information retrieval systems, that is, search engines, have become the main tools that help people get information from the WWW. However, search engines have many problems, and two of them urgently need to be resolved. One is efficiency: a search engine must handle hundreds of millions of web pages and communicate with many thousands of clients. The other is precision: it is important to give users what they want out of so much information. This paper therefore focuses on the efficiency of search engines and on how to give users what they want faster and more accurately.

This paper analyzes information retrieval systems from the angles of system structure, document feature representation, and user profiles. It includes the following work:

1. Analyzing information retrieval systems on the Internet

Chapters 1 and 2 analyze the development and trends of information retrieval systems on the Internet and introduce the key technologies and relevant background. On this basis, some deficiencies of search engines are analyzed, such as the inefficiency that derives from the Client/Server structure and the low accuracy caused by document feature representation in VSM.

2. Proposing a distributed system structure based on CORBA

To address the inefficiency caused by the traditional Client/Server structure of search engines, this paper proposes introducing distributed object technology to suit the current state of the Internet. CORBA is a mature distributed technology, so this paper designs a CORBA-based system structure for search engines and analyzes its functions and characteristics. With CORBA introduced, the Client/Server structure of the search engine becomes a three-layer distributed structure. This structure emphasizes distributed computation on the Application Server and therefore offers good scalability, openness, and high computation speed; it also helps reduce server load and network delay. In addition, it can be merged with the traditional distributed structure of search engines to form an integrated structure that supports not only distributed download and query at the Data Server end but also distributed computation at the Application Server end.

3. Proposing a method to calculate document feature item weights based on a BP neural net

The key technologies of an information retrieval system are the following: representation of documents and user queries; the query matching strategy; and correlation calculation of the matching results. Among them, document representation is the foundation of information retrieval technology. It comprises the extraction of document feature items and the calculation of their weights. To address the shortcomings of current methods for calculating feature item weights, and drawing on VSM, this paper proposes a method to calculate the weight of a document feature item based on a BP neural net. BP neural nets are widely applied because of their simple structure and stable operation, and they are often used in pattern recognition and function approximation. In Chapter 4, a suitable BP neural net is designed and trained. Given the frequency of a document feature item as input, the trained net outputs the item's weight, so the net can represent the document's features. Tests show that this method is practical, simple to apply, and highly precise.

4. Proposing a new representation of user profiles and a corresponding filtering algorithm based on the Huffman tree

How to give users what they really want faster and more accurately is the development focus of information retrieval systems. The solution lies in acquiring and representing user profiles. In Chapter 5, common representations of user profiles are introduced first, and then the representation based on the Huffman tree is discussed. On the basis of this representation, docum...
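
The abstract is cut off before it describes how the Huffman-tree profile is built, so the following is only a minimal sketch of one plausible reading, not the thesis's actual method. It assumes a user profile is a set of interest terms with numeric weights (the names `build_huffman_profile` and `term_weights` and the sample terms are hypothetical) and builds a standard Huffman tree over them, so that higher-weight terms end up closer to the root.

```python
import heapq
from itertools import count


def build_huffman_profile(term_weights):
    """Build a Huffman tree over interest terms and return {term: depth}.

    Assumption (not from the thesis): leaves are interest terms and the
    merge key is the term's interest weight.
    """
    tie = count()  # unique tie-breaker so heapq never compares node payloads
    heap = [(weight, next(tie), term) for term, weight in term_weights.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: 0}

    # Standard Huffman construction: repeatedly merge the two lightest nodes.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))

    depths = {}

    def walk(node, depth):
        # Internal nodes are (left, right) tuples; leaves are term strings.
        if isinstance(node, tuple):
            walk(node[0], depth + 1)
            walk(node[1], depth + 1)
        else:
            depths[node] = depth

    walk(heap[0][2], 0)
    return depths


# Hypothetical profile: integer interest weights per term.
profile = {"corba": 40, "search": 30, "neural": 20, "huffman": 10}
print(build_huffman_profile(profile))
```

In this sketch the heaviest term ("corba") sits at depth 1 and the two lightest terms share the deepest level, which is the property that would let a filter reach a user's strongest interests in the fewest comparisons.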

Keywords/Search Tags:

Internet, Intelligent Information Retrieval, CORBA, User Profiles, BP Neural Net, Document Feature, Huffman Tree, Vector Space Model (VSM), Search Engine

Related items

1	The Research And Implementation Of Petroleum And Chemistry Specialized Web Intelligent Information Retrieval System
2	Design And Implementation Of Based On Vector Space Model Of Local Search Engine
3	Intelligent Search Technology Of Network Information Based On Military Application
4	Research And Implementation On Intelligent Information Retrieval Based On Classification
5	Research And Implementation On Chinese Information Retrieval System Based On Structured Vector Space Model
6	Analysis And Research, Personalized Information Retrieval Based On User Interest
7	An Extended Research On Information Retrieval Model Based On Document Relation
8	Based On Personalized, Professional Network Of Oil And Information Retrieval Technology
9	A Study On Internet Information Retrieval And Developing Trend
10	Research On Key Techniques Of Intelligent Meta-search Engine