
Research on Intelligent Information Retrieval Based on the Internet

Posted on: 2003-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: S X Fu
Full Text: PDF
GTID: 2168360065964116
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet and intranets, especially the WWW, the network has become the largest distributed information repository in the world and provides a sound platform for sharing information and resources. However, as a vast amount of information floods onto the Internet, users find it increasingly difficult to locate and obtain the information they need, and problems such as "information misnavigation" and "information overload" have emerged. Search engines, the main retrieval tools for Internet information, face severe challenges. This thesis studies the key technologies behind intelligent information retrieval tools, with a theoretical emphasis on the problems that must be confronted in developing a new generation of information retrieval systems and on their solutions. The research content and main results are as follows:
(1) Identifying the shortcomings of current information retrieval tools by analyzing their present state. A comprehensive review of Internet retrieval tools and technologies summarizes the main problems and limitations of current tools, including unscientific retrieval means, unreasonable retrieval methods, a single form of result display, weak personalization, and a low level of intelligence, among others.
The characteristics of an intelligent retrieval system are also analyzed: such a system understands not only the information but also its users, and its retrieval method is built around concept retrieval rather than simple string matching.
(2) Analyzing the system structure of search engines and proposing a new framework for an intelligent search engine. Drawing on intelligent systems, this thesis puts forward a new search engine architecture that expands the domain knowledge base and the user knowledge base and strengthens the user interface, and argues that a truly intelligent search engine must be supported by knowledge bases.
(3) Presenting a search algorithm and an update strategy for network information. The robot program is the basis for gathering and updating information. The exclusion standard, the search strategy, and the design of the search algorithm are described in detail. A new update strategy is proposed that aims to check, download, and refresh information in the shortest possible time. As a result, the number of invalid hyperlinks drops sharply and the retrieval performance of the search engine improves.
(4) Breaking through the limits of keyword-based indexing by putting forward an indexing method called "attribute + content + structure" and presenting a retrieval language over attributes, content, and structure. The keyword method does not meet the need for semantic understanding, and semi-structured and unstructured network information is very difficult to organize and index. This thesis advocates organizing documents by their attributes, content, and structure in order to support multi-interface, multi-angle search.
The methods for extracting document attributes, content, and structure, together with the corresponding query language, are also given.
(5) Analyzing the combination patterns of Chinese and giving a dictionary-free method of word extraction. Unlike English text, Chinese text must be segmented into words when information is indexed. At present, mechanical dictionary-based techniques cannot resolve the problems of unregistered (out-of-vocabulary) words and ambiguity, which leads to unsatisfactory results, while methods based on grammar and rules are so obscure and complex that they have not yet been put into practice. This thesis gives a dictionary-free word extraction method that avoids the limitations of linguistics and syntax by counting and filtering frequencies; it performs well on medium- and high-frequency words and partially resolves the problems caused by new words...
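Item (5) describes extracting words by counting and filtering frequencies. The following is a minimal Python sketch of that general idea only; it is not the thesis's algorithm, whose details the abstract does not give. The n-gram length range, the `min_freq` threshold, the substring filter, and the names `count_ngrams` and `extract_words` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Runs of CJK ideographs; punctuation, digits, and Latin text act as boundaries.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def count_ngrams(text, min_len=2, max_len=4):
    """Count every character n-gram (min_len..max_len) inside each CJK run."""
    counts = Counter()
    for run in CJK_RUN.findall(text):
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return counts


def extract_words(text, min_freq=3, min_len=2, max_len=4):
    """Return candidate words and their frequencies, using no dictionary.

    Step 1 (count): tally all n-grams.
    Step 2 (filter): keep n-grams seen at least min_freq times, then drop any
    n-gram that only ever appears as part of a longer frequent n-gram.
    Both the threshold and the substring rule are illustrative assumptions.
    """
    counts = count_ngrams(text, min_len, max_len)
    frequent = {gram: c for gram, c in counts.items() if c >= min_freq}

    words = {}
    for gram, freq in frequent.items():
        subsumed = any(
            len(other) > len(gram) and gram in other and other_freq >= freq
            for other, other_freq in frequent.items()
        )
        if not subsumed:
            words[gram] = freq
    return words


sample = "搜索引擎很重要。搜索引擎更新。使用搜索引擎。"
print(extract_words(sample))  # {'搜索引擎': 3}
```

In this sample every fragment of 搜索引擎 (search engine) occurs exactly as often as the full string, so only the full string survives. The same rule also discards 索引 (index), a genuine word that here never appears on its own, which illustrates why a purely frequency-based filter tends to work best on medium- and high-frequency words in a larger corpus.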
Keywords/Search Tags: intelligent information index, personalization, automatic word extraction, search engine