
Research On Search Technology Of Chinese Information

Posted on: 2006-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Ma
Full Text: PDF
GTID: 2168360155961503
Subject: Computer application technology
Abstract/Summary:
With the development of science and technology, we have entered an information society in which the volume of information grows geometrically. Searching this information effectively enough to meet people's needs has become an urgent problem. Because most existing information takes the form of text documents, this thesis focuses on the retrieval and classification of text documents.

Word frequency statistics are the foundation of information processing. This thesis studies methods for counting word frequencies, particularly for multiple keywords, and designs an algorithm for counting the frequencies of multiple keywords. To take full advantage of the redundant information shared between keywords, the algorithm stores the keyword set in a search tree, so that a single scan of the file yields the frequency of every keyword. The method matches many keywords efficiently. It can also be used indirectly to segment Chinese text, and it yields information about each prefix of a keyword. It reduces the repeated work incurred by the BF, KMP, and BM algorithms.

Based on the word frequency statistics, this thesis analyzes the distribution of word frequencies in Chinese text. Zipf's law is an empirical law of word frequency distribution for English text. The experimental results show that Chinese text also conforms to the empirical formulas for word frequency distribution: the high-frequency part follows Zipf's law, and the low-frequency part follows Booth's law.

This thesis examines the principles and methods of several information retrieval models, including the Boolean logic model, the vector space model, and the probabilistic inference model. The strengths and weaknesses of each method are compared and analyzed experimentally. Building on the vector space model, a BP neural network is introduced to learn the mapping between documents and categories.
This improves the performance of the retrieval model.

This thesis also studies the construction, learning, and inference techniques of Bayesian networks. Based on the causal relationship between documents and categories, a new model is established for information retrieval. The model expresses the user's information needs in a comprehensible way. It makes use of probabilistic inference to search...
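
The multi-keyword counting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose details the abstract does not give: it assumes the "search tree" is a character trie and simply walks the trie from every position in the text, counting overlapping matches. The names `build_trie`, `count_keywords`, and `END` are hypothetical.

```python
# Assumption: the "search tree" is modeled as a character trie built from
# nested dicts. This is an illustrative sketch, not the thesis's method.

END = None  # sentinel key marking the end of a keyword; never equals a character


def build_trie(keywords):
    """Store all keywords in one trie so that shared prefixes are stored once."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word  # remember which keyword ends at this node
    return root


def count_keywords(text, keywords):
    """Count occurrences (overlaps included) of every keyword in text."""
    trie = build_trie(keywords)
    counts = {word: 0 for word in keywords}
    for start in range(len(text)):
        node = trie
        i = start
        # Follow the trie for as long as the text keeps matching some keyword prefix.
        while i < len(text) and text[i] in node:
            node = node[text[i]]
            if END in node:
                counts[node[END]] += 1
            i += 1
    return counts


if __name__ == "__main__":
    print(count_keywords("信息检索与信息处理", ["信息", "信息检索", "检索"]))
    # {'信息': 2, '信息检索': 1, '检索': 1}
```

Because "信息" and "信息检索" share a prefix, one walk from a given position updates both counts, which is the kind of redundancy between keywords the abstract refers to. A production version would likely add failure links (as in the Aho-Corasick algorithm) to avoid re-reading characters; whether the thesis does so is not stated.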
Keywords/Search Tags: word frequency statistics, word frequency distribution, information retrieval, Bayesian network