
Research on Intelligent Classification Methods for Information Retrieval

Posted on: 2009-12-28 | Degree: Master | Type: Thesis
Country: China | Candidate: S B Hu
GTID: 2178360242494787 | Subject: Management Science and Engineering
Abstract/Summary:
With the continuing development of Internet technology, the information available on the Internet has become increasingly rich, and it is now an important resource from which people obtain information in daily life and work. However, because the Internet is inherently open and heterogeneous, it is very difficult for users to locate exactly the information they need among a vast and complex body of content. How to organize and manage Internet information reasonably and effectively has therefore become an increasingly important research topic in information processing. The traditional way of handling the vast amount of information on the Internet is manual classification, organization, and processing, which gives people a relatively effective means of acquiring information. Manual classification, however, has serious drawbacks. First, it consumes a great deal of manpower, material, and financial resources. Second, its results are poorly consistent: even when the classifiers have strong language skills, different people produce different classifications, and the same person classifying at different times may also produce different results. The demand for intelligent classification of web pages is therefore becoming increasingly urgent. Building on a study of how traditional information retrieval techniques are implemented, and drawing on practical web page classification techniques, this thesis carries out a more systematic study of intelligent web page classification.
On this basis, the thesis presents ideas and views on web page preprocessing, Chinese word segmentation, feature selection, and web page classification for intelligent classification in information retrieval. Its main contributions are:
1. Focusing on the structural characteristics of web pages, the thesis analyzes which information elements contribute to classification and improves an effective method for automatically removing "noise" from Chinese web pages and extracting their text.
2. Studying the characteristics of actual web page source code, the thesis represents a web page as a hierarchical tag tree and assigns a different weight to each leaf node. Building on the traditional feature-word weighting formula and taking into account the length and position of each feature word, it proposes a feature-word weighting formula based on the page's hierarchical tag tree.
3. It reviews traditional feature extraction algorithms and, building on the χ² statistic, proposes two improvements to the χ² formula.
4. It studies practical web page classification methods. To find the K texts closest (most similar) to a test text, the KNN classification algorithm must search the entire training set, so its computational cost becomes very high when the number of training samples or the dimension of the feature vectors is large. To address this problem, the thesis proposes PSOKNN, a K-nearest-neighbor algorithm that uses particle swarm optimization to search quickly and intelligently for the neighbors of a new text.
5. It evaluates test results for four statistics: information gain (IG), mutual information (MI), chi-square (CHI), and CHI*. Experiments show that the feature-word extraction method adopted in the thesis achieves higher classification accuracy and is, to a certain degree, well founded.
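The first contribution (noise removal and text extraction) does not name the heuristic the thesis improves. As a minimal sketch of one widely used idea, assuming that `<script>`/`<style>` content is always noise and that blocks dominated by anchor text are navigation noise, Python's standard `html.parser` can split a page into blocks and keep only those with low link density. The tag sets, the `extract_main_text` helper, and the 0.5 threshold are hypothetical choices, not the thesis's parameters.

```python
from html.parser import HTMLParser

# Hypothetical choices: tags whose content is always treated as noise,
# and tags that delimit a text "block" for the link-density test.
SKIP_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "title"}


class BlockExtractor(HTMLParser):
    """Collects (text, link_text) pairs for each block-level element."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished blocks as (text, link_text)
        self.text = []        # text pieces of the current block
        self.link_text = []   # text pieces inside <a> in the current block
        self.skip_depth = 0   # > 0 while inside script/style
        self.in_link = 0      # > 0 while inside <a>

    def flush(self):
        """Close the current block and store it if it has any text."""
        text = "".join(self.text).strip()
        if text:
            self.blocks.append((text, "".join(self.link_text).strip()))
        self.text, self.link_text = [], []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.in_link = max(0, self.in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth:
            return  # script/style content is discarded as noise
        self.text.append(data)
        if self.in_link:
            self.link_text.append(data)


def extract_main_text(html, max_link_density=0.5):
    """Keep blocks whose share of link text is at most max_link_density."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # store any trailing block
    kept = []
    for text, link_text in parser.blocks:
        if len(link_text) / len(text) <= max_link_density:
            kept.append(text)
    return kept
```

On a page whose only `<div>` is a row of navigation links, that block's link density is high and it is dropped, while an ordinary `<p>` paragraph survives.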
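The second contribution weights a feature word by where it sits in the page's tag tree and by its length, on top of a traditional weighting formula. Neither the per-node weights nor the final formula are given in the abstract, so the sketch below assumes TF-IDF as the "traditional" formula, a hypothetical `TAG_WEIGHTS` table keyed by the tag enclosing each leaf text node, and a hypothetical length factor of log2(1 + length). Each occurrence then contributes its node's weight instead of a flat count of 1.

```python
import math
from collections import defaultdict

# Hypothetical weights for the tag that encloses a leaf text node.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "p": 1.0}


def weighted_tf(occurrences):
    """occurrences: list of (term, enclosing_tag) pairs, one per leaf-node hit."""
    tf = defaultdict(float)
    for term, tag in occurrences:
        tf[term] += TAG_WEIGHTS.get(tag, 1.0)  # unknown tags count as 1.0
    return tf


def term_weights(occurrences, doc_freq, n_docs):
    """Position-weighted TF x IDF x length factor (all assumptions)."""
    weights = {}
    for term, tf in weighted_tf(occurrences).items():
        idf = math.log(n_docs / doc_freq[term])
        length_factor = math.log2(1 + len(term))  # hypothetical: favor longer words
        weights[term] = tf * idf * length_factor
    return weights
```

For example, a word that appears once in `<title>` and once in a `<p>` gets a weighted term frequency of 4.0 rather than 2, so title words dominate the page's feature vector.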
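Contributions 3 and 5 rest on the standard feature-selection statistics. The thesis's two χ² modifications and its CHI* variant are not described in the abstract, so the sketch below shows only the textbook forms, computed from a 2×2 document-count table for term t and category c, with χ²(t, c) = N(AD - CB)² / ((A + C)(B + D)(A + B)(C + D)). The names A, B, C, D follow a common convention; MI is the usual pointwise estimate, and information gain is shown for the two-class split (c versus not c) as a simplification of the multi-category definition.

```python
import math


def chi_square(A, B, C, D):
    """Standard chi-square statistic of term t and category c.

    A: docs in c that contain t      B: docs outside c that contain t
    C: docs in c without t           D: docs outside c without t
    """
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0


def mutual_information(A, B, C, D):
    """Pointwise MI estimate log(A*N / ((A+C)(A+B))); math.log raises if A == 0."""
    N = A + B + C + D
    return math.log(A * N / ((A + C) * (A + B)))


def _entropy(*counts):
    """Entropy in bits of the distribution given by raw counts."""
    total = sum(counts)
    return -sum(k / total * math.log2(k / total) for k in counts if k)


def information_gain(A, B, C, D):
    """IG of t for the two-class split {c, not c}, in bits."""
    N = A + B + C + D
    h_class = _entropy(A + C, B + D)   # H(class)
    h_given_t = _entropy(A, B)         # H(class | t present)
    h_given_not_t = _entropy(C, D)     # H(class | t absent)
    p_t = (A + B) / N
    return h_class - p_t * h_given_t - (1 - p_t) * h_given_not_t
```

With A = 40, B = 10, C = 10, D = 40 (N = 100), χ² = 100 · 1500² / 50⁴ = 36.0; when the term is independent of the category (all four counts equal), both χ² and IG are 0.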
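The cost argument in contribution 4 is easiest to see in the baseline that PSOKNN is meant to speed up: plain KNN compares the test vector with every training vector, so each query costs O(N·d) for N samples of dimension d. A minimal sketch, assuming cosine similarity and majority voting (the abstract does not state the similarity measure or voting rule); `knn_classify` is a hypothetical helper, and the particle swarm search itself is not shown.

```python
import heapq
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, training, k=3):
    """training: list of (vector, label). Scans every sample: O(N * d)."""
    scored = ((cosine(query, vec), label) for vec, label in training)
    top_k = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]
```

The generator passed to `heapq.nlargest` still visits all N training vectors; only the bookkeeping for the top K is cheap, which is why the full scan, not the vote, dominates the running time.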
Keywords/Search Tags: information retrieval, Chinese word segmentation, feature selection, intelligent classification, KNN classification algorithm