The Research Of Text Feature Selection Applied In Information Filtering System

Posted on: 2011-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qiu
GTID: 2178360308465186
Subject: Computer software and theory
Abstract/Summary:
People increasingly rely on the Internet to obtain information, and valuable information has become a new kind of resource. As a key technology for processing the huge volume of information on the network, network information filtering can effectively reduce information disorder and help users locate exactly the information they need. Feature selection is one of the active topics in network information filtering research and is the main subject of this thesis.

Focusing on the feature selection algorithm and related components used in our filtering system, this thesis analyzes the background of network information filtering systems, studies the key technologies used in such a system, and then puts forward a new method: using a neural network model to represent the text vector space. After analyzing the advantages and disadvantages of several feature selection methods, the thesis improves the mutual information method to overcome its shortcomings. The improved method is then combined with a genetic algorithm; we call the result the MI-GA method. Compared with the previous method, our approach achieves higher accuracy. Finally, the MI-GA method is applied in the network information filtering system, with good results. The specific research content is as follows:

1. A neural network model is used to represent the text vector space, which achieves text dimension reduction better than other methods. Here, the neural network model refers to transforming the text vector space model into a neural network representation. After word segmentation, each word is treated as a neuron. These neurons are fed into the neural network for optimization, and after processing by the network's intermediate layer, the output is the optimal feature subset, which achieves the goal of dimension reduction. The neural network model is well suited to fields that perform inference over very complex text classification tasks and that need to express the relations among the conditions, properties, and actions of events. Neural networks have already been applied in many areas of information science, where they show great potential and broad application prospects.

2. The mutual information method is improved to address its shortcomings. The advantages and disadvantages of mutual information are analyzed in detail. Its shortcomings are that it does not consider word frequency and is strongly influenced by critical features, so the mutual information evaluation function often tends to select rare words. We therefore improved the mutual information method; compared with the original method, the improved approach achieves higher accuracy.

3. The improved mutual information method is combined with a genetic algorithm, yielding the proposed MI-GA method. We applied the MI-GA method to text classification in our experiments, and the experimental results met the expected targets. The method improved recall, accuracy, and the F1 measure to varying degrees. This shows that our method can effectively maintain text classification accuracy while reducing dimensionality.

4. The improved method is deployed on the network information filtering system platform and tested experimentally. The experimental results show that this method outperforms the other methods in accuracy and recall-precision. In particular, the method achieved satisfactory test results at higher feature dimensions.
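The rare-word bias of mutual information noted in item 2 can be illustrated with a minimal sketch. It assumes the common document-count estimate of mutual information between a term and a class, log(A·N / ((A+C)·(A+B))). The abstract does not state which estimator the thesis uses or how its improved version is defined, so the function name, the counts, and the corpus below are hypothetical.

```python
import math

def mutual_information(a, b, c, n):
    """Pointwise mutual information between a term t and a class k.

    Assumed count-based estimate (not taken from the thesis):
      a: documents in class k that contain t
      b: documents outside class k that contain t
      c: documents in class k that do not contain t
      n: total number of documents
    """
    if a == 0:
        return float("-inf")  # the term never co-occurs with the class
    # log( P(t, k) / (P(t) * P(k)) ), with probabilities estimated by counts
    return math.log((a * n) / ((a + c) * (a + b)))

# Hypothetical corpus: 1000 documents, 100 of them in the target class.
N = 1000

# A frequent term: appears in 80 of the 100 class documents and in 40 others.
frequent = mutual_information(a=80, b=40, c=20, n=N)

# A rare term: appears in a single class document and nowhere else.
rare = mutual_information(a=1, b=0, c=99, n=N)

print(f"frequent term MI = {frequent:.3f}")
print(f"rare term MI     = {rare:.3f}")
```

Under these assumed counts the rare term scores about 2.303 and the frequent term about 1.897, even though the frequent term covers 80% of the class. The score depends only on the ratio of co-occurrence to overall occurrence, not on how often the term actually appears, which is consistent with the shortcoming described above.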
Keywords: information filtering, feature selection, mutual information, genetic algorithm, text categorization