
Statistical And Rule-based Feature Weight Calculation Method Research And Application

Posted on: 2012-01-29, Degree: Master, Type: Thesis
Country: China, Candidate: Y Z Yang, Full Text: PDF
GTID: 2208330332490572, Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology and the widespread use of the Internet, together with high-data-rate wireless transmission enabled by mobile broadband, new data storage models introduced by cloud computing, and technical advances such as end-to-end transmission in the Internet of Things (IoT), society has entered an era of information abundance. Faced with this volume of electronic information, however, people easily become lost, and acquiring the information they actually need has become the key problem to be solved. Information filtering emerged to meet this need, and feature weight computation is its foundation. This thesis studies the choice of feature items, feature selection methods, and term weighting approaches, and has both theoretical value and practical significance. The main contents are:

1. Separate feature selection from feature weighting
This thesis summarizes the existing feature selection functions and weight evaluation functions, separates and compares the two, and defines each in terms of its concept and meaning.

2. Improve the traditional information gain (IG) algorithm
This thesis discusses the advantages and disadvantages of the traditional IG algorithm. Because traditional IG handles imbalanced corpora poorly, the thesis improves it by using the distribution information of feature items: the distribution among classes and the distribution inside a class serve as judgment conditions. Combining the improved IG algorithm with the traditional one keeps the advantages of traditional IG while overcoming its difficulty with imbalanced corpora. Experiments show that this method is feasible.

3. Introduce syntax rules into the vector space model (VSM) to enrich the semantics of feature items
In-depth analysis of the experimental results shows that the limitations of the lexical system are an important cause of reduced filtering precision. The thesis therefore studies text indexing and the granularity of feature items in the VSM and, given the shortcomings of the current lexical system, introduces syntax rules into the VSM. Syntax rules are used to construct lexical merging rules, identify basic phrases in the text, and replace items in the VSM with these phrases, which enriches the semantic description of the items. Experimental results show that the approach is effective.

4. Construct a feature relational tree to strengthen the relations between items in the VSM
Items in the VSM are treated as independent, which produces collocation ambiguity and classification noise. A tree model is therefore introduced into the VSM to build a feature relational tree, and this idea is applied to the filtering of undesirable information. The results are favorable.

5. Formulate a feature weighting method based on statistics and rules
Frequency-based methods yield a smooth classification curve and therefore have difficulty distinguishing items in the VSM. To address this defect, the thesis considers the grammatical role, position, and distribution of features and constructs a new evaluation function. Theoretical analysis and experimental results support its feasibility.

6. Design and implement an information filtering system based on statistics and rules
With advanced design, reliability, and usability as goals, a filtering system based on statistics and rules is designed. The system automatically filters the information passing through the user's computer and performs URL filtering, keyword filtering, and content filtering according to the user's settings. For content filtering, features are selected with the feature selection algorithm improved in this thesis, and the text representation model based on statistics and rules proposed in this thesis is used to reduce the dimension of the vector space, which improves the precision of information filtering.
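
For readers unfamiliar with the baseline that item 2 starts from, the following is a minimal sketch of the traditional information gain score for a single term, computed from per-class document counts. It is not the improved algorithm of the thesis, whose exact form the abstract does not give. The function names `entropy` and `information_gain` and the count-based inputs are assumptions made for illustration only.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs_with_term, docs_per_class):
    """Traditional IG of one term (hypothetical helper, not the thesis's variant).

    docs_with_term[i]: number of documents in class i that contain the term.
    docs_per_class[i]: total number of documents in class i.
    """
    n = sum(docs_per_class)
    present = list(docs_with_term)
    absent = [total - with_t for with_t, total in zip(docs_with_term, docs_per_class)]
    p_present = sum(present) / n
    p_absent = sum(absent) / n
    # IG = H(C) - [P(t) * H(C | t) + P(not t) * H(C | not t)]
    conditional = p_present * entropy(present) + p_absent * entropy(absent)
    return entropy(docs_per_class) - conditional


# A term found in every document of class 0 and in none of class 1
# separates two equally sized classes completely.
perfect = information_gain([50, 0], [50, 50])

# A term spread evenly over both classes carries no class information.
useless = information_gain([25, 25], [50, 50])

# Imbalanced corpus: class 0 has only 5 documents, class 1 has 95.
rare_class_term = information_gain([5, 0], [5, 95])
```

In the imbalanced case the class entropy H(C) is already small, so even a term that identifies the minority class perfectly receives a low score. This is one plausible reading of the imbalance drawback that the abstract attributes to traditional IG.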
Keywords/Search Tags: Network information filtering, information gain, text granularity, statistic and regulation, syntax rules, basic phrases