Improved TF-IDF Feature Extraction Method Based on Ontology Relative Degree

Posted on: 2012-11-18 | Degree: Master | Type: Thesis
Country: China | Candidate: J H Wang
GTID: 2178330338995493 | Subject: Management Science and Engineering
Abstract/Summary:
The traditional TF-IDF text feature extraction method is based on statistical theory. It treats each text feature as an independent unit and selects the feature words of a text by counting how often a word appears in that text and how many texts in the collection contain the word. Although this reduces computation time to some extent and simplifies the steps of feature extraction, the method has weaknesses: it does not consider the relationships between words, and it ignores low-frequency words that may still express the content of the text. Because of these weaknesses, the accuracy of the text features it extracts is limited. An ontology has a clear concept hierarchy, supports logical reasoning, and expresses the relationships between terms through a hierarchical concept graph. To improve the traditional TF-IDF method, this thesis introduces ontology into text feature extraction. It builds two simple domain ontologies for the experiments, derives a method for calculating ontology relative degree by improving existing methods for calculating semantic similarity and semantic correlation, and uses it to calculate the ontology relative degree between two concepts of a domain ontology. On this basis, the thesis puts forward an improved TF-IDF feature extraction method based on ontology relative degree.
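For reference, the traditional weighting described above can be sketched as follows. This is a minimal sketch assuming the common tf × log(N/df) variant; the abstract does not say which TF-IDF formula the thesis uses, and the toy documents are invented for illustration. Each term is scored independently of every other term, which is the limitation the thesis targets.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document; docs are lists of tokens."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency in this document times inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["ontology", "concept", "ontology", "feature"],
    ["feature", "weight", "text"],
    ["text", "concept", "weight", "text"],
]
w = tf_idf(docs)
ranked = sorted(w[0], key=w[0].get, reverse=True)
print(ranked)  # 'ontology' ranks first: frequent here, absent from other texts
```

Under this variant a term that occurs in every document gets weight log(1) = 0; smoothed variants avoid that, but none of them let a term's weight reflect related words in the same text.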
The improved method proceeds in three steps. First, construct a candidate feature set and a non-candidate feature set using the traditional TF-IDF method. Second, according to the domain ontology, extract from the non-candidate feature set the ontology relative terms of each candidate feature word. Third, adjust the weight of each candidate feature word using its initial weight, the ontology relative degree between it and each of its ontology relative terms, the number of those relative terms, and their weights; this yields a new ranking of the candidate feature words by weight. The improved method makes up for the weaknesses of the traditional TF-IDF feature extraction method, and experiments show that it is more effective than the traditional method.
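The three steps described above can be sketched as below. The abstract does not give the thesis's relative-degree formula or its weight-adjustment rule, so everything here is an assumption: the toy ontology `PARENT`, the use of Wu-Palmer similarity as a stand-in for ontology relative degree, the hypothetical `top_k` split, the hypothetical `threshold` that decides which non-candidate words count as relative terms, and the additive update w'(c) = w(c) + Σ r(c, t) · w(t), in which the number of relative terms enters only through how many terms are summed.

```python
# Hypothetical toy domain ontology: each concept maps to its is-a parent.
# A single root ("sport") guarantees that any two concepts share an ancestor.
PARENT = {
    "sport": None,
    "ball_game": "sport",
    "football": "ball_game",
    "basketball": "ball_game",
    "athletics": "sport",
    "marathon": "athletics",
}


def ancestors(concept):
    """Path from a concept up to the root, including the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    return len(ancestors(concept))  # the root has depth 1


def relative_degree(a, b):
    """Stand-in for ontology relative degree: Wu-Palmer similarity (assumption)."""
    up_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in up_b)  # lowest common ancestor
    return 2 * depth(lcs) / (depth(a) + depth(b))


def improved_ranking(weights, top_k, threshold):
    """Re-rank TF-IDF candidates using their ontology relative terms.

    weights: {word: initial TF-IDF weight} for one document.
    top_k, threshold: hypothetical parameters, not taken from the thesis.
    """
    ranked = sorted(weights, key=weights.get, reverse=True)
    # Step 1: split into candidate and non-candidate feature sets.
    candidates, non_candidates = ranked[:top_k], ranked[top_k:]
    new_weights = {}
    for c in candidates:
        related = []
        if c in PARENT:
            # Step 2: non-candidate words that are ontology concepts
            # sufficiently related to the candidate.
            for t in non_candidates:
                if t in PARENT:
                    r = relative_degree(c, t)
                    if r >= threshold:
                        related.append((t, r))
        # Step 3 (assumed additive rule): each relative term adds r * w(t),
        # so more relative terms yield a larger boost.
        new_weights[c] = weights[c] + sum(r * weights[t] for t, r in related)
    return sorted(new_weights.items(), key=lambda kv: kv[1], reverse=True)


# Invented initial weights for one document ("ticket" is not in the ontology).
weights = {"marathon": 0.30, "football": 0.27, "ball_game": 0.06,
           "basketball": 0.05, "ticket": 0.02, "athletics": 0.01}
for word, w in improved_ranking(weights, top_k=2, threshold=0.5):
    print(f"{word}: {w:.4f}")
```

With these made-up numbers, "football" (0.27 rising to about 0.351) overtakes "marathon" (0.30 rising to 0.308), because its parent concept "ball_game" and sibling "basketball" both appear among the lower-ranked words, while "marathon" has only one weakly weighted relative term. This illustrates how low-frequency but related words can change the candidate order; the thesis's own measure and update rule may weight these factors differently.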
Keywords/Search Tags: Feature extraction, TF-IDF, Ontology relative term, Ontology relative degree