
Text Feature Extraction and Learning Based on Relationship

Posted on: 2015-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Hong
Full Text: PDF
GTID: 2298330467463040
Subject: Signal and Information Processing
Abstract/Summary:
The mainstream text representation model is the vector space model (VSM), which characterizes text using words as features weighted by TF-IDF. To a large extent this model can describe text features, but it ignores the semantic context of the text, the order of its elements, and other compositional characteristics. As a result, there is an upper limit on the amount of information it can express, and the subsequent data mining process can never reach human analysis capabilities. This is the difficulty of text modeling.

WAF (word activation force) is an algorithm proposed by Professor Guo Jun of the Pattern Recognition Laboratory at Beijing University of Posts and Telecommunications to describe the statistical relationships between words. WAF does not consider only the simple association between words; it also considers word order and the distance between words, incorporating both probability and amount of information as two language rules.

The main innovations include three points:

First, a new term relationship feature based on the WAF model, named the active edge feature, is proposed. On top of the current TF-IDF term frequency weighting, active edge features obtained from the in-links and out-links of WAF are added, and a Bayesian classifier and an SVM classifier are used for classification. The results are then analyzed and discussed.

Second, entity feature extraction algorithms based on structured data and information extraction are proposed. A topic extraction and classification model is applied to the target entities, the corresponding structured features are extracted from the information under each topic, and WAF values are used to calculate the similarity of each feature for entity clustering.

Third, an entity representation model based on WAF affinity features is proposed. The WAF algorithm is used to calculate the affinity values of each entity, which yield a one-dimensional feature vector for that entity. Cosine similarity is then used to calculate the similarity between entities, a hierarchical clustering algorithm groups the entities, and the final result is an entity cluster relationship graph. This model achieved good results in mining relationships between teacher entities in the COSE system.
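The abstract describes WAF as a statistic that depends on word order and on the distance between words, not just on co-occurrence. As a rough illustration only, the sketch below assumes the commonly cited form of the word activation force, waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / d_ab^2, where f_ab counts how often a precedes b within a window and d_ab is their mean forward distance. The abstract itself does not give the formula, so this form, the window size, the toy corpus, and the function names are all assumptions made for illustration, not the thesis's implementation.

```python
from collections import Counter, defaultdict


def word_activation_forces(sentences, window=5):
    """Return {(a, b): waf} for ordered pairs where a precedes b.

    Assumed form: waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / mean_dist(a, b)**2
    `window` is a hypothetical parameter limiting how far ahead b may appear.
    """
    freq = Counter()              # f_a: occurrences of each word
    co_freq = Counter()           # f_ab: times a precedes b within the window
    dist_sum = defaultdict(int)   # summed forward distances for each (a, b)

    for tokens in sentences:
        freq.update(tokens)
        for i, a in enumerate(tokens):
            for d in range(1, window + 1):
                if i + d >= len(tokens):
                    break
                b = tokens[i + d]
                co_freq[(a, b)] += 1
                dist_sum[(a, b)] += d

    waf = {}
    for (a, b), f_ab in co_freq.items():
        mean_dist = dist_sum[(a, b)] / f_ab
        waf[(a, b)] = (f_ab / freq[a]) * (f_ab / freq[b]) / mean_dist ** 2
    return waf


def links(waf, word):
    """Split the WAF edges touching `word` into in-links and out-links."""
    in_links = {a: v for (a, b), v in waf.items() if b == word}
    out_links = {b: v for (a, b), v in waf.items() if a == word}
    return in_links, out_links


# Toy corpus (hypothetical) of already-tokenized sentences.
corpus = [
    ["text", "feature", "extraction"],
    ["text", "feature", "learning"],
]
forces = word_activation_forces(corpus)
in_links, out_links = links(forces, "feature")
print(in_links)
print(out_links)
```

Because the pair (a, b) is only counted when a comes before b, the resulting graph is directed. The in-links and out-links of a word are the kind of edge information from which the active edge feature in the first contribution is said to be derived.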
Keywords/Search Tags:text model, feature extraction, word activation force, vector space model