The Hotspots Analysis Research Based On The Financial Ontology

Posted on: 2013-09-03 | Degree: Master | Type: Thesis
Country: China | Candidate: J Zhu
GTID: 2268330422957661 | Subject: Management Science and Engineering
Abstract/Summary:
Existing research on hotspot discovery has several shortcomings: the text representation model lacks semantic information, the traditional vector space model has too many dimensions, and the problems of synonymy and polysemy remain unsolved. This thesis therefore introduces a financial ontology organized by synonym sets and proposes replacing the traditional word-form feature terms of the text representation model with concept terms, while retaining the original word-form feature terms that contribute most to category discrimination. During replacement, polysemous words are disambiguated according to the context of the article by looking up their hypernym and hyponym sets. After replacement, synonyms that share the same or a similar sense are merged. The weights of the replaced and merged feature terms are then adjusted with a concept-based feature-weighting formula, and feature terms are selected again according to these weights. This reduces the dimensionality of the feature vector and yields a mixed text representation model based on both concepts and word forms, which is used for text clustering to mine the deeper semantics of the texts. Finally, a heat value is calculated for each cluster, the clusters are ranked by heat value, and the hotspots are identified.

This thesis proposes a text clustering method based on the mixed text representation model, which combines the ontology with the word-form-based vector space model. By replacing word forms with senses, it merges synonym sets and disambiguates polysemous word forms. The proposed semantics-based mixed vector space model is used as the input to the k-means algorithm.
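The replacement, disambiguation, merging, re-weighting, re-selection and clustering steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the three-concept `ONTOLOGY` dictionary, the overlap-based disambiguation rule and all function names are hypothetical; the abstract does not give the concept-based weighting formula, so a smoothed TF-IDF stands in for it; and k-means uses a deterministic farthest-point initialization chosen here only for reproducibility.

```python
import math
from collections import Counter

# Hypothetical toy financial ontology: each concept (synset) lists its member
# words and related hypernym/hyponym terms used to disambiguate polysemous words.
ONTOLOGY = {
    "C_STOCK": {"words": {"stock", "share", "equity"}, "related": {"market", "exchange", "index"}},
    "C_BANK_INST": {"words": {"bank", "lender"}, "related": {"loan", "deposit", "interest"}},
    "C_RIVER_BANK": {"words": {"bank", "shore"}, "related": {"river", "water", "flood"}},
}


def to_mixed_terms(tokens):
    """Map ontology words to concept IDs (merging synonyms); keep other word forms."""
    context = set(tokens)
    terms = []
    for word in tokens:
        candidates = [c for c, e in ONTOLOGY.items() if word in e["words"]]
        if not candidates:
            terms.append(word)  # word form not covered by the ontology is kept as-is
            continue
        # Disambiguation: pick the sense whose related terms overlap most with the document.
        terms.append(max(candidates, key=lambda c: len(ONTOLOGY[c]["related"] & context)))
    return Counter(terms)


def tfidf(doc_counts):
    """Smoothed TF-IDF, standing in for the (unspecified) concept-based weighting formula."""
    n = len(doc_counts)
    df = Counter(t for counts in doc_counts for t in counts)
    return [{t: tf * math.log(1 + n / df[t]) for t, tf in counts.items()} for counts in doc_counts]


def select_features(weighted, top_k):
    """Second-round feature selection: keep the top_k terms by total weight."""
    totals = Counter()
    for w in weighted:
        totals.update(w)
    return [t for t, _ in totals.most_common(top_k)]


def vectorize(weighted, vocab):
    """Build L2-normalized vectors over the reduced mixed vocabulary."""
    vecs = []
    for w in weighted:
        v = [w.get(t, 0.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vecs, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centroids = [list(vecs[0])]
    while len(centroids) < k:
        far = max(vecs, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vecs]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "stock market index rises".split(),
    "share exchange index falls".split(),
    "bank loan interest rate cut".split(),
    "lender deposit interest rises".split(),
]
mixed = [to_mixed_terms(d) for d in docs]
weighted = tfidf(mixed)
labels = kmeans(vectorize(weighted, select_features(weighted, top_k=6)), k=2)
print(labels)  # [0, 0, 1, 1]
```

Mapping "stock" and "share" to the single concept `C_STOCK` lets the first two headlines share a feature despite different wording, and "bank" resolves to the financial-institution sense because "loan" and "interest" appear in the same document.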
The thesis describes in detail the specific method for constructing the financial domain ontology, the algorithm for building the mixed text representation model based on concepts and word forms, and the steps for implementing text clustering on this semantic model. It then describes a multi-dimensional heat evaluation model and the ideas behind its construction. In the first experiment, clustering based on the semantic mixed model is compared with clustering based on word forms alone and clustering based on senses alone. The results show that the semantic mixed model is efficient and superior, achieving higher purity and F-measure. In the second experiment, using simulated data, the proposed heat evaluation model and commonly used heat evaluation indicators are each applied to compute heat values for the clustering results. Measured against the actual hotspots reported by the media, the proposed heat evaluation model achieves higher precision and a higher coincidence rate. Based on these two improvements, the proposed model and method can help improve the quality of hotspot discovery and help netizens browse, quickly locate, and systematically obtain the information they need.
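The two experiments rely on standard clustering metrics and on ranking clusters by heat. The sketch below computes purity and a commonly used clustering F-measure (class-size-weighted best F1 per class), then ranks clusters with a hypothetical heat score and checks the top results against media-reported hotspots using precision at k. The indicator names (`docs`, `comments`, `clicks`), their weights, and the max-normalization are assumptions for illustration only; the abstract specifies neither the dimensions of the thesis's heat evaluation model nor how its coincidence rate is defined.

```python
from collections import Counter


def purity(labels, classes):
    """Share of documents that fall in the majority class of their cluster."""
    hits = 0
    for c in set(labels):
        members = [cls for lab, cls in zip(labels, classes) if lab == c]
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(labels)


def f_measure(labels, classes):
    """Clustering F-measure: for each class take its best F1 over all clusters,
    then average weighted by class size."""
    n = len(labels)
    total = 0.0
    for k in set(classes):
        n_k = classes.count(k)
        best = 0.0
        for c in set(labels):
            n_c = labels.count(c)
            n_kc = sum(1 for lab, cls in zip(labels, classes) if lab == c and cls == k)
            if n_kc:
                p, r = n_kc / n_c, n_kc / n_k
                best = max(best, 2 * p * r / (p + r))
        total += n_k / n * best
    return total


def rank_by_heat(clusters, weights):
    """Hypothetical heat: weighted sum of indicators, each scaled by its maximum."""
    maxima = {d: max(c[d] for c in clusters.values()) or 1 for d in weights}
    heat = {name: sum(w * c[d] / maxima[d] for d, w in weights.items())
            for name, c in clusters.items()}
    return sorted(heat, key=heat.get, reverse=True)


def precision_at_k(ranked, reported, k):
    """Fraction of the top-k ranked clusters that match media-reported hotspots."""
    return sum(1 for name in ranked[:k] if name in reported) / k


labels = [0, 0, 0, 1, 1, 1]
classes = ["stock", "stock", "bank", "bank", "bank", "bank"]
print(round(purity(labels, classes), 3))     # 0.833
print(round(f_measure(labels, classes), 3))  # 0.838

clusters = {
    "rate_cut": {"docs": 40, "comments": 300, "clicks": 9000},
    "ipo": {"docs": 25, "comments": 500, "clicks": 4000},
    "bank_merger": {"docs": 10, "comments": 50, "clicks": 1000},
}
ranked = rank_by_heat(clusters, {"docs": 0.4, "comments": 0.3, "clicks": 0.3})
print(ranked)  # ['rate_cut', 'ipo', 'bank_merger']
print(precision_at_k(ranked, {"rate_cut", "bank_merger"}, k=2))  # 0.5
```

Purity only rewards clusters dominated by a single class, whereas the F-measure also accounts for recall, so a class split across several clusters lowers it; reporting both gives a more balanced view of clustering quality.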
Keywords/Search Tags: hotspot discovery, ontology, mixed vector space model, heat evaluation model, text representation, text clustering