Research On The Method Of Semantic Similarity Calculation Of Short Texts Based On HowNet

Posted on: 2018-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhao
Full Text: PDF
GTID: 2348330536965900
Subject: Computer Science and Technology
Abstract/Summary:
Big-data analysis tasks such as public opinion analysis place growing demands on the processing of Chinese text, especially short text. As a result, Chinese short-text similarity calculation has become a hot research topic in big-data information processing. Chinese short texts are short, contain few words, are semantically rich, and have complex structure. Traditional methods compute the similarity between texts from features such as the words the texts have in common, term frequency, and inverse document frequency (TF-IDF). These algorithms perform well on long texts but poorly on short texts: a long text contains many words, some of which occur frequently, whereas a short text contains only a few. In addition, these methods ignore characteristics of the Chinese language such as grammatical structure and polysemy, so their short-text similarity results are unsatisfactory.

To address these problems, this thesis analyzes the characteristics of Chinese words and short texts, selects several important features of Chinese words, and builds short-text similarity models. Two methods are proposed: a short-text similarity method based on semantics and syntactic structure, and a short-text similarity method based on complex networks.

The first algorithm computes the semantic similarity of Chinese words. It extracts important features of the sememes that describe Chinese words in HowNet, such as path length, depth, and density, together with the emotional features of lexical entries, uses them in the calculation, and performs word sense disambiguation on Chinese words. It then analyzes the syntactic structure of Chinese sentences, computes the topic similarity and the syntactic-structure similarity of the sentences, and finally combines the two to obtain the short-text similarity.

The second algorithm preprocesses the Chinese short texts and builds a complex network model from them. It computes the complex-network feature value of each node and uses these values as parameters in the short-text similarity calculation. It then computes word similarities and uses the word similarity values as the elements of a short-text vector. The cosine similarity of the two vectors is computed, and the short-text similarity is obtained from it according to the definition of short-text similarity.

The proposed algorithms were compared with other algorithms in simulation experiments. Analysis of the simulation data shows that they improve both the accuracy and the F-measure of short-text similarity calculation.
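To make the short-text weakness of the traditional approach concrete, the following Python sketch computes TF-IDF vectors and their cosine similarity for two tokenized texts. It assumes word segmentation has already been done (the segmenter is not shown), and the smoothed IDF formula is one common variant, not necessarily the one used by the baselines in the thesis. Two short texts that use synonyms but share no words receive a similarity of exactly 0.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: TF-IDF weight} dict per tokenized document."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({
            # relative term frequency times a smoothed inverse document frequency
            w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


s1 = ["电脑", "坏了"]    # "the computer broke"
s2 = ["计算机", "故障"]  # "computer fault": synonyms, no shared words
v1, v2 = tfidf_vectors([s1, s2])
print(cosine(v1, v2))  # 0.0: no word overlap, so no similarity is detected
```

Because TF-IDF only rewards exact word overlap, any semantic relation between 电脑 and 计算机 is invisible to it, which is the gap the HowNet-based word similarity is meant to close.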
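The path-length part of the HowNet word similarity in the first algorithm can be illustrated with a widely used formula from earlier HowNet work: Sim(p1, p2) = α / (d + α), where d is the path length between two sememes in the sememe hierarchy and α is a tunable parameter. The sketch below applies it to a small hypothetical sememe tree (`PARENT`) and a hypothetical word-to-sense table (`SENSES`). The value α = 1.6 is only an example. Word similarity is taken as the maximum over all sense pairs, a common simplification. The thesis instead performs explicit word sense disambiguation and also uses depth, density, and emotional features, whose formulas the abstract does not give and which are not reproduced here.

```python
# Hypothetical toy sememe hierarchy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "artifact": "physical",
    "computer": "artifact",
    "machine": "artifact",
    "event": "entity",
    "fault": "event",
}

# Hypothetical word -> senses table; each sense is reduced to one primary sememe.
SENSES = {
    "电脑": ["computer"],
    "计算机": ["computer"],
    "机器": ["machine"],
    "故障": ["fault"],
}


def ancestors(node):
    """Path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def path_length(a, b):
    """Number of edges between a and b, going through their lowest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    steps_in_b = {n: i for i, n in enumerate(pb)}
    for i, n in enumerate(pa):
        if n in steps_in_b:
            return i + steps_in_b[n]
    raise ValueError("sememes are not in the same tree")


def sememe_sim(a, b, alpha=1.6):
    """Path-length similarity alpha / (d + alpha); equals 1.0 when d == 0."""
    return alpha / (path_length(a, b) + alpha)


def word_sim(w1, w2):
    """Maximum similarity over all sense pairs of the two words."""
    return max(sememe_sim(s1, s2) for s1 in SENSES[w1] for s2 in SENSES[w2])


print(round(word_sim("电脑", "计算机"), 3),
      round(word_sim("电脑", "机器"), 3),
      round(word_sim("电脑", "故障"), 3))  # 1.0 0.444 0.211
```

Similarity decreases as the sememes move apart in the tree: the synonyms share a sememe (d = 0), 电脑 and 机器 meet at a nearby ancestor (d = 2), and 电脑 and 故障 only meet at the root (d = 6).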
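The pipeline of the second algorithm (complex network, node feature values, word-similarity vector, cosine) can be sketched as follows. Every specific choice here is an assumption, not the thesis's definition: words are linked when they fall within a sliding window (`window=3`), each node's feature value is a hypothetical mix of degree centrality and clustering coefficient weighted by `beta`, and each vector element is the best word similarity between a vocabulary word and the text, scaled by the network weight of the matching word. `toy_word_sim` is a hypothetical stand-in for a HowNet-based word similarity.

```python
import math
from itertools import combinations


def cooccurrence_graph(tokens, window=3):
    """Undirected graph linking words that occur within `window` positions of each other."""
    adj = {t: set() for t in tokens}
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def node_weights(adj, beta=0.5):
    """Hypothetical feature value: beta * degree centrality + (1 - beta) * clustering."""
    n = len(adj)
    weights = {}
    for node, nbrs in adj.items():
        degree_c = len(nbrs) / (n - 1) if n > 1 else 1.0
        weights[node] = beta * degree_c + (1 - beta) * clustering(adj, node)
    return weights


def text_vector(vocab, tokens, weights, word_sim):
    """One element per vocabulary word: best similarity to this text times node weight."""
    vec = []
    for w in vocab:
        best = max(tokens, key=lambda t: word_sim(w, t))
        vec.append(word_sim(w, best) * weights[best])
    return vec


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def short_text_similarity(t1, t2, word_sim, window=3):
    vocab = sorted(set(t1) | set(t2))
    w1 = node_weights(cooccurrence_graph(t1, window))
    w2 = node_weights(cooccurrence_graph(t2, window))
    return cosine(text_vector(vocab, t1, w1, word_sim),
                  text_vector(vocab, t2, w2, word_sim))


# Hypothetical stand-in for a HowNet-based word similarity.
TOY_SIM = {frozenset({"电脑", "计算机"}): 0.9, frozenset({"坏了", "故障"}): 0.8}


def toy_word_sim(a, b):
    return 1.0 if a == b else TOY_SIM.get(frozenset({a, b}), 0.0)


t1 = ["我的", "电脑", "坏了"]
t2 = ["计算机", "出现", "故障"]
print(round(short_text_similarity(t1, t2, toy_word_sim), 3))  # 0.764
```

With these toy values the two texts score about 0.76 even though they share no words, because the synonym pairs contribute through word similarity rather than exact overlap. In these three-word texts every node has the same weight; in longer texts the network features would emphasize well-connected words.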
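The abstract reports accuracy and F-measure but does not describe the evaluation protocol. A common setup, assumed here, labels each text pair as similar or not, predicts "similar" when the similarity score reaches a threshold (`threshold=0.5` is an arbitrary example), and then computes the metrics:

```python
def evaluate(scores, gold, threshold=0.5):
    """Accuracy, precision, recall and F1 for thresholded similarity scores."""
    pred = [s >= threshold for s in scores]
    tp = sum(p and g for p, g in zip(pred, gold))          # predicted similar, truly similar
    fp = sum(p and not g for p, g in zip(pred, gold))      # predicted similar, truly not
    fn = sum((not p) and g for p, g in zip(pred, gold))    # missed a similar pair
    accuracy = sum(p == g for p, g in zip(pred, gold)) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


scores = [0.9, 0.7, 0.4, 0.2]       # hypothetical similarity scores for four pairs
gold = [True, False, True, False]   # hypothetical human labels
print(evaluate(scores, gold))       # (0.5, 0.5, 0.5, 0.5)
```

The F-measure here is the balanced F1; the threshold trades precision against recall, so reported comparisons are only meaningful when all algorithms are evaluated with the same threshold or a threshold tuned the same way.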
Keywords/Search Tags: Short text, Chinese words, emotion feature, complex network, vector, short text similarity