The Research And Application On Text Similarity Measurement Based On Semantic Analysis

Posted on: 2018-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhou
GTID: 2348330542970087
Subject: Computer technology
Abstract/Summary:
Text similarity calculation, which analyzes the content, grammar, structure, and other properties of texts and builds algorithmic models to compute the similarity between them, is a key technology in text information processing. It is now widely applied in fields such as intelligent retrieval, automatic question answering, and text checking. Some text similarity models, however, measure similarity only in a statistical sense and give insufficient consideration to the semantic relations between words. To address this problem, this thesis studies how to use the semantic information contained in a knowledge graph to measure text similarity, and applies the result to text retrieval. The specific work is as follows:

(1) The word2vec-based semantic similarity model was optimized. First, the words of each text are weighted by word frequency, part of speech, and position, which reduces the influence of raw word frequency on similarity calculation in the Vector Space Model (VSM). Second, the Skip-Gram model of word2vec is used to learn the semantic and grammatical regularities of similar terms, and the resulting word vectors are used to measure text similarity. Finally, the method was compared with the VSM model and the HowNet-based semantic model. In the best case, the experimental results show that the method runs nearly three times faster than the HowNet-based semantic model and is 44% more accurate than the VSM model.

(2) Building on this semantic understanding, a Chinese knowledge graph of the nuclear domain was constructed. After a large batch of downloaded texts was preprocessed, the bibliographic section of each press release was semantically annotated to extract the concepts, attributes, and relations of entities in the nuclear domain, from which the knowledge graph was built.

(3) A knowledge-graph-based text similarity measure was studied. In a knowledge graph containing many entities, judging the similarity between entities is key to improving the accuracy of text similarity. This thesis cleans noisy data and then applies a common entity similarity method, using the different attribute values each entity contains to improve the accuracy of text similarity calculation.

(4) By combining the knowledge graph with the optimized semantic similarity method, a text similarity model for the nuclear domain was constructed and used to implement a retrieval system for nuclear-domain information. When a user enters search terms, the system suggests similar terms drawn from the knowledge graph, and the semantic relations in the graph let the search engine understand the user's search intent to some extent. Compared with traditional database retrieval and inverted-index retrieval, the system filters out, to some extent, text that is irrelevant to the query. It thus provides semantic retrieval and has practical value for information retrieval services.
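The term weighting in contribution (1) can be illustrated with a minimal sketch. The thesis does not give its weighting formula, so the log-damped frequency, the part-of-speech multipliers in `POS_FACTOR`, and the `TITLE_BOOST` bonus (used here as a stand-in for the position signal) are all hypothetical choices. The sketch only shows how such weights feed a VSM cosine similarity.

```python
import math
from collections import Counter

# Hypothetical part-of-speech multipliers (not values from the thesis):
# nouns and verbs usually carry more topical content than other word classes.
POS_FACTOR = {"n": 1.5, "v": 1.2}
DEFAULT_POS_FACTOR = 1.0
# Hypothetical position signal: words that also occur in the title get a boost.
TITLE_BOOST = 1.5


def weight_terms(tokens, title_words=frozenset()):
    """Weight each distinct word of a text.

    tokens: list of (word, pos_tag) pairs for the text body.
    title_words: set of words that appear in the title.
    Returns {word: weight}.
    """
    counts = Counter(word for word, _ in tokens)
    first_pos = {}
    for word, pos in tokens:
        first_pos.setdefault(word, pos)  # keep the first tag seen for each word
    weights = {}
    for word, count in counts.items():
        w = math.log1p(count)  # log damping: 10 occurrences is far less than 10x the weight
        w *= POS_FACTOR.get(first_pos[word], DEFAULT_POS_FACTOR)
        if word in title_words:
            w *= TITLE_BOOST
        weights[word] = w
    return weights


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

The damping step is what "reducing the effect of word frequency" amounts to in this sketch: a word repeated many times still counts more, but only logarithmically.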
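For the word2vec step in contribution (1), one common way to turn word vectors into a text similarity is to take a weighted average of each text's word vectors and compare the averages by cosine similarity. The sketch below assumes that aggregation (the thesis does not state how it combines vectors) and takes the vectors as a plain dictionary. In practice they could come from a Skip-Gram model, for example gensim's `Word2Vec(sentences, sg=1)` and its `model.wv[word]` lookup; the toy vectors used for testing are made up.

```python
import math


def text_vector(weights, vectors):
    """Weighted average of the word vectors of one text.

    weights: {word: weight}, e.g. term weights for the text.
    vectors: non-empty {word: list[float]}; all vectors share one dimension.
    Words without a vector (out-of-vocabulary) are skipped.
    """
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for word, w in weights.items():
        vec = vectors.get(word)
        if vec is None:
            continue
        for i, x in enumerate(vec):
            acc[i] += w * x
        total += w
    return [x / total for x in acc] if total else acc


def vec_cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_similarity(weights_a, weights_b, vectors):
    """Similarity of two texts via their weighted-average word vectors."""
    return vec_cosine(text_vector(weights_a, vectors), text_vector(weights_b, vectors))
```

Unlike the VSM cosine, this score can be high for two texts that share no words at all, as long as their words have nearby vectors.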
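Contribution (3) says only that a common entity similarity method is applied after cleaning noisy data and that attribute values are used. As one hedged illustration, the sketch below normalizes attribute values, drops placeholder values (the `NOISE` set is hypothetical), and averages the Jaccard overlap of the value sets over every attribute either entity has. The thesis's actual method may differ.

```python
# Hypothetical placeholder values treated as noise; the thesis does not list its rules.
NOISE = {"", "-", "n/a", "none", "unknown"}


def clean_values(values):
    """Normalize case and whitespace, then drop placeholder values."""
    cleaned = set()
    for value in values:
        text = " ".join(str(value).split()).lower()
        if text not in NOISE:
            cleaned.add(text)
    return cleaned


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets score 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _clean_entity(attrs):
    cleaned = {name: clean_values(vals) for name, vals in attrs.items()}
    return {name: vals for name, vals in cleaned.items() if vals}  # drop attributes left empty


def entity_similarity(attrs_a, attrs_b):
    """Mean Jaccard overlap of attribute values over every attribute either entity has.

    attrs_a, attrs_b: {attribute_name: iterable of values}.
    An attribute missing from one entity contributes 0 to the mean.
    """
    a, b = _clean_entity(attrs_a), _clean_entity(attrs_b)
    names = set(a) | set(b)
    if not names:
        return 0.0
    return sum(jaccard(a.get(n, set()), b.get(n, set())) for n in names) / len(names)
```

Cleaning matters here: without it, a placeholder such as "unknown" would count as a real attribute value and lower (or falsely raise) the overlap.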
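Contribution (4) suggests similar search terms from the knowledge graph. A minimal sketch, assuming the graph is stored as (head, relation, tail) triples and that "similar terms" means entities directly linked to the query entity (the thesis does not define its selection rule); the function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Index (head, relation, tail) triples as an undirected neighbor map."""
    adjacency = defaultdict(set)
    for head, relation, tail in triples:
        adjacency[head].add((relation, tail))
        adjacency[tail].add((relation, head))
    return adjacency


def suggest_terms(query, adjacency, relations=None, limit=5):
    """Entities directly linked to `query`, optionally only via the given relation types."""
    related = {
        entity
        for relation, entity in adjacency.get(query, ())
        if relations is None or relation in relations
    }
    return sorted(related)[:limit]
```

For example, with the triples ("PWR", "is_a", "nuclear reactor"), ("BWR", "is_a", "nuclear reactor") and ("PWR", "coolant", "light water"), a query for "nuclear reactor" would suggest both reactor types. Restricting `relations` keeps suggestions to, say, taxonomic neighbors.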
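Contribution (4) also combines knowledge-graph similarity with the optimized semantic similarity and filters out irrelevant texts. How the two scores are combined is not stated; a weighted linear fusion followed by a cut-off threshold is one simple assumption, sketched below. The `alpha` and `threshold` defaults are illustrative, not values from the thesis.

```python
def combined_score(semantic_sim, entity_sim, alpha=0.6):
    """Hypothetical linear fusion of a semantic score and a knowledge-graph score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * semantic_sim + (1 - alpha) * entity_sim


def rank_documents(scores, threshold=0.3, alpha=0.6):
    """Fuse per-document scores, drop documents below `threshold`, best first.

    scores: {doc_id: (semantic_sim, entity_sim)}.
    Returns a list of (doc_id, fused_score) pairs.
    """
    fused = {doc: combined_score(s, e, alpha) for doc, (s, e) in scores.items()}
    kept = [(doc, score) for doc, score in fused.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

In this sketch the threshold plays the filtering role: a document that scores low on both signals is dropped rather than merely ranked last.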
Keywords/Search Tags: text similarity, knowledge graph, entity attributes, similarity computing