Research On Text Similarity Measure Method Of Combining New Word Analysis And Semantic Analysis

Posted on: 2019-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: C C Liu
GTID: 2348330548950414
Subject: Information Science
Abstract/Summary:
Advances in technology and social development have made the Internet an indispensable part of daily life, and its spread has brought substantial convenience and efficiency to people's work and life. With its technological advantages, the Internet delivers information through a variety of carriers and makes each person a connecting point in online social groups. In such an interconnected world, information transmission is expected to be both efficient and accurate. Advances in hardware and infrastructure technologies largely address efficiency. Accuracy, however, depends not only on progress in the underlying technology but also on algorithms and ideas. Text similarity calculation is one point of breakthrough and optimization on the algorithmic side, so it has long been both a hot topic and a difficult problem in research. The purpose of this thesis is to find ways to improve text similarity calculation on the basis of previous research results. Existing approaches fall broadly into two groups: methods that use cosine similarity alone, and methods that combine cosine similarity with semantic similarity. A comparison of the two shows that the combined method yields more reasonable and accurate results. Cosine similarity is relatively simple to calculate, whereas semantic similarity is more complex because it relies on a semantic network or ontology; HowNet is an authoritative knowledge base of this kind. This thesis takes HowNet as its research object and analyzes its construction principles and structure. The analysis finds that the depth and local density of the sememe (Yiyuan) hierarchical tree affect the semantic similarity result. Therefore, in addition to the distance between sememes, the depth and local density of the sememe tree are included in the formula for sememe similarity. Furthermore, for unregistered words (new words that are absent from HowNet), a concept is determined according to the structure and semantic description of HowNet, and the similarity between that concept and other words is then calculated. Finally, experiments verify the rationality and operability of the proposed method, which provides a reference for optimizing text similarity algorithms and for their use in application areas.
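The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the formula, so everything specific here is an assumption made for illustration. The toy tree `PARENT`, the parameters `alpha`, `beta` and `gamma`, and the way depth and local density rescale a distance-based score `alpha / (alpha + d)` are all hypothetical. `cosine_similarity` is a plain vector space model (VSM) cosine over raw term frequencies.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """VSM cosine similarity over raw term-frequency vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy sememe (Yiyuan) tree, child -> parent. Not real HowNet data.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "inanimate": "thing",
    "human": "animate",
    "animal": "animate",
    "artifact": "inanimate",
}

def ancestors(node):
    """Path from node up to the root, both inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth in the tree; the root has depth 1."""
    return len(ancestors(node))

def child_count(node):
    """Local density, approximated here as the number of direct children."""
    return sum(1 for parent in PARENT.values() if parent == node)

MAX_DEPTH = max(depth(n) for n in PARENT)
MAX_DENSITY = max(child_count(n) for n in PARENT)

def sememe_similarity(s1, s2, alpha=1.6, beta=0.5, gamma=0.5):
    """Distance-based similarity, rescaled by depth and local density.

    Assumed form (illustrative only): a = alpha * (1 + beta * depth_ratio
    + gamma * density_ratio), similarity = a / (a + distance), where the
    ratios are taken at the lowest common ancestor of the two sememes.
    """
    path1, path2 = ancestors(s1), ancestors(s2)
    on_path2 = set(path2)
    lca = next(n for n in path1 if n in on_path2)
    distance = path1.index(lca) + path2.index(lca)
    scale = (1
             + beta * depth(lca) / MAX_DEPTH
             + gamma * child_count(lca) / MAX_DENSITY)
    a = alpha * scale
    return a / (a + distance)
```

With this assumed form, identical sememes score 1, the score falls as path distance grows, and for equal distance a pair that meets deeper in the tree or in a denser region scores higher, which is the qualitative behavior the abstract describes.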
Keywords: text similarity, semantic similarity, VSM, HowNet, new words