
Research on Semantic Similarity Measurement of Chinese Words Based on a Common-Sense Knowledge Base

Posted on: 2016-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: R C Ma
GTID: 2428330464453675
Subject: Computer application technology
Abstract/Summary:
Measuring the semantic similarity of words is basic work for many areas of computer science research, such as machine translation, question answering systems, intelligent tutoring, information retrieval and data mining. A single value represents the semantic similarity between two words, and that value is then used to make further decisions in specific tasks such as word sense disambiguation, spelling error detection and verb substitution in translation. Natural language understanding is one of the important research areas of artificial intelligence; its ultimate goal is a natural language interface that enables machines to understand and generate natural language. For a machine to understand human language, word semantic similarity measurement is again an indispensable piece of basic work.
Current research on this topic falls into two schools. The first is based on a large-scale corpus and uses probabilistic methods to derive the similarity of words. The second is based on a knowledge base or ontology and converts edge information in a tree or network structure into a semantic distance, from which word similarity is measured. Both approaches have advantages and disadvantages, and they rest on different theoretical assumptions. The first assumes that semantically similar words appear in similar contexts. The second assumes that the closer two concept nodes are, the more similar the two concepts are, or, equivalently, that the more information two concepts share and the less they differ, the more similar they are. Some scholars classify the approaches differently, and others combine several ideas into hybrid similarity methods, but these have limitations: their practicality is severely challenged by efficiency and by differences between application fields. Given this situation, I decided to explore and study this area.
The method of this thesis belongs to the second type. It uses the extended edition of Tongyici CiLin and HowNet, released by Dong Zhendong, as knowledge bases to measure the similarity between words. The extended edition of Tongyici CiLin is issued by the Information Retrieval Laboratory of Harbin Institute of Technology and is based on Tongyici CiLin, which was compiled by Mei Jiaju et al. in 1983. The work done in this thesis covers the following areas.
First, I studied the psychological theory of "similarity", the concept of "word similarity", the main categories of research ideas and the current state of research on this subject. The thesis outlines existing and prospective applications of word semantic similarity in different areas and analyzes the role of similarity calculation in artificial intelligence.
Second, I introduce the purpose and development of Tongyici CiLin. The relevant files of the extended edition of Tongyici CiLin, obtained from the Harbin Institute of Technology natural language platform, were imported into a SQL Server database. Similarity is obtained mainly from semantic distance information, and the number of branch nodes and the branch interval are used to adjust the word similarity value. The results are better than those of existing methods based on CiLin.
Third, I studied the basic concepts of HowNet and the structure and meaning of its main underlying files (Yiyuan, i.e. sememes, and Yixiang, i.e. word senses), and imported them into a SQL Server database. The main HowNet files and related documentation can be downloaded from the official website. Based on a property of the sememes, an edge-weighting strategy that follows a monotonically decreasing curve, flat at the top and steep at the bottom, is used to improve the existing sememe similarity algorithm. The DEF is then divided into three parts to work out the similarity. The results are better than those of existing methods based on HowNet.
Fourth, by combining Tongyici CiLin and HowNet with synonym replacement and a dynamic weighting strategy, I obtained a comprehensive word similarity measure. It greatly improves the correlation achieved with a single knowledge base and widens the range of words that can be scored, which further enhances the practicality of word semantic similarity calculation in applications.
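The distance-based assumption and the idea of combining two knowledge bases can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the five-level code layout (as in a code such as "Aa01A01="), the smoothing parameter ALPHA, the formula alpha / (alpha + distance) and the fixed blending weight are all assumptions chosen for illustration. The branch-node adjustment, the curve-based edge weights and the dynamic weighting described in the abstract are not reproduced here.

```python
# Illustrative sketch only. The level layout, ALPHA and the blending weight
# are assumptions, not values taken from the thesis.
from typing import Optional

LEVEL_ENDS = (1, 2, 4, 5, 7)  # assumed prefix length of each of the five levels
ALPHA = 2.0                   # hypothetical smoothing parameter


def common_depth(code_a: str, code_b: str) -> int:
    """Count how many hierarchy levels two category codes share from the root."""
    depth = 0
    for end in LEVEL_ENDS:
        if code_a[:end] != code_b[:end]:
            break
        depth += 1
    return depth


def tree_distance(code_a: str, code_b: str) -> int:
    """Number of edges between two leaf categories via their lowest shared ancestor."""
    return 2 * (len(LEVEL_ENDS) - common_depth(code_a, code_b))


def distance_similarity(code_a: str, code_b: str, alpha: float = ALPHA) -> float:
    """Map distance to (0, 1]: the closer the nodes, the higher the score."""
    return alpha / (alpha + tree_distance(code_a, code_b))


def combined_similarity(
    cilin_sim: Optional[float],
    hownet_sim: Optional[float],
    cilin_weight: float = 0.5,  # hypothetical fixed weight
) -> Optional[float]:
    """Blend two scores; fall back to the one available if a word pair is
    covered by only one knowledge base."""
    if cilin_sim is None:
        return hownet_sim
    if hownet_sim is None:
        return cilin_sim
    return cilin_weight * cilin_sim + (1.0 - cilin_weight) * hownet_sim
```

With these assumptions, two words in the same leaf category get a similarity of 1.0, and the score falls as the shared prefix of their codes gets shorter. The fallback in combined_similarity shows one simple way a second knowledge base can extend the range of word pairs that receive a score.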
Keywords/Search Tags: Semantic Similarity, Tongyici CiLin, HowNet, Natural Language Processing