Word Semantic Similarity Measurement Based On Uncertainty Theory

Posted on:2015-03-31

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J H Wang

Full Text:PDF

GTID:1268330428983002

Subject:Computer application technology

Abstract/Summary:

Since the birth of the computer, efforts to teach it human language have made considerable progress and have produced a new discipline: Natural Language Processing. The field tries to develop models that represent language skills and language applications, establish computing frameworks to implement such language models, propose ways to keep improving them, design a variety of practical systems based on them, and explore techniques for evaluating those systems. The imprecision of language poses challenges for computer language learning, and measuring semantic similarity between words has become one of the basic problems of natural language processing.

Measuring semantic similarity between words is a classical and active problem in natural language processing, and progress on it has a great impact on many applications such as word sense disambiguation, machine translation, ontology mapping, and computational linguistics. So far, many approaches have been proposed for word semantic similarity measurement, and they can be grouped into two categories: knowledge-based and corpus-based methods. Corpus-based methods depend on the adopted corpus and cannot avoid the data sparseness problem, while knowledge-based methods are simple, effective, and more intuitive. They do not need a corpus for training, but they are more strongly affected by a person's subjective judgment. This dissertation attempts to explore methods for measuring word semantic similarity that are simple and effective and that do not depend on a large-scale corpus.
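
To make the idea of a knowledge-based measure concrete, the following minimal sketch computes two features that are commonly read off a taxonomy: the path distance between two words and the depth of their lowest common subsumer. Everything in it is an assumption made for illustration: the toy hierarchy `PARENT` is hand-built and is not WordNet data, the function names are hypothetical, and the definitions of distance and depth are one common convention rather than the dissertation's own, which the abstract does not spell out.

```python
# Hypothetical toy is-a hierarchy (child -> parent); NOT real WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "artifact",
    "car": "vehicle",
}


def ancestors(word):
    """Return the chain from `word` up to the root, starting with `word`."""
    chain = []
    while word is not None:
        chain.append(word)
        word = PARENT[word]
    return chain


def depth(word):
    """Number of edges between the root and `word` (the root has depth 0)."""
    return len(ancestors(word)) - 1


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of, or equal to, both words."""
    seen = set(ancestors(a))
    for node in ancestors(b):  # walks upward, so the first hit is the deepest
        if node in seen:
            return node
    raise ValueError("the two words share no ancestor")


def path_distance(a, b):
    """Number of edges on the shortest is-a path between the two words."""
    lcs_depth = depth(lowest_common_subsumer(a, b))
    return (depth(a) - lcs_depth) + (depth(b) - lcs_depth)


def pair_features(a, b):
    """Return (path distance, depth of the lowest common subsumer)."""
    return path_distance(a, b), depth(lowest_common_subsumer(a, b))


print(pair_features("dog", "cat"))  # (2, 1)
print(pair_features("dog", "car"))  # (5, 0)
```

In this toy hierarchy, a small distance together with a deep common subsumer suggests closely related words, which is the intuition such features are meant to capture.
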
Given the imprecision of language, we quantify word semantic similarity based on the Naive Bayes model, the Subjective Bayes method, Evidence Theory, certainty factors, the Cloud Model, and Fuzzy Sets, in order to investigate the feasibility of using uncertainty theory to measure word semantic similarity.

The main work of this dissertation is as follows:

(1) Feature extraction: based on WordNet, define and quantify the distance between the words of a pair and the depth of the word pair, two features that require little computation and distinguish word senses well; analyze with scatter plots whether they can quantify word semantic similarity.

(2) Define mean functions, using statistics and piecewise linear interpolation, that describe how humans judge word semantic similarity from word pair distance and from word pair depth, respectively.

(3) Word semantic similarity measurement based on the Naive Bayes model: first, give a Naive Bayes model for word semantic similarity measurement; then, automatically generate the conditional probability distributions from the training data set; after that, obtain the posterior through Bayesian inference; finally, quantify word semantic similarity.

(4) Word semantic similarity measurement based on the Subjective Bayes method: first, define rules and generate the sufficiency measure of each rule; then, obtain the comprehensive posterior by integrating uncertainty reasoning with a strategy for synthesizing conclusion uncertainty; finally, quantify word semantic similarity.

(5) Word semantic similarity measurement based on Evidence Theory: first, define the frame of discernment and generate the basic probability assignments; then, obtain the global basic probability assignment by integrating evidence conflict resolution, importance distribution, and the D-S combination rule; finally, quantify word semantic similarity.

(6) Word semantic similarity measurement based on the MYCIN inference model: first, define rules and generate the certainty factor of each rule; then, obtain the integrated certainty factor with the evidence combination rules; finally, quantify word semantic similarity.

(7) Word semantic similarity measurement based on the Cloud Model: first, define similar clouds; then, generate similar clouds with the backward cloud generator algorithm and piecewise linear interpolation; after that, represent common information and differing information with the digital features of the similar clouds; finally, quantify word semantic similarity.

(8) Word semantic similarity measurement based on Fuzzy Sets: first, define similarity over different domains; then, obtain the membership functions from the mean functions; finally, quantify word semantic similarity.

(9) Combine the Cloud Model with Evidence Theory and with the MYCIN inference model for word semantic similarity measurement.

(10) Use the Cloud Model and the MYCIN inference model to quantify word semantic similarity on the basis of fuzzy processing of the features.

On the benchmark data set R&G (65), the sample Pearson correlation between our test results and human judgments is higher than 0.91, an improvement of at least 0.4% over the existing best practice and of 7% to 13% over classical methods; the Spearman correlation between our methods and human judgments is higher than 0.86, an improvement of 9% to 19% over the classical methods. On the data sets M&C (30) and WordSim353, our methods also give good experimental results. Our methods are as computationally efficient as classical methods.
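
The two correlation measures reported above can be computed as in the following minimal sketch, which uses only the Python standard library. The scores in the usage lines are made-up placeholders, not values from R&G, M&C, or WordSim353, and the function names are assumptions chosen for illustration. Spearman correlation is implemented here as Pearson correlation on average ranks, which handles tied scores.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Placeholder data: human ratings and a method's scores for four word pairs.
human = [0.1, 0.9, 2.5, 3.8]
predicted = [0.05, 0.30, 0.60, 0.95]
print(pearson(human, predicted))
print(spearman(human, predicted))
```

Because the placeholder scores preserve the order of the human ratings exactly, the Spearman value is 1 up to rounding, while the Pearson value is lower because the relationship is not exactly linear.
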
The above results indicate that applying uncertainty theory to measure word semantic similarity is reasonable and effective, and that the cloud model method is the best choice among the methods studied.

The innovations of this dissertation are as follows:

(1) Unlike current methods, which are formulated from expert experience, the methods in this dissertation model human judgment of word semantic similarity from word pair distance and from word pair depth, respectively, using a manually annotated data set together with the Cloud Model and Fuzzy Sets, and then synthesize the evidence to quantify word semantic similarity.

(2) Uncertainty theory is applied comprehensively to word semantic similarity, and we analyze the effectiveness of each uncertainty method for this task. SIM-NB, SIM-SB, SIM-DS, SIM-FS, SIM-CL, SIM-CF, SIM-DS(CL), SIM-CF(CL), SIM-CL(FFS), and SIM-CF(CL-FFS) can fuse word pair distance and depth. The richer the evidence mined, the more complete the training set, and the more complete the data dictionary, the closer the methods come to the human decision process for word semantic similarity.

(3) We analyze the effectiveness of feature fuzzy processing for word semantic similarity measurement and study the multilayer imprecision in how humans measure word semantic similarity, including similarity uncertainty, uncertainty in judging similarity from a single piece of evidence, and feature fuzziness.

The theoretical value of this dissertation is that it proposes an innovative approach, compared with existing methods, to word semantic similarity measurement based on uncertainty theory. The practical value is that word sense disambiguation, ontology mapping, and ontology matching are expected to be enhanced, because our methods are effective yet have low time complexity.

Future outlook:

(1) Obtain the membership functions of the feature fuzzy sets through training.

(2) Consider how to better model human judgment of word semantic similarity from word pair distance and depth, respectively.

(3) Explore further features that can be used in word sense similarity computation.
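
The Evidence Theory and MYCIN-based methods summarized in this abstract both fuse two pieces of evidence, one from word pair distance and one from word pair depth. The sketch below shows the two textbook combination rules such a fusion could build on: Dempster's rule of combination and the commonly cited EMYCIN-style parallel combination of certainty factors. It is an illustration under stated assumptions, not the dissertation's implementation: the mass values and certainty factors are invented, the function names are hypothetical, and the conflict resolution and importance distribution steps mentioned in the abstract are not modeled.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    Each assignment maps a frozenset of hypotheses to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = (
                combined.get(intersection, 0.0) + mass_a * mass_b
            )
        else:
            conflict += mass_a * mass_b  # mass falling on the empty set
    if 1.0 - conflict <= 1e-12:
        raise ValueError("evidence is totally conflicting; rule is undefined")
    # Renormalize so that the remaining masses sum to 1.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


def cf_combine(cf1, cf2):
    """Parallel combination of two certainty factors in [-1, 1]."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    denominator = 1 - min(abs(cf1), abs(cf2))
    if denominator == 0:
        raise ValueError("cannot combine certainty factors of +1 and -1")
    return (cf1 + cf2) / denominator


# Frame of discernment with two hypotheses (labels are assumptions).
S = frozenset({"similar"})
D = frozenset({"dissimilar"})
THETA = S | D  # total ignorance

# Invented masses: one assignment per evidence source.
m_distance = {S: 0.6, D: 0.1, THETA: 0.3}
m_depth = {S: 0.5, D: 0.2, THETA: 0.3}

fused = dempster_combine(m_distance, m_depth)
print(fused[S], fused[D], fused[THETA])

print(cf_combine(0.6, 0.5))   # both sources support the hypothesis
print(cf_combine(0.6, -0.5))  # the sources disagree
```

In the invented example, both sources lean toward "similar", so the fused mass on that hypothesis (0.63 / 0.83, about 0.76) is higher than either source assigns on its own, which is the reinforcing behavior that motivates combining distance and depth.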

Keywords/Search Tags:

Word Semantic Similarity, Naive Bayes Model, Subjective Bayes Method, Cloud Model, Certainty Factor, Fuzzy Set, Evidence Theory, Feature Fuzzifier

Related items

1	Analysis Of Chinese Paragraphs Emotion Based On Naive Bayes
2	Research And Application On Naive Bayes Classification Algorithm
3	Research Of Intrusion Dynamic Forensics Model Based On Classification Analysis
4	The Mobile Customers Occupational Recognition Naive Bayes Algorithm-based Integration And Debugging
5	The Study Of Chinese Text Categorization Based On Naive Bayes
6	Naive Bayes and similarity based methods for identifying computer users using keystroke patterns
7	A Human Action Recognition Method Based On Computer Vision
8	Research On Text Classification Algorithm Based On Naive Bayes Method
9	Research And Application Of Naive Bayesian Classification Model
10	Research On Improving Naive Bayes Classification Model