To make computers understand human language, word embedding methods represent the semantics of each word as a low-dimensional dense vector, which was a breakthrough in natural language processing. However, natural language exhibits polysemy, yet a word embedding represents all senses of a word with a single vector. To address this problem, sense embeddings, which represent each sense of a word as its own vector, have been the subject of several studies in recent years. Constructing a sense embedding requires first obtaining the sense inventory of a word and then generating a vector representation for each sense. In existing models, the sense inventory is not defined accurately enough, and the process of generating vector representations is overly simple. To solve these problems, a sense embedding construction model based on semantic graph clustering is proposed. The model represents the sense distribution of the target word as a graph and induces the senses of the word dynamically using improved graph clustering algorithms. Moreover, rather than directly specifying a generation function for the sense vectors, the model first defines an optimization objective for the sense vectors and then iteratively solves for the function that maps each sense cluster to its sense vector. Furthermore, to demonstrate the use of the sense vectors in downstream tasks, the constructed sense vectors are integrated into word sense disambiguation and entity disambiguation to improve on the traditional schemes. The word sense disambiguation task is studied in the general domain: to overcome the traditional schemes' need for large amounts of labeled data, a new scheme is proposed that combines the local credibility calculated from the sense vectors with global popularity. The entity disambiguation task is studied in the manufacturing domain: to eliminate the ambiguity introduced when traditional schemes use word vectors to express semantics, a new scheme is designed that trains classifiers on features such as the semantic similarity represented by the sense vectors. The experiments use three datasets to evaluate the quality of the constructed sense vectors and their performance in the two disambiguation tasks, and confirm that the sense embedding constructed by graph clustering improves performance by about 3% to 4% over the state-of-the-art GloVe word embedding and CWMS sense embedding.
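The pipeline described above can be illustrated with a minimal sketch. It does not reproduce the paper's method, and several substitutions are assumptions. Sense induction uses the standard Chinese Whispers graph clustering algorithm (made deterministic here) as a stand-in for the paper's unspecified "improved graph clustering algorithms". The learned cluster-to-vector mapping is replaced by a plain centroid. The disambiguation score is a hypothetical weighted sum `lam * local + (1 - lam) * popularity`, with popularity approximated by cluster size. All function names, the threshold, `lam`, and the toy 2-D vectors are illustrative only.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(vectors, threshold):
    """Semantic graph over the target word's neighbours: an edge i-j
    weighted by cosine similarity whenever it reaches the threshold."""
    graph = defaultdict(dict)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            w = cosine(vectors[i], vectors[j])
            if w >= threshold:
                graph[i][j] = w
                graph[j][i] = w
    return graph


def chinese_whispers(n, graph, max_iter=20):
    """Deterministic Chinese Whispers (a stand-in, not the paper's algorithm):
    each node adopts the label with the largest summed edge weight among its
    neighbours; ties go to the smallest label. Returns sorted clusters."""
    labels = list(range(n))
    for _ in range(max_iter):
        changed = False
        for node in range(n):
            scores = defaultdict(float)
            for nb, w in graph[node].items():
                scores[labels[nb]] += w
            if not scores:
                continue  # isolated node keeps its own label
            best = max(scores, key=lambda lab: (scores[lab], -lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(list)
    for node, lab in enumerate(labels):
        clusters[lab].append(node)
    return sorted(clusters.values())


def sense_vectors(vectors, clusters):
    """Simplified stand-in for the learned mapping: each sense vector is
    the centroid of its cluster members' unit-normalised vectors."""
    dim = len(vectors[0])
    senses = []
    for members in clusters:
        centroid = [0.0] * dim
        for m in members:
            norm = math.sqrt(sum(x * x for x in vectors[m]))
            for d in range(dim):
                centroid[d] += vectors[m][d] / norm / len(members)
        senses.append(centroid)
    return senses


def disambiguate(context_vec, senses, popularity, lam=0.7):
    """Hypothetical scoring: lam * local similarity + (1 - lam) * popularity.
    Returns the index of the best-scoring sense."""
    scores = [lam * cosine(context_vec, s) + (1 - lam) * p
              for s, p in zip(senses, popularity)]
    return max(range(len(senses)), key=lambda k: scores[k])


# Toy neighbours of an ambiguous word such as "bank" (made-up 2-D vectors).
neighbours = ["river", "shore", "water", "money", "loan", "deposit"]
vecs = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
        [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
clusters = chinese_whispers(len(vecs), build_graph(vecs, threshold=0.5))
senses = sense_vectors(vecs, clusters)
popularity = [len(c) / len(vecs) for c in clusters]  # cluster-size proxy
# expected: [['river', 'shore', 'water'], ['money', 'loan', 'deposit']]
print([[neighbours[i] for i in c] for c in clusters])
print(disambiguate([0.2, 0.8], senses, popularity))
```

Because the number of clusters is not fixed in advance, the senses are induced from the graph itself rather than taken from a predefined inventory. This is the property the abstract attributes to graph clustering. The weight `lam` trades contextual evidence against sense frequency, and it needs no labeled training data.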