
Research On Semantic Representations Based On Dependency Relation

Posted on: 2017-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
Full Text: PDF
GTID: 2308330503986887
Subject: Computer Science and Technology
Abstract/Summary:
With the rise of statistical natural language processing, assigning every word an appropriate meaning has become a fundamental problem in text mining. A word is typically converted into a numerical data structure, usually a vector, in which each dimension encodes the weight of one semantic or grammatical feature, so that the word's meaning in context is expressed as fully as possible. Such vectors not only capture the correlation between words efficiently, but also serve as the foundation for other natural language processing tasks, such as word and sentence representation, sentiment analysis, and text classification or clustering.

The distributed semantic model has become a powerful and widely recognized method for representing word meaning. However, it implicitly assumes that a word's meaning is determined only by the words it co-occurs with, and it ignores the structural relationships among words. To address this, we propose an extension of the distributed semantic model that merges syntactic structure into distributed semantics: the meaning of a word is represented by its distribution over syntactic (dependency) relations. We believe this representation effectively captures word semantics and supports semantic composition. We further combine the distributed semantic model with the structural semantic model, which not only captures semantic relations between words and expressions, but also alleviates the data-sparsity problem caused by the training set and improves training efficiency.

In our experiments, we chose English Wikipedia as the data source, downloading the compressed XML dump (about 11 GB) through the official interface. After preprocessing, we selected 40 million complete sentences as the training set and evaluated the resulting representations on four related tasks.
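The core idea above, representing a word by its distribution over labelled dependency contexts rather than plain co-occurrence, can be sketched as follows. This is an illustrative toy example, not the thesis implementation: the triples are hand-written (in practice they would come from parsing the Wikipedia corpus), and all names are hypothetical.

```python
# Illustrative sketch: dependency-based distributional vectors.
# A word's vector counts its (dependency-relation, context-word) pairs;
# words are then compared by cosine similarity over these sparse counts.
from collections import Counter, defaultdict
from math import sqrt

# Toy dependency triples: (word, dependency_relation, context_word).
# A real system would extract these from a parsed corpus.
triples = [
    ("dog", "nsubj", "barks"), ("dog", "amod", "loud"),
    ("cat", "nsubj", "meows"), ("cat", "amod", "loud"),
    ("dog", "dobj", "feeds"), ("cat", "dobj", "feeds"),
    ("car", "nsubj", "drives"), ("car", "amod", "fast"),
]

# Sparse count vectors: one dimension per labelled dependency context.
vectors = defaultdict(Counter)
for word, rel, ctx in triples:
    vectors[word][(rel, ctx)] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    if not u or not v:
        return 0.0
    dot = sum(u[k] * v[k] for k in u if k in v)

    def norm(w):
        return sqrt(sum(c * c for c in w.values()))

    return dot / (norm(u) * norm(v))

print(cosine(vectors["dog"], vectors["cat"]))  # 2/3: shares (amod, loud) and (dobj, feeds)
print(cosine(vectors["dog"], vectors["car"]))  # 0.0: no shared dependency contexts
```

Because the contexts are labelled with dependency relations, "dog" and "cat" come out similar through shared syntactic behaviour (both modified by "loud", both objects of "feeds"), while "car" shares no labelled context and scores zero.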
On the WS-353 word-similarity data set, measured by Spearman's rank correlation coefficient, the model scores 0.6548; on the TOEFL synonym-selection test, it reaches an accuracy of 0.853; on short-sentence similarity it obtains a Spearman coefficient of 0.5004; and on word-pair relation classification the result is 0.492. These results show that the proposed dependency-based semantic representation model effectively expresses word meaning and, through a new semantic composition method, can be applied to a broader range of tasks.
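The WS-353-style evaluation protocol can be sketched as below: rank the model's similarity scores and the human gold ratings for the same word pairs, then compute Spearman's rank correlation between the two rankings. The data here is a made-up toy list, not WS-353 itself, and the helper functions are illustrative (a library routine such as SciPy's `spearmanr` would normally be used).

```python
# Sketch of word-similarity evaluation with Spearman's rank correlation.
def ranks(xs):
    """Return 1-based average ranks of xs, handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

human = [9.0, 7.5, 6.0, 3.0, 1.0]       # gold similarity ratings (toy)
model = [0.82, 0.70, 0.55, 0.30, 0.05]  # model cosine similarities (toy)
print(spearman(human, model))  # 1.0: identical orderings
```

Because Spearman's rho depends only on the orderings, it rewards a model for ranking word pairs the same way humans do, regardless of the absolute scale of its similarity scores.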
Keywords/Search Tags: word representation, distributed semantic model, structural semantic model, semantic composition, dependency relation