
Research On Chinese Automatic Semantic Role Labeling Method Based On Bi-LSTM

Posted on: 2020-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: P F Zhu
Full Text: PDF
GTID: 2428330596978122
Subject: Internet of Things works
Abstract/Summary:
With the rapid development of computer technology and the explosive growth of data volume in the era of big data, it is increasingly difficult for people to obtain and process information accurately, quickly and comprehensively, especially information in the form of text. Although there are already many research results on Chinese automatic semantic role labeling, many challenging problems remain to be solved. After an in-depth study of existing semantic role labeling models, this thesis focuses on data preprocessing, feature vectors and sequence labeling algorithms. The main work of this thesis is as follows:

1. The imbalance between training samples of sparse predicates and common predicates is studied in depth, and the concept of semantic density clustering is proposed. To improve the multi-feature representation ability of the input vectors, a "fuzzy" mechanism is proposed, which uses word vector distance to "fuzzify" the non-predicate word vectors and thereby changes the semantic expression characteristics of the original word vectors. Using the Chinese Proposition Bank (CPB) as experimental material, multi-dimensional and multi-angle comparison experiments are conducted on an automatic semantic role labeling model based on the Bi-LSTM-CRF framework. The results show that this method improves semantic role labeling performance.

2. Because auxiliary features have a great influence on the result of semantic role labeling, a Bi-LSTM network layer is constructed and trained to obtain a representation of the part-of-speech feature. The trained part-of-speech representation forms a vector that is part of the model's input vector. In combination with the word vector and a domain dictionary, six effective statistical features are introduced, and a CRF model is used to recognize domain terms. The one-hot representation of domain terms is initialized with weights, and the new input feature vector is formed by combining it with the word vector and the part-of-speech vector in a specific way. Multi-dimensional and multi-angle comparison experiments are conducted on the automatic semantic role labeling model based on the Bi-LSTM-CRF framework. The experimental results show that the introduced auxiliary features represent the text effectively, and that the proposed text representation model has better semantic information expression and domain adaptability.

3. To address the obvious defects of the "neural network + CRF" framework and of the base classifiers used in sequence labeling tasks, a sequence labeling algorithm that integrates multiple types of classifiers is proposed. Conditional random fields, structured support vector machines and max-margin Markov networks are combined. First, an ensemble learning approach is used to train the three types of base classifiers, yielding ten weak learners of each type. Then, the weak learners are integrated into three types of strong learners using an arithmetic mean combination strategy. In the prediction stage, a state transition matrix is introduced, and the Viterbi algorithm is used to solve for the predicted sequence. In the experimental stage, the model is applied to Chinese word segmentation, part-of-speech tagging and semantic role labeling tasks. The experimental results show that the proposed model performs well on sequence labeling tasks.
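The prediction stage described in point 3 (arithmetic-mean combination, a state transition matrix, and Viterbi decoding) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes each learner outputs a per-position score for every label, that scores are additive (for example, log-scores), and that the learners share one label set. The function names `average_scores` and `viterbi` and all numbers are hypothetical.

```python
from typing import List

Scores = List[List[float]]  # scores[t][k]: score of label k at position t


def average_scores(score_sets: List[Scores]) -> Scores:
    """Arithmetic-mean combination of per-position label scores (assumed strategy)."""
    n = len(score_sets)
    length = len(score_sets[0])
    num_labels = len(score_sets[0][0])
    return [
        [sum(s[t][k] for s in score_sets) / n for k in range(num_labels)]
        for t in range(length)
    ]


def viterbi(emissions: Scores, transitions: Scores) -> List[int]:
    """Return the label sequence with the highest total additive score.

    transitions[i][j] is the score of moving from label i to label j.
    """
    num_labels = len(transitions)
    best = list(emissions[0])          # best score of any path ending in each label
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_labels):
            # Best previous label for ending in label j at position t.
            prev = max(range(num_labels), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(num_labels), key=lambda k: best[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores from two learners: 3 positions, 2 labels.
learner_a = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
learner_b = [[1.0, 0.0], [0.0, 3.0], [3.0, 0.0]]
combined = average_scores([learner_a, learner_b])

# Hypothetical transition matrix that strongly penalizes label 0 -> label 1.
transitions = [[0.0, -10.0], [0.0, 0.0]]
predicted = viterbi(combined, transitions)
```

In this toy setup the transition matrix changes the decoded sequence: without the penalty, the highest-scoring label at each position would be chosen independently, whereas with it the decoder avoids the penalized transition.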
Keywords/Search Tags: Semantic role, Sparse predicate, Fuzzy mechanism, Part of speech vector, Domain term, Sequence annotation