Recognition And Classification Of Chinese Network Derived Entity Based On Deep Learning

Posted on: 2018-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Xu
GTID: 2348330515989727
Subject: Computer software and theory
Abstract/Summary:
With the explosive growth of information on the Internet, non-standard Chinese words such as phonetic substitutions, abbreviations, and synonyms have become increasingly common. Because Chinese is flexible in how words are formed and used, many subject words in text are expressed through such derivatives, which are called network derived entities. Chinese network derived entities are complex and difficult to identify, and they are often used in place of the original words to evade government regulation of public opinion, so they create many difficulties for public opinion monitoring and natural language processing. In recent years, scholars in China and abroad have studied extensively how to identify specific types of derived entities in text, but the overall data distribution of network derived entities has not yet been studied. Moreover, large numbers of new derived entities keep emerging, which places new requirements on entity recognition technology. The main work of this thesis is as follows:

1) The methods and techniques of the mainstream recognition models of recent years are analyzed and compared, and the advantages and disadvantages of each in application are pointed out. Based on this analysis and on the characteristics of the problem studied here, a new neural-network-based approach to Chinese network derived entity recognition is proposed.

2) Two neural network architectures for Chinese network derived entity recognition are proposed: a sliding window method and a sentence convolution method. Both address the problem that sentences in text vary in length and therefore cannot be fed directly into a neural network. Word vectors trained with word2vec serve as the basic component of the model input. In addition, hand-crafted feature vectors encoded by a stacked autoencoder are combined with the word vectors to form a composite input that further improves recognition accuracy. The model structure and training process are also optimized by using a special activation function and training algorithm.

3) A large number of comparative experiments were carried out on a purpose-built corpus. Because no open corpus was available, the text was collected with the Scrapy framework and labeled by manual annotation, yielding an annotated corpus of 252.3 MB. Extensive derived entity recognition tests were run on this corpus. The experimental results show that the two proposed models deal effectively with network derived entity recognition, reaching F1 scores of 78.6% and 76.9% respectively, which is better than the other traditional methods tested on this corpus. In addition, by studying how different parameters and methods affect the results, general tuning experience was obtained that can serve as a reference for other researchers.

In practice, the proposed deep-learning-based recognition model can be applied to the task of recognizing Chinese network derived entities. The model achieves better recognition performance across all kinds of derived entities at the same time, and it can meet the new requirements of Chinese network derived entity recognition in the context of big data.
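
The abstract does not spell out how the sliding window method handles sentences of different lengths, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes an odd window size, a hypothetical `<PAD>` token, and a toy embedding dictionary standing in for word2vec vectors; the function names `sliding_windows` and `window_to_vector` are illustrative.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical padding token, not named in the abstract


def sliding_windows(tokens: List[str], window: int = 5) -> List[List[str]]:
    """Return one fixed-size context window per token.

    The sentence is padded on both sides so that every token, including
    the first and last, sits at the centre of a window of length `window`.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = [PAD] * half + tokens + [PAD] * half
    return [padded[i:i + window] for i in range(len(tokens))]


def window_to_vector(win: List[str],
                     emb: Dict[str, List[float]],
                     dim: int) -> List[float]:
    """Concatenate the embeddings of a window into one fixed-length vector.

    Tokens missing from `emb` (including PAD) map to a zero vector, so the
    result always has length len(win) * dim.
    """
    zero = [0.0] * dim
    vec: List[float] = []
    for tok in win:
        vec.extend(emb.get(tok, zero))
    return vec
```

Under these assumptions, a sentence of any length yields one input vector of length `window * dim` per token, so a fixed-input network can score each token (for example as inside or outside a derived entity) no matter how long the sentence is.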
Keywords/Search Tags:network derived entity, deep learning, entity recognition, word embedding, auto encoder