Research On Personal Entity Relation Extraction Based On Theme Microblog

Posted on: 2019-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Diao
Full Text: PDF
GTID: 2428330548978459
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, more and more users are participating online. Convenient and fast social networking platforms at home and abroad, such as Twitter, Facebook, and Sina Weibo, have greatly changed the way people access news and current affairs. The Internet has become a global repository of information, storing massive amounts of data and valuable information covering all aspects of people's lives. How to extract useful structured data from this unstructured text is the focus of this study. However, because Weibo posts are short, topic classification is difficult, and traditional relation extraction methods cannot be applied directly to the Weibo corpus. This thesis therefore studies person entity relation extraction and addresses these problems in the following ways.

First, CWTM (couple-word topic model), a topic model suited to short text, is proposed for topic classification of Weibo corpora. Traditional topic extraction research targets long English texts and is not suited to short Weibo texts that contain new Internet words. In response, the CWTM model, based on DMM (Dirichlet Multinomial Mixture), is proposed to handle short texts with varied expressions. The model expands the semantic information of short texts by extracting word pairs, which replace the traditional co-occurrence of single words. This relieves text sparsity to some extent and improves the mining of short texts.

Second, a convolutional neural network is used for person relation extraction within Weibo topics. A sentence-based keyword algorithm scores the importance of each word in every topic sentence. The top-ranked words are selected as key features of the topic category and, together with the word vector features and position vector features of the original sentence, form the initial input of the convolutional neural network. This avoids a limitation of existing deep-learning-based relation extraction methods, which rely on nothing more than word vectors to learn features. In the training phase, to obtain better feature extraction, the model avoids the max pooling strategy that traditional neural networks commonly use and that can interfere with feature representation. Instead, a piecewise max pooling strategy takes the highest-scoring feature value in each segment of the output, and the combination of these values is the input feature of the softmax layer.

Finally, experiments and analysis were performed on real data sets to verify the validity of the proposed model and algorithm. The results show that, in the same experimental environment, the CWTM model has lower perplexity, a higher F-measure, and more accurate extraction results than the traditional DMM topic extraction model. Only three groups of comparison experiments are given in this thesis; they show that the entity relation extraction method based on a convolutional neural network is effective and of practical significance on Chinese corpora.
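The piecewise max pooling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the output is divided into segments; the sketch below assumes the common choice of splitting each filter's output at the two entity positions, giving three segments per filter. The function name `piecewise_max_pool`, its parameters, and the zero fallback for an empty segment are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def piecewise_max_pool(
    feature_map: Sequence[Sequence[float]], e1_pos: int, e2_pos: int
) -> List[float]:
    """Pool each filter's output over three segments instead of the whole sentence.

    feature_map[f][t] is the activation of filter f at token position t.
    e1_pos and e2_pos are the token positions of the two entities
    (assumption: segments are split at these positions).
    """
    first, second = sorted((e1_pos, e2_pos))
    pooled: List[float] = []
    for row in feature_map:
        segments = (
            row[: first + 1],            # start of sentence up to the first entity
            row[first + 1 : second + 1],  # after the first entity up to the second
            row[second + 1 :],            # after the second entity to the end
        )
        for seg in segments:
            # Assumption: an empty segment (entity at a boundary) contributes 0.0.
            pooled.append(max(seg) if seg else 0.0)
    return pooled


# One filter over a six-token sentence, entities at positions 1 and 3.
example = [[0.1, 0.9, 0.3, 0.4, 0.8, 0.2]]
print(piecewise_max_pool(example, 1, 3))  # [0.9, 0.4, 0.8]
```

Plain max pooling would keep only 0.9 for this filter; the piecewise version keeps one value per segment, so the vector passed to softmax retains coarse information about where in the sentence each feature fired.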
Keywords/Search Tags:Chinese text, short text topic extraction, neural network, relation extraction, CWTM
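The word-pair expansion behind CWTM, as summarized in the abstract, can be sketched as follows. The abstract does not define how pairs are formed, so this sketch assumes the simplest reading: every unordered pair of distinct words in one tokenized post counts as a pair. The function name `extract_word_pairs` is hypothetical, and the sketch covers only pair extraction, not the topic model itself.

```python
from itertools import combinations
from typing import List, Tuple


def extract_word_pairs(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return every unordered pair of distinct words in one short text.

    Assumption: a pair is any two different words from the same post.
    Words are de-duplicated and sorted so each pair has a single canonical form.
    """
    unique_words = sorted(set(tokens))
    return list(combinations(unique_words, 2))


# A post with three distinct words yields three pairs.
post = ["weibo", "topic", "weibo", "model"]
print(extract_word_pairs(post))
# [('model', 'topic'), ('model', 'weibo'), ('topic', 'weibo')]
```

A post with n distinct words yields n(n-1)/2 pairs, so even a very short text contributes several observations, which is one way to read the abstract's claim that word pairs relieve sparsity.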