
Research On Object Entity Similarity Based On Deep Learning

Posted on: 2019-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Liu
GTID: 2428330572460746
Subject: Computer Science and Technology
Abstract/Summary:
Machine-learning-based natural language processing is increasingly applied to text information mining. Judging the similarity of object entities provides basic data support for text mining and is a key link in it. Named entities are the basic data scattered throughout a text, and multiple named entities together constitute an object entity. Object entities can provide more useful information to the organizations concerned and expand the dimensions of objects in text mining. The similarity of text object entities is a basic and key problem that has long been both a research hot spot and a difficulty. This thesis does the following work. First, it analyzes the research background, significance, and current status, and introduces the theory and methods used in studying text object entity similarity, including two deep learning methods, word segmentation, text annotation, and development tools. Second, it applies the Viterbi algorithm for hidden Markov models to named entity recognition in Chinese text; the recognized named entities serve as the basic data for further research. It then constructs object templates: taking into account the context in which named entities appear in the text, it adds constraints on the subordination relationship and the distance between named entities and trains the constraint thresholds with logistic regression, thereby forming structured object data. Finally, because a text may contain several similar or identical objects, the structured data are classified to determine whether two objects are similar. Classification is supervised and uses two models. The first combines simhash with a BP (backpropagation) neural network: simhash computes the similarity between each pair of corresponding fields in two records, and a BP neural network is trained on these similarities under supervision, reaching an accuracy of 80%. The second uses a long short-term memory (LSTM) recurrent neural network instead of a traditional distance measure to compute text similarity, reaching an accuracy of 85%. The experiments show that the LSTM text similarity method achieves a higher correct classification rate and is better suited to judging whether the field information of structured data is similar.
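The field-level simhash comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the 64-bit fingerprint width, the use of MD5 as the token hash, the unweighted tokens, and the names `simhash`, `hamming_similarity`, and `field_similarities` are all assumptions made for illustration. Records are assumed to be dictionaries mapping a field name to an already segmented list of tokens.

```python
import hashlib

BITS = 64  # assumed fingerprint width; the abstract does not state one


def simhash(tokens):
    """Return a 64-bit simhash fingerprint for a list of string tokens."""
    weights = [0] * BITS
    for tok in tokens:
        # Assumed token hash: the first 8 bytes of MD5, read as a 64-bit integer.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position (unweighted).
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_similarity(a, b):
    """Map the Hamming distance between two fingerprints to a score in [0, 1]."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def field_similarities(rec_a, rec_b, fields):
    """Compare two records field by field; returns one similarity per field."""
    return [
        hamming_similarity(simhash(rec_a[f]), simhash(rec_b[f]))
        for f in fields
    ]
```

In a pipeline like the one the abstract describes, a vector such as the output of `field_similarities` would be the input features for the supervised classifier (a BP neural network in the thesis) that decides whether two records describe similar objects.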
Keywords/Search Tags: object entity, hidden Markov-Viterbi model, simHash, LSTM