
Entity Relationship Extraction And Mining Based On The Massive Data

Posted on: 2013-08-17   Degree: Master   Type: Thesis
Country: China   Candidate: H B Bi
GTID: 2248330374483300   Subject: Computer application technology
Abstract/Summary:
Information extraction is a key technology in information research. As the Internet grows, extracting the information that users are interested in has become an urgent problem and an important research direction in information mining. Unlike information retrieval, information extraction must recognize the named entities in a text and extract the relationships between them. Moreover, the flexibility and complexity of Chinese make it increasingly difficult to recognize Chinese named entities and the relationships between them.

There are currently two main approaches to information extraction: rule-based methods and statistical machine learning methods. Rule-based methods achieve higher accuracy, but the rules are hard to write, demand considerable expertise from their authors, and port poorly to new domains. Statistical methods adopt various models and train a classifier on a manually annotated training set; the classifier then processes new data by computing probabilities to produce the final results. Because of their better portability, better performance, and lower cost, statistical methods have become the focus of current research.

As information networks grow, extracting information from massive data becomes increasingly complex. This thesis studies how to use massive data to extract key information, and computing over massive data is itself a challenging task. The main contributions of this thesis are as follows:

● For named entity recognition, we adopt an algorithm based on the maximum entropy model and use the GIS (Generalized Iterative Scaling) algorithm to estimate its parameters.
● We propose an entity relationship extraction algorithm based on semantics and SVM, which adds semantic features to the feature vectors used for relationship extraction in order to improve accuracy.
● By analyzing the massive data together with the recognized named entities and named entity relationships, we construct an entity relationship network and apply an optimization algorithm to refine the results. From the refined network we mine implicit relationships to obtain a broader set of entity relationships, which helps capture the overall information contained in the massive data.
● We also study the massive data processing platform Hadoop, design an entity relationship extraction and mining system for massive data on it, and verify the correctness of the proposed algorithms.

The proposed entity relationship extraction algorithm based on semantics and SVM improves both the accuracy of the extracted results and the generalization ability. Although the optimization algorithm improves the entity relationship extraction results, keyword ambiguity still affects them; resolving it is one of the main problems for future work.
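The first contribution estimates the maximum entropy model's parameters with GIS (Generalized Iterative Scaling). The Python sketch below illustrates that update rule on a toy tagging problem; it is not the thesis's implementation. The feature names (`w=...`, `cap`, `prev=...`, `next=...`), the labels, and the data are hypothetical. The constant C is taken as the largest number of active features in any training instance. Because each (x, y) then activates at most C indicators, the update lambda_i += (1/C) * log(E_emp[f_i] / E_model[f_i]) still does not decrease the training likelihood, so the classic GIS correction feature is omitted.

```python
import math
from collections import defaultdict


def predict_proba(weights, feats, labels):
    """p(y | x) of a maximum entropy model with binary (feature, label) indicators."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp.values())
    return {y: e / z for y, e in exp.items()}


def gis_train(data, iterations=100):
    """Generalized Iterative Scaling; data is a list of (set_of_feature_names, label)."""
    labels = sorted({y for _, y in data})
    n = len(data)
    # C bounds the number of active (feature, label) indicators for any training (x, y)
    C = max(len(feats) for feats, _ in data)
    empirical = defaultdict(float)
    for feats, y in data:
        for f in feats:
            empirical[(f, y)] += 1.0 / n
    # only (feature, label) pairs observed in training get a weight
    weights = {key: 0.0 for key in empirical}
    for _ in range(iterations):
        expected = defaultdict(float)
        for feats, _ in data:
            p = predict_proba(weights, feats, labels)
            for y in labels:
                for f in feats:
                    if (f, y) in weights:
                        expected[(f, y)] += p[y] / n
        for key in weights:
            # GIS update: lambda_i += (1/C) * log(E_empirical[f_i] / E_model[f_i])
            weights[key] += math.log(empirical[key] / expected[key]) / C
    return weights, labels


# Hypothetical toy NER data: token features -> entity label
TOY_DATA = [
    ({"w=Beijing", "cap", "prev=in"}, "LOC"),
    ({"w=Shanghai", "cap", "prev=in"}, "LOC"),
    ({"w=Zhang", "cap", "next=said"}, "PER"),
    ({"w=Li", "cap", "next=said"}, "PER"),
    ({"w=the", "prev=in"}, "O"),
    ({"w=said", "prev=Zhang"}, "O"),
]

weights, labels = gis_train(TOY_DATA)
print(predict_proba(weights, {"w=Beijing", "cap", "prev=in"}, labels))
```

In a real recognizer the features would come from the character or word context of each token, and the per-token probabilities would typically be combined by a sequence decoder. Both of those parts are left out here.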
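The semantics- and SVM-based relationship classifier can be illustrated, under stated assumptions, with a linear SVM trained by Pegasos (stochastic subgradient descent on the L2-regularized hinge loss). Pegasos is a swapped-in training method chosen because it fits in a few lines; the thesis does not say which SVM solver it used. The feature templates (entity types, words between the two entities, and a semantic class per entity, for example taken from a thesaurus) are hypothetical stand-ins for the thesis's semantic features. The `works_for` labels and the example sentences are invented for illustration.

```python
import random


def relation_features(e1_type, e2_type, between_words, e1_sem, e2_sem):
    """Sparse feature vector for an entity pair (hypothetical lexical + semantic templates)."""
    vec = {"bias": 1.0, f"types={e1_type}|{e2_type}": 1.0}
    for word in between_words:
        vec[f"between={word}"] = 1.0
    # semantic features: a (hypothetical) semantic class for each entity and for the pair
    vec[f"sem1={e1_sem}"] = 1.0
    vec[f"sem2={e2_sem}"] = 1.0
    vec[f"sem_pair={e1_sem}|{e2_sem}"] = 1.0
    return vec


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(samples, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss; labels are +1/-1."""
    w = {}
    rng = random.Random(seed)
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = samples[i]
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # margin test uses the current weights
            for k in w:  # shrink step from the regularizer
                w[k] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


# Hypothetical training pairs: +1 = "works_for", -1 = no such relation
SAMPLES = [
    (relation_features("PER", "ORG", ["works", "at"], "person", "company"), +1),
    (relation_features("PER", "ORG", ["joined"], "person", "company"), +1),
    (relation_features("PER", "ORG", ["is", "employed", "by"], "person", "company"), +1),
    (relation_features("PER", "LOC", ["visited"], "person", "city"), -1),
    (relation_features("PER", "LOC", ["lives", "in"], "person", "city"), -1),
    (relation_features("ORG", "LOC", ["opened", "in"], "company", "city"), -1),
]

w = train_linear_svm(SAMPLES)
unseen = relation_features("PER", "ORG", ["hired", "by"], "person", "company")
print("works_for" if dot(w, unseen) > 0 else "none")
```

The semantic features `sem2=company` and `sem_pair=person|company` are shared by every positive example. As a result, a pair whose connecting words were never seen in training, such as "hired by", still scores positive when it has the same semantic classes. That is the kind of generalization the semantic features are meant to add.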
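The network construction and implicit-relationship mining steps can be sketched as follows. Two parts are assumptions. First, a minimum-support filter over repeated extractions stands in for the thesis's optimization algorithm, which the abstract does not specify. Second, implicit relationships are derived by hypothetical two-hop composition rules of the form r1(a, b) and r2(b, c) implies r3(a, c). The entity names, relation labels, and the `min_support` threshold are illustrative only.

```python
from collections import defaultdict


def build_network(triples, min_support=2):
    """Aggregate (head, relation, tail) triples; keep edges extracted at least min_support times.

    The support threshold is a hypothetical stand-in for the thesis's optimization step.
    """
    counts = defaultdict(int)
    for triple in triples:
        counts[triple] += 1
    graph = defaultdict(set)  # head -> {(relation, tail), ...}
    for (head, rel, tail), c in counts.items():
        if c >= min_support:
            graph[head].add((rel, tail))
    return graph


def mine_implicit(graph, rules):
    """One pass of two-hop composition rules: r1(a, b) and r2(b, c) => r3(a, c)."""
    inferred = set()
    for a, edges in graph.items():
        for r1, b in edges:
            for r2, c in graph.get(b, ()):  # .get does not insert missing keys
                r3 = rules.get((r1, r2))
                if r3 is not None and c != a and (r3, c) not in edges:
                    inferred.add((a, r3, c))
    return inferred


# Hypothetical extraction output; repeated triples come from different sentences
TRIPLES = (
    [("Zhang San", "works_for", "Acme")] * 3
    + [("Acme", "located_in", "Beijing")] * 2
    + [("Li Si", "works_for", "Acme")]  # seen once, filtered out
)
RULES = {("works_for", "located_in"): "works_in"}  # hypothetical composition rule

network = build_network(TRIPLES)
IMPLICIT = mine_implicit(network, RULES)
print(IMPLICIT)
```

The sketch applies the rules once. If `mine_implicit` were rerun after adding its output to the graph, repeating until nothing new appears, it would compute the closure of the network under the rules.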
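On Hadoop, one natural MapReduce job for this pipeline counts how often two entities co-occur, and those counts can feed the entity relationship network. The sketch below follows the Hadoop Streaming convention, in which the mapper and reducer read and write tab-separated key/value lines on stdin/stdout. It simulates the shuffle with a local sort. The input format (one line of recognized entities per sentence, separated by `|`) and the script name `pairs.py` are hypothetical, and the thesis's actual Hadoop jobs are not described in the abstract.

```python
import sys
from itertools import combinations, groupby


def mapper(lines):
    """Emit 'e1|e2<TAB>1' for each unordered pair of distinct entities on a line."""
    for line in lines:
        entities = sorted({e.strip() for e in line.split("|") if e.strip()})
        for e1, e2 in combinations(entities, 2):
            # pack the pair into one field: Streaming keys on the text before the first tab
            yield f"{e1}|{e2}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share a key (input must be sorted by key)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else ""
    if stage == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif stage == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        # Hypothetical sample: recognized entities per sentence, separated by '|'
        sample = ["Zhang San|Acme|Beijing", "Acme|Zhang San", "Li Si|Acme"]
        for out in run_local(sample):
            print(out)
```

When invoked as `python pairs.py map` and `python pairs.py reduce`, the same file could serve as the `-mapper` and `-reducer` commands of a Hadoop Streaming job. The entity pair is packed into a single key field because, by default, Streaming treats only the text before the first tab as the key. Splitting the pair across two fields would let records for different pairs interleave at the reducer.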
Keywords/Search Tags: Information Extraction, Mass Data, Entity Relationship Network, Implicit Relationship Mining