Special Information Identification Based on Short Text and Its Application in a Data Mining Engine

Posted on: 2017-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
Full Text: PDF
GTID: 2308330491951713
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, people generate more and more short text in daily communication through channels such as SMS, WeChat, Weibo, and QQ. This text contains a great deal of specific information of real value, but that information is often ignored because the data volume is large and the text is unstructured. Existing work also tends to address only one or two kinds of information, for example Chinese name recognition alone, or regular expressions for phone numbers. A comprehensive study covering most kinds of specific information is lacking, and that gap is the focus of this research. This thesis studies not only the recognition of personal names but also place name recognition and the recognition of seven kinds of account-type entities, and it integrates these recognizers into a single recognition module.
The work is based on information extraction and concentrates on Chinese name recognition, place name recognition, and account-type entity recognition. Chinese names are identified with a Hidden Markov Model and the Viterbi algorithm, place names with a finite-state machine and a geographical dictionary, and account-type entities with rules from a rule base. These methods are integrated into one functional module that runs on the Hadoop engine and is coded with MapReduce, which makes it suitable for processing very large volumes of data. The module can therefore be applied directly to related needs in the future and makes better use of the specific information mined from short text.
The thesis demonstrates the efficiency of the Chinese name recognition method through comparative experiments, gives the main code for account-type entity recognition, and develops a specific-information recognition module in combination with a project. It also describes the functions and overall architecture of the system, gives a detailed description of the system's main modules, and finally presents the system's display interface. In conclusion, the research has practical value and a degree of novelty.
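The Hidden Markov Model and Viterbi step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's tag set or parameters, so everything below is an assumption for illustration: the three tags (`O` for non-name characters, `B` for a surname character, `I` for a given-name character), the toy probabilities, and the helper names `viterbi` and `extract_names` are all hypothetical.

```python
import math

# Hypothetical tag set: O = not part of a name, B = surname, I = given-name character.
STATES = ["O", "B", "I"]

# Toy parameters, invented for this example only (not taken from the thesis).
START_P = {"O": 0.8, "B": 0.2}
TRANS_P = {
    "O": {"O": 0.7, "B": 0.3},
    "B": {"I": 0.9, "O": 0.1},
    "I": {"I": 0.4, "O": 0.6},
}
EMIT_P = {
    "O": {"我": 0.4, "是": 0.4, "王": 0.01, "伟": 0.01},
    "B": {"王": 0.6, "李": 0.3},
    "I": {"伟": 0.5, "明": 0.3},
}


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely tag sequence for a list of characters."""
    if not observations:
        return []

    def lp(p):
        # Work in log space; unseen events get a small floor probability.
        return math.log(p if p > 0 else floor)

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: lp(start_p.get(s, 0)) + lp(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lp(trans_p[p].get(s, 0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lp(emit_p[s].get(observations[t], 0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def extract_names(text, tags):
    """Collect each B followed by zero or more I characters as one name."""
    names, current = [], ""
    for ch, tag in zip(text, tags):
        if tag == "B":
            if current:
                names.append(current)
            current = ch
        elif tag == "I" and current:
            current += ch
        else:
            if current:
                names.append(current)
            current = ""
    if current:
        names.append(current)
    return names


text = "我是王伟"
tags = viterbi(list(text), STATES, START_P, TRANS_P, EMIT_P)
print(tags, extract_names(text, tags))
```

In a real system the start, transition, and emission probabilities would be estimated from an annotated corpus instead of being written by hand, and the tagging would typically run per message inside a map task.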
Keywords/Search Tags: Chinese Word Segmentation, MapReduce Parallel Computing, Hidden Markov Model, Viterbi Algorithm, Rule Base