Research And Realization On The Key Technologies Of Chinese Information Extraction

Posted on: 2009-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Yang
GTID: 2178360245969980
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of new media such as the Internet, finding useful information quickly and accurately in a tremendous volume of electronic documents has become a pressing problem. Information extraction was born and developed against this background. This thesis studies several key technologies of Chinese information extraction, designs and implements several test systems, and explores applications of information extraction to information content security. The main contributions of this thesis are:
1. A supervised learning algorithm with a bottom-up strategy is proposed. It not only generates rules automatically and accurately but can also be ported across domains. Based on this algorithm, two test systems are designed and implemented: a test system for extracting news about corporate personnel changes in the finance and economics domain, and a mobile game news ordering test system. Experimental results show that the algorithm is effective for both systems. In addition, the combination of information extraction with mobile terminal technology is explored; the mobile game news ordering test system shows that an intelligent "information + SMS" service mode is feasible.
2. A Hidden Markov Model (HMM) is used to extract information from sports game news, and it forms the basis of the third test system. A rules-based method is added on top of the HMM, which improves extraction performance. The experimental results show that combining statistics-based and rules-based methods gives satisfactory results.
3. Named entity recognition in sports game news is addressed. A rules-based method is used, and it performs well at recognizing game names, match results, and similar entities.
4. The application of information extraction to information content security is explored. To filter Chinese SMS spam, an orientation judgment model that combines rules-based and statistics-based methods is proposed. A Chinese SMS content monitoring test system is designed and implemented, and experiments show good results.
The final part summarizes the work of the thesis and discusses prospects and future directions for Chinese information extraction.
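Contribution 2 pairs an HMM with a rules layer, but the abstract does not describe the thesis's states, features, or rules. What follows is a minimal illustrative sketch under stated assumptions, not the author's method: log-space Viterbi decoding over a hypothetical three-label set (`TEAM`, `SCORE`, `O`) with toy, hand-set probabilities, followed by one hypothetical regex rule that relabels score-like tokens. English tokens are used for readability; real Chinese text would first need word segmentation, and real parameters would be estimated from an annotated corpus.

```python
import math
import re

# Hypothetical label set and toy probabilities (assumptions, not from the thesis).
STATES = ["TEAM", "SCORE", "O"]
START = {"TEAM": 0.5, "SCORE": 0.1, "O": 0.4}
TRANS = {
    "TEAM": {"TEAM": 0.2, "SCORE": 0.3, "O": 0.5},
    "SCORE": {"TEAM": 0.3, "SCORE": 0.1, "O": 0.6},
    "O": {"TEAM": 0.4, "SCORE": 0.2, "O": 0.4},
}
EMIT = {
    "TEAM": {"Lakers": 0.5, "Celtics": 0.5},
    "SCORE": {},  # scores are open-ended, so the toy model has no vocabulary for them
    "O": {"beat": 0.5, "tonight": 0.5},
}
UNSEEN = 1e-6  # smoothing probability for a word a state never emitted


def emission(state, word):
    return EMIT[state].get(word, UNSEEN)


def viterbi(words):
    """Return the most probable state sequence for `words` (log-space Viterbi)."""
    if not words:
        return []
    # best[t][s] = (log-prob of the best path ending in state s at position t, previous state)
    best = [{s: (math.log(START[s]) + math.log(emission(s, words[0])), None) for s in STATES}]
    for t in range(1, len(words)):
        column = {}
        for s in STATES:
            # Pick the predecessor state that maximizes path score plus transition.
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emission(s, words[t])), prev)
        best.append(column)
    # Backtrack from the highest-scoring final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical rule layer: a token shaped like a match score overrides the HMM label.
SCORE_PATTERN = re.compile(r"^\d+[:-]\d+$")


def apply_rules(words, labels):
    return ["SCORE" if SCORE_PATTERN.match(w) else lab for w, lab in zip(words, labels)]


def extract(words):
    """Statistics first (HMM), then rules correct what the model cannot know."""
    return apply_rules(words, viterbi(words))


if __name__ == "__main__":
    tokens = ["Lakers", "beat", "Celtics", "102-99", "tonight"]
    print(list(zip(tokens, extract(tokens))))
```

On the sample sentence, the HMM alone labels `102-99` as `O`, because no state has seen that token and the transitions favor `O`; the rule layer then relabels it `SCORE`. This mirrors the general idea of a rules-based method patching the weak spots of a statistical model, though the thesis's actual rules and model structure may differ.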
Keywords/Search Tags: information extraction, machine learning, named entity recognition, hidden Markov model (HMM)