Research and Implementation of Automatic Generation of Chinese Event IE Patterns

Posted on: 2011-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Li
GTID: 2178360305476425
Subject: Computer application technology
With the rapid development of the Internet, the amount of information is growing explosively. The need to quickly extract what users are actually interested in from this vast body of information has driven the development of information extraction. Pattern matching is a commonly used method in information extraction. To reduce user intervention in obtaining extraction patterns, this thesis designs and implements a system that automatically generates Chinese event information extraction patterns based on sentence clustering. The system contains three modules: HTML keyword extraction, sentence clustering, and automatic pattern generation.

In the keyword extraction stage, the work focuses on how to obtain the keywords of Web pages and proposes an improved TFIDF method based on the structure of Chinese texts and the part of speech of Chinese words. The experimental results show that this method performs significantly better than the classical method.

In the sentence clustering stage, an improved CURE algorithm is proposed. Based on an analysis of the traditional CURE algorithm and on the characteristics of events, it improves the selection of representative points and the cluster merging mechanism. This solves the problem that isolated points are often chosen as representative points. In addition, it takes general features into account during cluster merging, which makes the merging more reasonable.

Finally, extraction patterns are derived from the sentence clusters in three steps: pattern definition, special pattern generation, and pattern generalization. First, statistics over the clustered sentences are used to predict the objects and main content described in the event and to define the extraction pattern. Then special patterns are iteratively selected from the cluster. Finally, the patterns are generalized at the grammatical and semantic levels. The experimental results show that the proposed method reduces the demands placed on users and that its effectiveness and performance meet the design objectives.
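
The abstract describes the improved TFIDF method only in outline: term weights take into account where a word occurs in the text and its part of speech. The following is a minimal sketch of that general idea, not the thesis's actual formula. The weight tables `POSITION_WEIGHT` and `POS_WEIGHT`, the function name `weighted_tfidf`, the input format, and the smoothed IDF term are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical weights; the abstract does not state the values actually used.
POSITION_WEIGHT = {"title": 3.0, "body": 1.0}   # where the word occurs in the page
POS_WEIGHT = {"n": 1.5, "v": 1.2}               # noun, verb; other tags default to 1.0


def weighted_tfidf(docs):
    """Score the words of each document.

    docs: list of documents; each document is a list of
          (word, pos_tag, position) tuples, e.g. from a segmenter and tagger.
    Returns a list of {word: score} dicts, one per document.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update({word for word, _, _ in doc})

    results = []
    for doc in docs:
        # Term frequency in which each occurrence counts by its
        # position weight times its part-of-speech weight.
        tf = Counter()
        for word, pos, position in doc:
            tf[word] += POSITION_WEIGHT.get(position, 1.0) * POS_WEIGHT.get(pos, 1.0)
        total = sum(tf.values()) or 1.0

        # Smoothed IDF (an assumption) so that words present in every
        # document still receive a nonzero score.
        results.append({
            word: (count / total) * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
            for word, count in tf.items()
        })
    return results
```

Under these assumed weights, a noun in the title outranks a verb or a function word in the body of the same page, which is the behaviour the abstract attributes to using text structure and part of speech.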
Keywords: keyword extraction, CURE clustering, event clustering, automatic pattern generation, special pattern