
Technique Research Of Web Chinese Event Automatic Detection

Posted on: 2011-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
Full Text: PDF
GTID: 2178330332978677
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of communication and Internet technologies, event-based collection of public information from the Internet has become an important research area in intelligent information processing. Researchers urgently need ways to detect and describe events, and to collect the information relevant to an event from massive web data quickly and accurately. This thesis studies automatic detection of Chinese events on the web, covering automatic Chinese event annotation, time information extraction, automatic event extraction, and event-based web topic detection. The major contributions are as follows:

(1) A method for time information extraction based on user-defined rules is presented. To address the single-target limitation of traditional time extraction methods, time expressions in text are classified precisely and time ranges are defined. Separate rules are then built for the different classes of time expression, which enables user-defined time information extraction. Experimental results show that the precision and recall of the new method are superior to those of traditional methods.

(2) A self-similarity clustering method for event extraction, guided by triggers, is proposed. First, instead of the traditional feature-word approach to event classification, a clustering approach is adopted, and event categories are determined with the K-means clustering algorithm. Second, guided by triggers, a min-max clustering strategy is adopted to constrain K in the K-means algorithm automatically, which optimizes the clustering and completes event classification. Third, event arguments are described using named entities and their positions in the text, which removes the dependency on an event category model and completes Chinese event extraction. Experimental results show that the new method outperforms traditional event extraction methods in precision and recall, and offers a new approach to Chinese event extraction.

(3) An automatic topic detection method is proposed that uses document concept similarity instead of the feature-word similarity over a vector space model used in traditional topic detection methods. First, the sample and topic sets are analyzed, event arguments are extracted, and a document vector space model is constructed. Second, concept similarity, word similarity, and text similarity are calculated based on HowNet. Finally, topic detection is performed based on document concept similarity. Experimental results show that the new method outperforms traditional methods in precision and recall of topic detection.
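As an illustration of the rule-based idea in contribution (1), the following is a minimal sketch, not the thesis's actual rule set. The rule labels (`date`, `month`, `clock`, `relative`), the regular expressions, and the function name `extract_time_expressions` are all hypothetical. The sketch only shows how time expressions might be split into classes with one rule per class, and how a user might restrict extraction to the classes of interest.

```python
import re

# Hypothetical rule table: (label, pattern), ordered from most to least specific.
# These labels and patterns are illustrative assumptions, not the thesis's rules.
RULES = [
    ("date", re.compile(r"\d{4}年\d{1,2}月\d{1,2}日")),
    ("month", re.compile(r"\d{4}年\d{1,2}月")),
    ("clock", re.compile(r"\d{1,2}点(?:\d{1,2}分)?")),
    ("relative", re.compile(r"昨天|今天|明天|上周|下周")),
]


def extract_time_expressions(text, wanted=None):
    """Return (start, label, matched_text) tuples sorted by position.

    wanted: optional set of labels the user is interested in;
    None means every class is returned.
    """
    hits = []
    taken = []  # spans already claimed by an earlier, more specific rule
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            start, end = m.span()
            # Skip matches that overlap a span claimed by an earlier rule.
            if any(start < e and s < end for s, e in taken):
                continue
            taken.append((start, end))
            if wanted is None or label in wanted:
                hits.append((start, label, m.group()))
    return sorted(hits)


if __name__ == "__main__":
    sample = "会议于2011年9月19日上午9点30分召开,昨天已发通知。"
    for hit in extract_time_expressions(sample):
        print(hit)
```

Ordering the rules from specific to general keeps the shorter `month` pattern from matching again inside a full date. A real system would need many more rules and a normalization step that maps each match to a time range.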
Keywords/Search Tags:Event Extraction, Trigger, K-means Clustering Algorithm, Time Expression, Concept Similarity, Topic Detection
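The abstract does not spell out how the min-max strategy in contribution (2) selects K. One common reading is the max-min distance heuristic, sketched below under that assumption. The function names, the `theta` threshold, and the use of plain numeric tuples as feature vectors are all hypothetical, and trigger guidance is left out. Centers are added while some point is still far from every existing center, and the resulting centers then seed standard K-means.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_centers(points, theta=0.5):
    """Choose initial centers (and therefore K) with the max-min heuristic.

    A point becomes a new center while its distance to the nearest existing
    center exceeds theta * (distance between the first two centers).
    """
    centers = [points[0]]
    second = max(points, key=lambda p: dist(p, centers[0]))
    base = dist(second, centers[0])
    if base == 0:  # all points coincide, so there is a single cluster
        return centers
    centers.append(second)
    while True:
        cand = max(points, key=lambda p: min(dist(p, c) for c in centers))
        if min(dist(cand, c) for c in centers) <= theta * base:
            return centers
        centers.append(cand)


def k_means(points, centers, iterations=20):
    """Standard Lloyd iterations from the given centers (iterations >= 1)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[idx].append(p)
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 20), (1, 20)]
    seeds = max_min_centers(pts, theta=0.5)
    final_centers, groups = k_means(pts, seeds)
    print(len(final_centers), [len(g) for g in groups])
```

In this sketch `theta` controls how readily new clusters are created: smaller values yield more clusters. In a trigger-guided variant, the feature vectors and the starting point would presumably come from trigger words, but that step is not specified in the abstract.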