
Concepts Acquisition From Specific Domain Corpora

Posted on: 2010-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: J R Yao
Full Text: PDF
GTID: 2178360278966401
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
How to turn the tens of thousands of online texts available on web pages into knowledge has become an urgent task in the field of knowledge acquisition. Because concepts, together with their inter-conceptual and inter-attribute relations, are primary components of human knowledge, acquiring and verifying concepts is an important step in knowledge acquisition. The essence of concept acquisition is to acquire the terms that denote concepts. Language processing technologies developed for other languages are not well suited to Chinese concept acquisition because of the distinctive linguistic characteristics of Chinese. Chinese concept acquisition poses many difficulties, so this thesis identifies concepts mainly by combining rules, statistics, syntax, and other information. By analyzing the characteristics of concept words, the system acquires concept words automatically using statistical methods. Statistical information about the contexts in which candidate concept strings occur in a real corpus is then used to verify the candidates. Finally, a unified framework for concept extraction and verification, which uses rule-based, statistical, syntactic, and contextual information to identify and verify concepts, is proposed and implemented. Experiments are carried out on the system and the results are analyzed.

In detail, this thesis makes the following contributions:

(1) By analyzing the characteristics of concept words, we design and implement an extraction method based on the component features and statistical features of concept words. The basic idea is to perform word segmentation and POS tagging on the texts, then extract compound words mainly using statistical term features, word mutual information, and context-dependent features, and finally take the resulting compound words as candidate concept words.

(2) We implement a context pattern method based on a pattern learning model, which obtains the most effective patterns for extracting and verifying concept words and thereby reduces the cost of constructing patterns manually.

(3) We propose and implement a system for concept extraction and verification. The system combines rules, context patterns, and statistical methods and achieves good results.
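The mutual-information step in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the corpus has already been segmented into word lists, omits POS tagging and the context-dependent features, and uses plain pointwise mutual information over adjacent word pairs. The function name pmi_candidates and the thresholds min_count and min_pmi are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def pmi_candidates(sentences, min_count=2, min_pmi=1.0):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    `sentences` is a list of already-segmented sentences (lists of words).
    Returns (compound, pmi) tuples sorted by descending PMI.
    The thresholds are illustrative assumptions, not values from the thesis.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            # Chinese compounds are written without spaces, so join directly.
            scored.append((w1 + w2, pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        ["知识", "获取", "是", "重要", "任务"],
        ["知识", "获取", "方法"],
        ["概念", "获取", "方法"],
    ]
    for compound, score in pmi_candidates(corpus):
        print(compound, round(score, 3))
```

Pairs that co-occur more often than their individual frequencies predict receive a high score and become compound-word candidates. In a fuller pipeline along the lines the abstract describes, these candidates would then be filtered by POS patterns and checked against context patterns.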
Keywords/Search Tags: Concept, Concept Acquisition, Concept Verification, Context