The Implementation Of The Chinese Information Extraction System Based On GATE

Posted on: 2007-03-10  Degree: Master  Type: Thesis
Country: China  Candidate: S Li  Full Text: PDF
GTID: 2178360182478313  Subject: Library science
Abstract/Summary:
Information resources are becoming more digitized, more complex and more networked. In this situation, current digital libraries cannot discover the knowledge relationships among pieces of information. Information extraction (IE) technologies make it possible to transform unstructured data into structured information. A survey of many current IE systems shows that they mainly target English. After experiments and comparisons, we therefore propose a Chinese IE solution: a Chinese IE plugin built on the GATE framework that makes use of ICTCLAS.

There are three key difficulties in Chinese IE: Chinese tokenization, domain-specific gazetteers, and Chinese named entity recognition. ICTCLAS is used to solve the Chinese tokenization problem. The author collects over 100M of Chinese gazetteers and writes over one hundred JAPE rules to make Chinese named entity recognition more precise.

We then carry out an experiment in which the Chinese IE system successfully extracts information from hundreds of technology news items collected by an RSS science and technology integration system. This shows that the system is usable in practice.

Finally, we conclude that the Chinese information extraction system based on GATE is a meaningful trial that handles Chinese information extraction, English information extraction, and mixed Chinese and English information extraction, and that it is a good foundation for future research.
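The pipeline summarized above (tokenize, look up gazetteer entries, then apply pattern rules over the resulting annotations) can be illustrated with a minimal sketch. This is not the thesis's code and does not call GATE, JAPE or ICTCLAS: the tokens are assumed to be already segmented (standing in for ICTCLAS output), and the names Annotation, gazetteer_lookup, apply_org_rule and ORG_SUFFIXES, as well as the sample gazetteer and the single rule, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # index of the first token covered
    end: int     # index one past the last token covered
    label: str   # entity type, e.g. "Location"
    text: str    # surface string of the covered tokens


def gazetteer_lookup(tokens, gazetteer, max_len=4):
    """Longest-match lookup of token runs in a {entry: label} dictionary."""
    annotations = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first, then shrink.
        for j in range(min(len(tokens), i + max_len), i, -1):
            candidate = "".join(tokens[i:j])
            if candidate in gazetteer:
                match = Annotation(i, j, gazetteer[candidate], candidate)
                break
        if match:
            annotations.append(match)
            i = match.end
        else:
            i += 1
    return annotations


# Hypothetical suffix list; a real system would use far larger resources.
ORG_SUFFIXES = {"大学", "公司", "研究所"}


def apply_org_rule(tokens, annotations):
    """Toy rule: Location followed by an organization suffix -> Organization."""
    result = []
    for ann in annotations:
        has_next = ann.end < len(tokens)
        if ann.label == "Location" and has_next and tokens[ann.end] in ORG_SUFFIXES:
            text = ann.text + tokens[ann.end]
            result.append(Annotation(ann.start, ann.end + 1, "Organization", text))
        else:
            result.append(ann)
    return result


# Assumed pre-segmented input (a stand-in for a tokenizer's output).
tokens = ["北京", "大学", "发布", "了", "新", "系统"]
gazetteer = {"北京": "Location"}
entities = apply_org_rule(tokens, gazetteer_lookup(tokens, gazetteer))
for entity in entities:
    print(entity.label, entity.text)
```

The rule step here plays the role that pattern rules over annotations play in a rule-based pipeline: the gazetteer alone would label only the place name, while the rule widens the span and relabels it when the context indicates an organization.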
Keywords/Search Tags: Knowledge technology, Chinese information extraction, English information extraction, mixed Chinese and English information extraction, GATE, ICTCLAS