Research On Chinese Time Expression Recognition Technology Based On Rule Extraction

Posted on: 2019-05-13  Degree: Master  Type: Thesis
Country: China  Candidate: H Zhang  Full Text: PDF
GTID: 2428330545976727  Subject: Computer technology
Abstract/Summary:
Time expression recognition is an important part of named entity recognition in natural language processing. Recognizing time expressions and acquiring and using time information play an important role in many fields, such as information retrieval and question answering. This thesis focuses on one concrete problem, Chinese time expression recognition. First, basic time-related concepts and basic methods for recognizing time expressions are introduced. Then the characteristics of these methods are discussed in detail and a new method for recognizing time expressions is proposed. The main work and contributions of this thesis are as follows:
1. A method based on rule extraction for recognizing time expressions, together with criteria for evaluating rules, is proposed. Rules that can recognize time expressions are extracted automatically from the word and part-of-speech (POS) features of time expressions in the training data, which avoids much of the manual work of constructing rules. The criteria are then used to evaluate and filter the extracted rules, and the resulting rule set is used to recognize time expressions.
2. A new method is proposed that recognizes time expressions based on rule extraction and uses machine learning classification algorithms to filter the candidate expressions. Methods based on rule extraction can achieve relatively high recall given enough rules, but they cannot achieve high precision, because the rules themselves do not make sufficient use of context and semantic information. With classification algorithms added, classifiers are trained on the training data and used to filter candidate expressions, which raises precision while recall remains relatively high. On the Microsoft Research Asia corpus for Chinese named entity recognition, the new method based on rule extraction achieves an F1 score of 94.05% for time expression recognition, which is better than both the conditional random field (CRF) based method and the rule-based method.
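The rule extraction and rule evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy corpus, the generalization scheme (numerals reduced to a `<m>` placeholder), the precision-based criterion, the 0.8 threshold, and all function names are assumptions made for illustration. The classifier-based filtering stage is not shown.

```python
# Hypothetical toy corpus: each token is (word, POS tag, is_part_of_time_expression).
TRAIN = [
    [("他", "r", False), ("2019", "m", True), ("年", "q", True),
     ("5", "m", True), ("月", "q", True), ("毕业", "v", False)],
    [("会议", "n", False), ("3", "m", True), ("月", "q", True), ("举行", "v", False)],
    [("我们", "r", False), ("3", "m", True), ("点", "q", True), ("见面", "v", False)],
    [("他", "r", False), ("提出", "v", False), ("3", "m", False),
     ("点", "q", False), ("建议", "n", False)],
]


def to_pattern(tokens):
    """Generalize tokens into a rule: numerals (POS 'm') become '<m>', other tokens keep the word."""
    return tuple("<m>" if tok[1] == "m" else tok[0] for tok in tokens)


def gold_spans(sentence):
    """Return (start, end) pairs for maximal runs of tokens labelled as time."""
    spans, start = [], None
    for i, tok in enumerate(sentence):
        if tok[2] and start is None:
            start = i
        elif not tok[2] and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(sentence)))
    return spans


def extract_rules(data):
    """Collect one generalized pattern per annotated time expression."""
    return {to_pattern(sent[s:e]) for sent in data for s, e in gold_spans(sent)}


def match(rule, sentence):
    """Yield every (start, end) window of the sentence that the rule matches."""
    n = len(rule)
    for i in range(len(sentence) - n + 1):
        if to_pattern(sentence[i:i + n]) == rule:
            yield (i, i + n)


def evaluate_rules(rules, data, min_precision=0.8):
    """Keep rules whose matches on the training data are mostly time tokens.

    Assumed criterion: a match counts as correct when every matched token is labelled as time.
    """
    kept = {}
    for rule in rules:
        hits = correct = 0
        for sent in data:
            for s, e in match(rule, sent):
                hits += 1
                correct += all(tok[2] for tok in sent[s:e])
        if hits and correct / hits >= min_precision:
            kept[rule] = correct / hits
    return kept


def recognize(rules, sentence):
    """Apply rules to (word, POS) tokens, preferring longer non-overlapping matches."""
    candidates = sorted(
        (span for rule in rules for span in match(rule, sentence)),
        key=lambda s: (-(s[1] - s[0]), s[0]),
    )
    chosen = []
    for s, e in candidates:
        if all(e <= cs or s >= ce for cs, ce in chosen):
            chosen.append((s, e))
    return sorted(chosen)


if __name__ == "__main__":
    kept = evaluate_rules(extract_rules(TRAIN), TRAIN)
    new_sentence = [("她", "r"), ("2020", "m"), ("年", "q"),
                    ("8", "m"), ("月", "q"), ("回国", "v")]
    print(kept)
    print(recognize(kept, new_sentence))
```

In this toy corpus the pattern `("<m>", "点")` is extracted from "3 点" (three o'clock) but also matches "3 点" (three points) in a non-time context, so its precision falls to 0.5 and it is filtered out. This is the kind of ambiguity that rules alone cannot resolve without context, which is the motivation the abstract gives for adding a classifier.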
Keywords/Search Tags: Time Expression Recognition, Rule Extraction, Rule Evaluation, Named Entity Recognition, Classification Algorithm