Time expressions are abundant in text. Recognizing and using them benefits many natural language processing tasks, such as question answering and reading comprehension. TimeML is an annotation specification for time and events in text. For time, it defines the extent and normalized value of each time expression, which allows time expressions to be interpreted in a more principled way. In this thesis, we follow the TimeML annotation to explore the recognition and normalization of time expressions in text, using a combination of manual and automatic methods. The main work and contributions are as follows:

1. For time expression recognition, we model the task as a pattern matching problem and propose a pattern-based method named TR. This method first constructs a token type system manually, then abstracts time expressions into patterns over token types, and finally matches candidate time expressions against the generated patterns. Because patterns are generated automatically, TR requires less manual effort than classic rule-based methods while remaining interpretable. In the evaluation, TR achieves good recall, but its precision is not ideal.

2. We propose a second method, TR*, based on TR. TR* adds a selection step after pattern generation, which retains good patterns and drops poor ones. We model pattern selection as an EBMC problem and solve it with a greedy algorithm. TR* achieves satisfactory results in the evaluation.

3. For time expression normalization, we propose a rule-based method named TN. After a set of time functions is designed, this method manually assigns normalization rules to tokens, then uses a heuristic algorithm to combine the normalization rules into the required function form, and finally executes them in turn. TN does not require expression-level rules, which makes it more flexible and convenient. In the evaluation, TN achieves good results.
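The three steps of TR described above (token types, pattern abstraction, matching) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the token types in `TOKEN_TYPES`, the function names `token_type`, `generate_patterns`, and `match`, and the longest-match-first strategy are hypothetical and are not taken from the thesis, whose token type system and matching procedure may differ.

```python
import re

# Hypothetical, hand-built token type system (an assumption for illustration).
TOKEN_TYPES = [
    ("MONTH", re.compile(
        r"^(january|february|march|april|may|june|july|august|"
        r"september|october|november|december)$", re.I)),
    ("YEAR", re.compile(r"^(1[89]|20)\d{2}$")),
    ("NUMBER", re.compile(r"^\d{1,2}$")),
    ("UNIT", re.compile(r"^(day|week|month|year)s?$", re.I)),
    ("MODIFIER", re.compile(r"^(last|next|this)$", re.I)),
]


def token_type(token):
    """Return the first matching token type name, or None if no type applies."""
    for name, regex in TOKEN_TYPES:
        if regex.match(token):
            return name
    return None


def generate_patterns(annotated_expressions):
    """Abstract each annotated time expression into a tuple of token types."""
    patterns = set()
    for expression in annotated_expressions:
        types = tuple(token_type(tok) for tok in expression.split())
        # Keep only non-empty patterns in which every token has a type.
        if types and all(types):
            patterns.add(types)
    return patterns


def match(tokens, patterns):
    """Scan left to right, trying longer patterns first at each position."""
    types = [token_type(tok) for tok in tokens]
    by_length = sorted(patterns, key=len, reverse=True)
    found, i = [], 0
    while i < len(tokens):
        for pattern in by_length:
            if tuple(types[i:i + len(pattern)]) == pattern:
                found.append(" ".join(tokens[i:i + len(pattern)]))
                i += len(pattern)
                break
        else:
            i += 1  # no pattern starts at this position
    return found


patterns = generate_patterns(["March 2015", "last week", "12 May 2003"])
sentence = "We met on 3 June 1999 and again next month .".split()
print(match(sentence, patterns))
```

Because matching operates on token types rather than surface strings, the three training expressions generalize to unseen expressions such as "3 June 1999". The same generalization is also a plausible source of false matches, which is the kind of problem a pattern selection step like the one in TR* is meant to address.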