
Research On Part-of-Speech Tagging With Transformation-Based Learning

Posted on: 2012-05-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Wang | Full Text: PDF
GTID: 2178330332990721 | Subject: Computer application technology
Abstract/Summary:
With the growth of information technology, the Internet has become part of daily life, and people increasingly want to communicate with computers in natural language. This presupposes that computers can understand human language, which is the very challenging problem addressed by natural language processing (NLP). Part-of-speech (POS) tagging is one of the most basic and important techniques in this field: as the lowest-level processing step, it plays a key role in language processing as a whole. POS tagging is widely used in applications such as parsing, speech recognition, text classification, text-to-speech, information retrieval, and machine translation.
Meanwhile, with the rapid development of machine learning, a variety of methods have been applied to POS tagging, including hidden Markov models (HMM), maximum entropy models, decision trees, and rule-based algorithms. Transformation-based learning (TBL) is a rule-based algorithm. Although many researchers have improved TBL since 1995 and it is now relatively mature, extracting and evaluating rules consumes considerable resources and time, so training is relatively slow.
Building on previous research, this thesis modifies the original algorithm so that rule evaluation skips rules that have a low evaluation score and no significant effect on corpus annotation; only rules that have a significant effect when applied are scored. First, the algorithm finds the best transformation rule that can change samples in the corpus; it then locates the context of each changed sample and propagates the effect of the change to that context, ultimately achieving POS tagging. In comparisons with other TBL algorithms on the Penn Treebank Wall Street Journal corpus, the improved algorithm reduced training time without loss of accuracy.
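The general idea of TBL training with low-score rules skipped can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the single rule template (change tag A to B when the previous tag is C), the function names, and the `min_score` threshold are all assumptions made for illustration. The pruning shown here relies on a simple observation: the number of errors a rule would fix is an upper bound on its net score, so rules below the threshold on that count can be skipped without a full corpus scan.

```python
from collections import Counter

def count_fixes(tags, gold):
    """For each candidate rule (from_tag, to_tag, prev_tag), count the errors it would fix."""
    fixes = Counter()
    for i in range(1, len(tags)):
        if tags[i] != gold[i]:
            fixes[(tags[i], gold[i], tags[i - 1])] += 1
    return fixes

def count_breaks(rule, tags, gold):
    """Count currently correct tags that the rule would turn into errors."""
    from_tag, _, prev_tag = rule
    return sum(
        1
        for i in range(1, len(tags))
        if tags[i] == from_tag and tags[i - 1] == prev_tag and gold[i] == from_tag
    )

def apply_rule(rule, tags):
    """Apply the rule at every matching position, matching against the old tags."""
    from_tag, to_tag, prev_tag = rule
    return [
        to_tag if i > 0 and t == from_tag and tags[i - 1] == prev_tag else t
        for i, t in enumerate(tags)
    ]

def train(initial_tags, gold, min_score=2):
    """Greedy TBL loop; min_score (hypothetical parameter) must be at least 1."""
    tags = list(initial_tags)
    learned = []
    while True:
        scored = []
        for rule, good in count_fixes(tags, gold).items():
            if good < min_score:
                continue  # upper bound too low: skip the expensive full scoring
            net = good - count_breaks(rule, tags, gold)
            if net >= min_score:
                scored.append((net, rule))
        if not scored:
            break
        _, best = max(scored)
        tags = apply_rule(best, tags)
        learned.append(best)
    return learned, tags

# Toy data (invented for illustration): a baseline tagger mislabels verbs after "TO".
gold = ["DT", "NN", "VB", "TO", "VB", "DT", "NN", "TO", "VB"]
init = ["DT", "NN", "VB", "TO", "NN", "DT", "NN", "TO", "NN"]
learned_rules, final_tags = train(init, gold, min_score=2)
print(learned_rules)
```

Raising `min_score` trades coverage for speed: more candidate rules are discarded before scoring, at the risk of never learning rules that fix only a few errors.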
Keywords/Search Tags: natural language processing, part of speech, transformation-based learning, rule