Font Size: a A A

Research On Fast Exact Structured Learning

Posted on:2011-05-19Degree:DoctorType:Dissertation
Country:ChinaCandidate:X QianFull Text:PDF
GTID:1118360305997279Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Structured learning models owe a great part of their success to the ability in using structured information. However, these methods are more time consuming than non-structured learning models. Though approximate algorithms reduce the computational complexity, they degrade the accuracy to some extent. Therefore, exploring fast extract algorithms has important role in structured learning.In this paper, we improve two aspects of structured learning:inference and feature extraction. First, for sequence labeling tasks, we proposed sparse higher order Conditional Random Fields(SHO-CRFs) based on the characteristics of sparseness of higher order features in many real applications, together with a novel extract tractable inference algorithm which is able to deal with local and sparse higher order features. SHO-CRFs are practically very efficient due to fea-ture sparseness. In optical character recognition task, we use word affixes as higher order features, SHO-CRFs achieve the highest reported accuracy. In Chi-nese organization name recognition task, we achieve the second highest F1 score on Microsoft Research Asia corpus. Both experimental results show that, with the same feature set, SHO-CRFs significantly outperform other state of the art methods.Second, we propose a novel feature string indexing structure for fast feature extraction to reduce decoding time. Many modern structured learning methods adopt templates to generate millions features. Complicated templates bring about abundant features which lead to higher accuracy but more feature extraction time. We proposed two-dimensional Trie(2D Trie), a novel data structure which takes advantages of relationship between templates for fast feature extraction:feature strings generated by a template are prefixes of the features from its extended templates. Experimental results on Chinese Tree Bank 6 corpus show that,2D Trie is about 5 times faster than traditional Trie, leading parsing time 4.3 times faster.
Keywords/Search Tags:Structured learning, Conditional Random Fields, Sparse Higher Order Conditional Random Fields, 2D Trie, sequence labeling, dependency parsing
PDF Full Text Request
Related items