Open Information Extraction From Chinese Patents Using Markov Logic

Posted on: 2015-09-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Q M Zhao | Full Text: PDF
GTID: 2298330467968632 | Subject: Computer software and theory
Abstract/Summary:
Chinese patents, one of the most important components of the scientific and technical literature, contain a great quantity of scientific research and technological innovation knowledge described in natural language. In general, it is very difficult for a computer to process, let alone understand, such unstructured knowledge. Research on information extraction, which transforms semi-structured or unstructured free text into structured data, is therefore of considerable significance.

Over the last two decades, particular attention has been paid to developing information extraction technology. However, traditional information extraction has focused on precise, narrow, pre-specified relations, which leads to poor scalability: extensive human involvement, high dependence on a specific domain, and complex matching patterns. That is why research is shifting toward open information extraction, moving from small, homogeneous sets of target relations to open domains and relations.

In recent years, in contrast with the significant achievements for English and other Western languages, research on Chinese open information extraction has been quite scarce. This thesis therefore presents two studies on Chinese patent documents.

First, a new approach oriented to bilingual patent abstracts is proposed to recognize MNPs in Chinese patent text. Three types of information (word information in sentences, information transferred from treebanks, and bilingual information) are combined in a joint framework based on Markov Logic Networks (MLN) to recognize MNP boundaries. The experimental results show that bilingual information has a strong positive effect on the identification of verbs, and that the F-score of MNP evaluation reaches 83.27%, a large improvement over the golden Berkeley Parser's 60.09%. Moreover, the new approach is simple and easy to extend.

Second, a hierarchical Chinese open entity relation extraction approach is proposed that applies Markov Logic Networks on the basis of both external and internal chunk tags. The corpus for the MLN model is obtained semi-automatically using a self-learning method. The experimental results show that starting from chunks simplifies the understanding of sentences, and that both layers can be handled consistently, which reduces engineering effort. Under the same conditions, MLN performs better than SVM, with F-scores of 77.92% and 69.20% on the external and internal layers, respectively.
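
For readers unfamiliar with Markov Logic, the following is a minimal sketch of how a Markov Logic Network assigns probabilities, not the thesis's actual model. It relies only on the standard definition, in which a world's probability is proportional to the exponential of the summed weights of its satisfied ground formulas. The atom names (`IsNoun(w1)`, `InMNP(w1)`, `InMNP(w2)`), the two formulas, and their weights are all hypothetical and chosen purely for illustration.

```python
import itertools
import math

# Hypothetical ground atoms for a two-word candidate span (illustration only).
ATOMS = ["IsNoun(w1)", "InMNP(w1)", "InMNP(w2)"]


def implies(a, b):
    """Logical implication a => b."""
    return (not a) or b


# Hypothetical weighted ground formulas: (weight, predicate over a world).
# Each formula has a single grounding here, so its count is 0 or 1.
FORMULAS = [
    (1.5, lambda w: implies(w["IsNoun(w1)"], w["InMNP(w1)"])),
    (0.8, lambda w: implies(w["InMNP(w1)"], w["InMNP(w2)"])),
]


def score(world):
    """Sum of the weights of the formulas that this world satisfies."""
    return sum(weight for weight, formula in FORMULAS if formula(world))


def world_probabilities():
    """Enumerate all truth assignments and normalize exp(score) over them."""
    worlds = [
        dict(zip(ATOMS, values))
        for values in itertools.product([False, True], repeat=len(ATOMS))
    ]
    unnormalized = [math.exp(score(w)) for w in worlds]
    z = sum(unnormalized)  # partition function
    return [(w, u / z) for w, u in zip(worlds, unnormalized)]


def marginal(atom, evidence):
    """P(atom is true | evidence), computed by brute-force enumeration."""
    numerator = 0.0
    denominator = 0.0
    for world, p in world_probabilities():
        if all(world[k] == v for k, v in evidence.items()):
            denominator += p
            if world[atom]:
                numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print(marginal("InMNP(w1)", {"IsNoun(w1)": True}))
```

Brute-force enumeration is feasible only for a handful of atoms; practical MLN systems use approximate inference instead. The sketch is meant only to show how soft, weighted rules combine into a single joint distribution.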
Keywords/Search Tags: Open Information Extraction, Markov Logic, Transfer Learning, SVM