Research On Key Technology Of Free Text Oriented Fine-grained Relation Extraction

Posted on: 2012-11-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhu
GTID: 1228330368998849
Subject: Computer application technology
Abstract/Summary:
Information extraction (IE) is an important research direction in information processing, following information retrieval and machine translation. The purpose of IE is to extract specified events or facts and fill them into a database that users can query. The database can be filled correctly only when the relations between entities are correct, so relation extraction has become a key technology that affects the performance of IE systems, and it has a broad range of applications. With the rapid growth of the Internet and of online information, and with the maturing of natural language processing and machine learning techniques, it has become possible to extract useful structured information from free text.

Relation extraction has made considerable progress and is increasingly entering people's daily lives, for example through the Powerset semantic search engine and the Apache Software Foundation's Lucene full-text search engine architecture. However, because these systems all use shallow text features and depend on training text from a few specific domains, their performance is not satisfactory, and relation extraction still faces many difficulties.

The research object of this dissertation is the Entity-Attribute-Value (EAV) triple. Drawing on the theory of the Hierarchical Network of Concepts (HNC), description logics, and semi-supervised learning, it studies the key technologies of semantic-level fine-grained relation extraction, that is, the Entity-Attribute, Entity-Value, Attribute-Attribute, and Attribute-Value relations. The main contributions are:

1. ALCIQ(EAV) is constructed to describe the fine-grained relation ontology (Section 3.5). Under traditional knowledge management patterns, information lacks a uniform semantic description, so it is hard for users to achieve semantic fusion of related information resources. Ontology technology is an important means of resolving this difficulty.
For people and heterogeneous systems that want to exchange or share information, establishing an ontology helps clear up divergences in concepts and terminology and reach a consensus on the concepts of the field; it is the semantic basis for mutual understanding between machines and between people and machines. Based on ontology technology, the dissertation presents ALCIQ(EAV) for EAV modeling. It also formalizes EAV ontology dependency, EAV role dependency, EAV external dependency, and EAV integrity with the ALCIQ(EAV) reasoning algorithm, which effectively solves the problem of defining the scope of fine-grained relations.

2. A semantic association degree algorithm based on HNC is presented (Section 4.3.4). When fine-grained relations are extracted, association degree calculation can find inherent links and implicit relationships between words. It can also associate an isolated word with its related words (similar words, contrary words, collocating words, co-occurring words, and so on), and it extends semantic similarity degree and semantic correlation degree. With HNC, the world is treated as a universally connected organic whole, and words are assumed to be connected with one another. The words thus form an undirected weighted graph in which associated words are joined by an edge whose weight is the association degree of the two words. Inherent links and implicit relationships between words can therefore be obtained by searching for a path between the two words in the HNC. Word association is realized by computing the middle-level expression of HNC symbols with HNC's association mechanism. Solving word association degree computation and extending semantic similarity degree and semantic correlation degree form the basis for extracting entities, attributes, and attribute values.
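The graph view of word association described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dissertation's HNC mechanism: the rule that an indirect association equals the product of the edge weights along a path, the function names `build_graph` and `strongest_association`, and the toy words and degrees are all hypothetical.

```python
import heapq
import math
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (word_a, word_b, degree) triples."""
    graph = defaultdict(dict)
    for a, b, degree in edges:
        if not 0.0 < degree <= 1.0:
            raise ValueError("association degree must lie in (0, 1]")
        graph[a][b] = degree
        graph[b][a] = degree
    return graph


def strongest_association(graph, source, target):
    """Return (degree, path) for the path maximising the product of edge weights.

    Assumption: an indirect association is the product of the direct degrees
    along a path. Maximising a product of values in (0, 1] is equivalent to
    minimising the sum of -log(weight), so Dijkstra's algorithm applies.
    """
    if source not in graph or target not in graph:
        return 0.0, []
    best_cost = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, word = heapq.heappop(queue)
        if word == target:
            break
        if cost > best_cost.get(word, math.inf):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, degree in graph[word].items():
            new_cost = cost - math.log(degree)
            if new_cost < best_cost.get(neighbour, math.inf):
                best_cost[neighbour] = new_cost
                previous[neighbour] = word
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best_cost:
        return 0.0, []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return math.exp(-best_cost[target]), path


# Toy data: hypothetical words and degrees, for illustration only.
TOY_EDGES = [
    ("laptop", "battery", 0.8),
    ("battery", "capacity", 0.9),
    ("laptop", "capacity", 0.5),
    ("capacity", "mAh", 0.7),
]

if __name__ == "__main__":
    toy_graph = build_graph(TOY_EDGES)
    degree, path = strongest_association(toy_graph, "laptop", "capacity")
    print(path, round(degree, 2))
```

In the toy graph, the indirect route through "battery" (0.8 × 0.9 = 0.72) outweighs the direct edge (0.5), which is the kind of implicit link a path search can surface. Other combination rules, such as taking the minimum weight along a path, would be equally plausible under different assumptions.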
Experimental results show that the attributes and attribute values extracted by semantic association degree represent the actual fine-grained semantic relations more objectively.

3. A type-undefined fine-grained relation extraction algorithm based on semi-supervised learning is proposed (Section 5.3). Type-undefined relation extraction is the key problem of fine-grained relation extraction. To overcome the limitations of applying type-defined relations, the dissertation gives a type-undefined relation clustering algorithm based on semi-supervised learning. The algorithm consists of a learning algorithm based on positive examples and unlabeled data, a relation pattern generalization algorithm, and a relation pattern confidence computation algorithm. A fine-grained relation extraction experiment is also carried out on Wikipedia, and the result is acceptable even though the training data is relatively small.

4. An application of fine-grained relation extraction is shown: Chinese technical term analysis (Section 6.2). Chinese technical term analysis helps determine the connotation and class of Chinese technical terms and define and judge new terms, and it also helps capture the development focus and direction of the field to which the terms belong. To validate the effect of fine-grained relation extraction, the extraction method presented in the dissertation is applied to Chinese technical term analysis. First, Chinese technical terms are modeled with ALCIQ(EAV) and the boundary of each term is determined. Second, the association degree of "term-attribute-value" is computed, and the attributes of each Chinese technical term and their values are extracted. Finally, the type-undefined relation extraction algorithm is used to cluster Chinese technical terms based on semi-supervised learning.
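The abstract names a relation pattern confidence computation in contribution 3 but does not give its formula. As a rough illustration of what such a step might look like when only positive seeds and unlabeled text are available, the sketch below scores each pattern by the smoothed share of its matched EAV triples that are already known positives. The scoring rule, the `Triple` type, the function names, and all sample data are assumptions made for this example and are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A hypothetical Entity-Attribute-Value triple."""
    entity: str
    attribute: str
    value: str


def pattern_confidence(matches, positives, smoothing=1.0):
    """Smoothed share of a pattern's matches that are known positive triples.

    Assumption: unlabeled matches are unconfirmed rather than wrong, so they
    lower the score without counting as explicit negatives.
    """
    matches = set(matches)
    confirmed = len(matches & set(positives))
    return (confirmed + smoothing) / (len(matches) + 2.0 * smoothing)


def rank_patterns(pattern_matches, positives):
    """Return (pattern, confidence) pairs, most reliable pattern first."""
    scores = {
        name: pattern_confidence(found, positives)
        for name, found in pattern_matches.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical seeds and pattern matches, for illustration only.
SEED_A = Triple("term_a", "attr_x", "v1")
SEED_B = Triple("term_b", "attr_x", "v2")
SEEDS = {SEED_A, SEED_B}

PATTERN_MATCHES = {
    "ATTR of ENT is VAL": [SEED_A, SEED_B, Triple("term_c", "attr_x", "v3")],
    "ENT , VAL": [
        SEED_A,
        Triple("term_d", "attr_y", "v4"),
        Triple("term_e", "attr_y", "v5"),
        Triple("term_f", "attr_y", "v6"),
    ],
}

if __name__ == "__main__":
    for name, score in rank_patterns(PATTERN_MATCHES, SEEDS):
        print(f"{score:.2f}  {name}")
```

With add-one smoothing, a pattern with no matches scores 0.5 rather than 0 or 1, so unseen patterns are neither trusted nor discarded outright. A real system would likely combine such a score with the pattern generalization step before accepting new triples.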
Keywords/Search Tags: entity relation, fine-grained relation, information extraction, description logics, HNC, natural language processing, semi-supervised learning