
Upper And Lower Semantic Extraction Based On Hybrid Kernel Method

Posted on: 2014-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Jiang
GTID: 2208330434970488
Subject: Computer software and theory
Abstract/Summary:
Automatic knowledge base induction from massive unstructured data, such as web text, has been a hot topic in natural language processing (NLP) and machine learning. The task generally involves two phases, namely concept extraction and semantic relation extraction. The former extracts named entities, each formed by one or more nouns, from text, while the latter identifies the semantic relations among them. The hypernym relation is one of the most important of these relations and contributes substantially to semantic dictionaries, information extraction (IE), and other applications.
Relation extraction is usually formulated as a classification problem: deciding whether a specific relation holds between a pair of nouns in a given domain. Traditional text features such as N-grams and term frequency are widely used to describe the context of a concept pair. However, this feature space is too simple to capture long-distance dependencies in text, which are critical in many semantic tasks.
Text kernels were designed to address this issue, because they implicitly map the original feature space into a Hilbert space of much higher dimension. The current literature contains two main kinds of text kernels: tree kernels and subsequence kernels. Both bring significant improvements over traditional feature-based methods. However, analyses by other researchers show that each has its own weaknesses. The subsequence kernel achieves higher precision but lower recall, whereas the tree kernel achieves higher recall but lower precision. The performance of both decreases as the distance between the noun pair increases.
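
To make the distance problem concrete, here is a minimal sketch of a generic gap-weighted word subsequence kernel in the style of Lodhi et al. (2002). It is not the thesis's own implementation. The function names and the defaults `n=2` and `lam=0.5` are assumptions for illustration. The sketch enumerates index combinations directly, which is exponential in `n`; practical systems use the dynamic-programming recursion instead.

```python
from collections import defaultdict
from itertools import combinations
import math

def subsequence_features(tokens, n, lam):
    """Map a token sequence to gap-weighted length-n subsequence features.

    An occurrence of subsequence u at positions i_1 < ... < i_n contributes
    lam ** (i_n - i_1 + 1), so matches spread over many words are down-weighted.
    """
    feats = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):
        u = tuple(tokens[i] for i in idx)
        feats[u] += lam ** (idx[-1] - idx[0] + 1)
    return feats

def subsequence_kernel(s, t, n=2, lam=0.5):
    """Unnormalized kernel: inner product of the two feature maps."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    # Only subsequences shared by both sequences contribute.
    return sum(w * ft[u] for u, w in fs.items() if u in ft)

def normalized_subsequence_kernel(s, t, n=2, lam=0.5):
    """Cosine-normalize so that a sequence has similarity 1 with itself."""
    kss = subsequence_kernel(s, s, n, lam)
    ktt = subsequence_kernel(t, t, n, lam)
    if kss == 0 or ktt == 0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / math.sqrt(kss * ktt)

if __name__ == "__main__":
    # Toy contexts between a hypernym ("fruit") and a hyponym ("apple").
    short_ctx = "fruit such as apple".split()
    long_ctx = "fruit that many people eat every day such as apple".split()
    # Number of distinct length-2 subsequence features in each context.
    print(len(subsequence_features(short_ctx, 2, 0.5)),
          len(subsequence_features(long_ctx, 2, 0.5)))
```

The example prints `6 45`. The 4-word context yields 6 length-2 subsequences, while the 10-word context yields 45. Every extra word between the nouns adds candidate subsequences, so two semantically unrelated contexts become more likely to share some of them.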
As the distance increases, more text fragments are counted when computing the subsequence kernel. This raises the probability that two semantically different fragments share the same words. To further improve the performance of text kernels and to reduce the impact of noun-pair distance, we propose a novel hybrid kernel. It is robust to the distance between the noun pair and achieves high precision in relation extraction. We build it from two components. First, we design a sub-path kernel which, compared with earlier tree kernels, places more emphasis on the role of individual words when computing the similarity of two parse trees. Second, we propose a context contiguous subsequence kernel that better captures the semantic similarity between two text segments.
Building on these algorithms, we further advance work on taxonomy induction by designing a new framework that automatically extracts concepts and relations from web text and integrates state-of-the-art NLP and machine learning techniques. The resulting knowledge can be used to enrich existing semantic dictionaries such as WordNet. Such dictionaries serve as semantic repositories that supply knowledge to many applications, including IE, question answering (Q&A), and intelligent speech.
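
The idea of combining two kernels can be sketched as a weighted sum of a context kernel and a path kernel. Both components below are hypothetical stand-ins, not the thesis's sub-path or context contiguous subsequence kernels. `ngram_kernel` counts shared contiguous n-grams. It is applied once to the word context and once to a dependency path written as alternating words and relation labels. The names `hybrid_kernel`, `alpha`, and the path format are assumptions. The combination itself rests on a standard fact: a non-negative weighted sum of valid kernels is again a valid kernel.

```python
import math
from collections import Counter

def contiguous_ngrams(tokens, max_n):
    """Count every contiguous n-gram of length 1..max_n in a token sequence."""
    grams = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams[tuple(tokens[i:i + n])] += 1
    return grams

def ngram_kernel(a, b, max_n=3):
    """Inner product of contiguous n-gram count vectors; a valid (PSD) kernel."""
    ga = contiguous_ngrams(a, max_n)
    gb = contiguous_ngrams(b, max_n)
    # Counter returns 0 for missing keys, so unshared n-grams contribute nothing.
    return float(sum(count * gb[g] for g, count in ga.items()))

def normalized(kernel, x, y):
    """Cosine-normalize a kernel value so that normalized(kernel, x, x) == 1."""
    kxx, kyy = kernel(x, x), kernel(y, y)
    if kxx == 0 or kyy == 0:
        return 0.0
    return kernel(x, y) / math.sqrt(kxx * kyy)

def hybrid_kernel(ctx_a, path_a, ctx_b, path_b, alpha=0.5):
    """Hypothetical hybrid kernel: alpha * context kernel + (1 - alpha) * path kernel.

    ctx_*  : tokens of the context around the noun pair.
    path_* : dependency path between the nouns (hypothetical format:
             alternating words and relation labels).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    k_ctx = normalized(ngram_kernel, ctx_a, ctx_b)     # contiguous context overlap
    k_path = normalized(ngram_kernel, path_a, path_b)  # shared contiguous sub-paths
    return alpha * k_ctx + (1.0 - alpha) * k_path

if __name__ == "__main__":
    # Hand-written toy inputs; the paths are illustrative, not parser output.
    ctx1 = "apple is a kind of fruit".split()
    path1 = ["apple", "nsubj", "kind", "prep_of", "fruit"]
    ctx2 = "dog is a kind of animal".split()
    path2 = ["dog", "nsubj", "kind", "prep_of", "animal"]
    print(hybrid_kernel(ctx1, path1, ctx2, path2, alpha=0.6))
```

Each component is normalized before mixing, so the result stays in [0, 1] and `alpha` directly controls how much weight the path structure gets relative to the surface context. Tuning `alpha` on held-out data would be one reasonable way to choose it.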
Keywords/Search Tags: semantic dictionary, hypernym relation, relation extraction, hybrid kernel, distance learning, information extraction