Knowledge Acquisition From Text

Posted on:2009-01-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J H Wang

Full Text:PDF

GTID:1118360245470119

Subject:Signal and Information Processing

Abstract/Summary:

Text is one of the most important media through which people describe the world, express their thoughts, and spread knowledge. With the rise of the knowledge economy, researchers and engineers have paid increasing attention to text knowledge management. Text knowledge management systems still face several problems: How can the subject of a text be identified? How can its topic words be extracted? How can personally important information be highlighted for different readers? How can users be given exact information? Keyword extraction and information extraction, both important text processing technologies, can help solve these problems. This dissertation focuses on keyword extraction from a single document and on rule generation for information extraction. The main achievements are as follows:

1) Word sense disambiguation based on semantic networks and UW-PageRank

This dissertation proposes a word sense disambiguation method based on semantic networks and UW-PageRank. It disambiguates all the words in a whole text at once, without a corpus and without training.

For Chinese, we use HowNet as the knowledge base and build an undirected weighted graph with sememes as vertices and the relatedness between sememes as edge weights. UW-PageRank is then applied to the graph to score the importance of each sememe. The score of each definition of a word is computed from the scores of the sememes it contains, and the highest-scoring definition is assigned to the word. This algorithm is tested in a text indexing experiment and on SENSEVAL-3.

For English, we use WordNet as the knowledge base and build an undirected weighted graph with synsets as vertices and the relatedness between synsets as edge weights. UW-PageRank is then applied to score the importance of each synset, and the highest-scoring synset is assigned to the word. This algorithm is tested on the SemCor corpus.

2) Keyword extraction based on semantic networks and UW-PageRank

This dissertation proposes a keyword extraction method based on semantic networks and UW-PageRank. After word sense disambiguation, each word has exactly one assigned sense, so the semantic graph can be pruned to keep only these "right" senses. UW-PageRank is then applied to the pruned graph to find the most important senses, i.e. the keywords. We test the algorithm on manually tagged Chinese and English papers; it performs better than the Tf (term frequency) algorithm.

3) Heuristic rule generation algorithm for Chinese information extraction: RGA-CIE

This dissertation proposes RGA-CIE, a heuristic rule generation algorithm for Chinese information extraction that is domain independent and works on free Chinese text. RGA-CIE applies supervised learning with a bottom-up strategy, i.e. a rule generalization process. A heuristic method decides the rule generalization path, and a Laplacian formula evaluates the performance of the rules. Semantic extension is also applied to make the rules more flexible. The learned rules have been tested on the Commercial News Information Extraction System, where they achieve a precision of 0.84 and a recall of 0.82, better than manually written rules. We also applied information extraction technology to ontology instance learning, which made a significant contribution to the Traveling in Beijing System.
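
The graph ranking step described above can be illustrated with a small sketch. The abstract does not give the UW-PageRank formula, so the code below assumes the common weighted PageRank update for undirected graphs, in which each vertex passes its score to its neighbors in proportion to edge weight. The function name uw_pagerank, the damping factor 0.85, the iteration count, and the toy sense labels and relatedness values are all illustrative assumptions, not taken from the dissertation, HowNet, or WordNet.

```python
def uw_pagerank(edges, damping=0.85, iterations=50):
    """Score vertices of an undirected weighted graph.

    edges maps a pair (u, v) to a positive relatedness weight.
    Assumed update (not confirmed by the source):
        score(v) = (1 - d) + d * sum_u [w(u, v) / strength(u)] * score(u)
    where strength(u) is the total weight of edges touching u.
    """
    # Build a symmetric adjacency map, skipping non-positive weights.
    neighbors = {}
    for (u, v), weight in edges.items():
        if weight <= 0:
            continue
        neighbors.setdefault(u, {})[v] = weight
        neighbors.setdefault(v, {})[u] = weight

    strength = {node: sum(nbrs.values()) for node, nbrs in neighbors.items()}
    scores = {node: 1.0 for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            incoming = sum(w / strength[u] * scores[u] for u, w in nbrs.items())
            updated[node] = (1 - damping) + damping * incoming
        scores = updated
    return scores


# Toy graph: two candidate senses of "bank" linked to context senses.
# Labels and weights are made up for illustration only.
toy_edges = {
    ("bank#finance", "money#currency"): 0.9,
    ("bank#river", "money#currency"): 0.1,
    ("bank#finance", "loan#credit"): 0.8,
    ("bank#river", "loan#credit"): 0.1,
}
scores = uw_pagerank(toy_edges)

# Disambiguation step: keep the highest-scoring candidate sense of the word.
candidates = ["bank#finance", "bank#river"]
best_sense = max(candidates, key=lambda sense: scores[sense])
```

In this sketch the sense that is more strongly related to the surrounding senses collects more score and is selected. Under the same assumptions, keyword extraction would rerun the ranking on the graph pruned to the selected senses and keep the top-scoring vertices.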

Keywords/Search Tags:

Keyword Extraction, Information Extraction, Word Sense Disambiguation, WordNet, HowNet, PageRank

Related items

1	Research On Word Sense Disambiguation And Keyword Expansion In Question Answering System
2	An Approach For Word Sense Disambiguation Based On WordNet
3	Word Sense Disambiguation Technology Research Based On Hownet And Bayesian Model
4	A Study Of Chinese Word Sense Disambiguation Based On Hownet
5	Research Of Word Sense Disambiguation Based On Hybrid Features And Rules
6	Automatic Knowledge Acquisition For Word Sense Disambiguation
7	Word Sense Disambiguation Based On Semantic And Lexical Information
8	Research On A Chinese Word Sense Disambiguation
9	Research On Chinese Word Sense Disambiguation Method Based On Graph Model
10	Research Of Chinese Word Sense Disambiguation Based On Hownet