Concept And Attribute Knowledge Extraction And Its Application

Posted on:2014-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:X L Wei

GTID:2248330395498640

Subject:Computer application technology

Abstract/Summary:

Information extraction is the task of extracting information from unstructured or semi-structured text and converting it into structured form. As the volume of information grows rapidly, information extraction helps people find what they need faster. Attribute extraction is a kind of information extraction: it extracts the attributes of the same thing from more than one information source. Most attribute extraction methods extract attributes only from the World Wide Web or from a corpus and make little use of existing knowledge sources. This thesis presents a new attribute extraction method that extracts attribute knowledge first from HowNet and then from the World Wide Web. First, a concept attribute library and an attribute value library are obtained from HowNet; these two libraries are then extended using a World Wide Web corpus. The result is a more complete attribute knowledge base.

This attribute knowledge base can then be used for word sense disambiguation. Word sense disambiguation is the task of determining which sense a polysemous word takes in a specific context, and it is of great significance to many problems in natural language processing. Unlike approaches that rely only on machine learning classification algorithms, this thesis proposes a new word sense disambiguation model. The basic idea is to combine a machine learning classification algorithm with attribute knowledge to improve disambiguation accuracy. Specifically, an attribute knowledge base is built for the words to be disambiguated. Because the different senses of the same word form have different attribute values, these attribute values can serve as contextual features of the polysemous word; a naive Bayes or maximum entropy model is then used to distinguish its senses.
Experimental results show that this method effectively improves the accuracy of word sense disambiguation.

The main contributions of this thesis are as follows:

1. It proposes a new attribute extraction method that extracts attributes from HowNet and then extends them using a World Wide Web corpus.
2. It proposes a new word sense disambiguation method that uses attribute knowledge.

Experimental results show that the attribute knowledge base built by the proposed attribute extraction method has a higher accuracy rate, and that combining a machine learning classification algorithm with attribute knowledge effectively improves the accuracy of word sense disambiguation.
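
The idea of using attribute values as contextual features for a naive Bayes classifier can be illustrated with a minimal sketch. This is not the thesis implementation: the toy English knowledge base `ATTRIBUTE_KB`, the sense labels, the training examples in `TRAIN`, and the helper names `featurize` and `NaiveBayesWSD` are all hypothetical, and the real system works with HowNet and a Web corpus rather than a hand-written dictionary. The sketch only shows how a context word never seen in training can still contribute evidence when it matches an attribute value of one sense.

```python
import math
from collections import Counter, defaultdict

# Hypothetical attribute knowledge base: sense -> attribute -> known values.
# In the thesis this knowledge comes from HowNet extended with Web data.
ATTRIBUTE_KB = {
    "bank/finance": {"service": {"loan", "deposit", "account"}},
    "bank/river": {"location": {"river", "shore", "water"}},
}


def featurize(context_words, kb=ATTRIBUTE_KB):
    """Return word features plus one attribute feature per KB value hit."""
    feats = ["W:" + w for w in context_words]
    for sense, attrs in kb.items():
        for attr, values in attrs.items():
            for w in context_words:
                if w in values:
                    feats.append("ATTR:" + sense + ":" + attr)
    return feats


class NaiveBayesWSD:
    """Multinomial naive Bayes over the features above, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for words, sense in examples:
            self.sense_counts[sense] += 1
            for f in featurize(words):
                self.feat_counts[sense][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, words):
        total = sum(self.sense_counts.values())
        # Features never seen in training are ignored at prediction time.
        feats = [f for f in featurize(words) if f in self.vocab]
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            counts = self.feat_counts[sense]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)
            for f in feats:
                score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy training data (hypothetical): context words around the ambiguous word "bank".
TRAIN = [
    (["loan", "interest"], "bank/finance"),
    (["account", "money"], "bank/finance"),
    (["river", "fishing"], "bank/river"),
    (["water", "muddy"], "bank/river"),
]

model = NaiveBayesWSD().fit(TRAIN)
# "deposit" and "shore" never occur in TRAIN; only the attribute features fire.
print(model.predict(["deposit", "today"]))  # bank/finance
print(model.predict(["shore", "walk"]))     # bank/river
```

In this sketch the word features alone would leave both test contexts undecided, since none of their words appear in the training data. The attribute features are what separate the two senses, which is the effect the abstract attributes to combining attribute knowledge with a classifier. A maximum entropy model could be substituted for the naive Bayes class using the same features.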

Keywords/Search Tags:

attribute extraction, HowNet, information extraction, word sense disambiguation

Related items

1	Research Of Chinese Word Sense Disambiguation Based On Hownet
2	Knowledge Acquisition From Text
3	Study Of Chinese Event Information Extraction Based On Hownet Semantic Relation
4	Chinese Information Extraction And The Method Of Summarization Generating Based On HowNet Semantic
5	Visual Web Page Information Extraction And Text Feature Word Extraction Technology Research
6	Research On Opinion Mining For Customer Service Conversation Text
7	Research On The Extraction Of Entity Relationships From Multivariate Information
8	Research On Large-Scale Chinese People Information Extraction Based On Web
9	Research On Domain Entity Attribute And Event Extraction Technology
10	Research On Product Attribute Extraction From Semi-structured Web Pages