Concept And Attribute Knowledge Extraction And Its Application

Posted on: 2014-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wei
Full Text: PDF
GTID: 2248330395498640
Subject: Computer application technology
Abstract/Summary:
Information extraction turns unstructured or semi-structured information in text into structured data. As the volume of information grows rapidly, information extraction helps people find what they need faster. Attribute extraction is a kind of information extraction: it extracts the attributes of the same thing from more than one source of information. Most attribute extraction methods extract attributes only from the World Wide Web or from corpora and do not make good use of existing knowledge sources. This paper presents a new attribute extraction method that extracts attribute knowledge first from HowNet and then from the World Wide Web. First, a concept attribute library and an attribute value library are obtained from HowNet; these two libraries are then extended using a World Wide Web corpus. The result is a more complete attribute knowledge base.

This attribute knowledge base can then be used for word sense disambiguation. Word sense disambiguation determines which sense a polysemous word takes in a specific context, and it is of great significance to many problems in natural language processing. Unlike approaches that rely only on machine learning classification algorithms, this paper proposes a new model of word sense disambiguation. The basic idea is to combine a machine learning classification algorithm with attribute knowledge to improve disambiguation accuracy. Specifically, an attribute knowledge base is established for the words to be disambiguated. Because the different senses of a word have different attribute values, these attribute values can serve as context features of the polysemous word, and a naive Bayes or maximum entropy model is then used to distinguish the senses.
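The disambiguation approach described above can be illustrated with a minimal sketch. The abstract does not give the feature design or the data, so everything below is an assumption made for illustration: the tiny attribute knowledge base `ATTRIBUTE_VALUES`, the English example word "apple", the `ATTR_OF:<sense>` feature naming, and the hand-written naive Bayes classifier with Laplace smoothing. It is not the thesis implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical attribute knowledge base (an assumption, not thesis data):
# each sense of the ambiguous word maps to attribute values typical of it.
ATTRIBUTE_VALUES = {
    "fruit": {"red", "sweet", "juicy", "ripe"},
    "company": {"iphone", "stock", "software", "ceo"},
}

def extract_features(context_tokens):
    """Context words plus one ATTR_OF:<sense> marker per matched attribute value."""
    feats = list(context_tokens)
    for sense, values in ATTRIBUTE_VALUES.items():
        for tok in context_tokens:
            if tok in values:
                feats.append("ATTR_OF:" + sense)
    return feats

class NaiveBayesWSD:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for tokens, sense in examples:
            self.sense_counts[sense] += 1
            for feat in extract_features(tokens):
                self.feature_counts[sense][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, tokens):
        total = sum(self.sense_counts.values())
        feats = [f for f in extract_features(tokens) if f in self.vocab]
        best_sense, best_score = None, float("-inf")
        for sense in sorted(self.sense_counts):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.sense_counts[sense] / total)
            denom = (sum(self.feature_counts[sense].values())
                     + self.alpha * len(self.vocab))
            for feat in feats:
                score += math.log(
                    (self.feature_counts[sense][feat] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Toy labeled contexts for the ambiguous word "apple" (illustrative only).
TRAIN = [
    (["ate", "a", "sweet", "apple"], "fruit"),
    (["the", "apple", "was", "red"], "fruit"),
    (["apple", "released", "new", "iphone"], "company"),
    (["apple", "stock", "rose"], "company"),
]

model = NaiveBayesWSD().fit(TRAIN)
print(model.predict(["a", "juicy", "apple"]))
print(model.predict(["apple", "ceo", "spoke"]))
```

In this sketch the words "juicy" and "ceo" never occur in the training contexts, yet they still contribute evidence because the knowledge base maps them to a sense-level marker. That is one plausible reading of how attribute values can act as context features; a maximum entropy model could consume the same features.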
The experimental results show that this method can effectively improve the accuracy of word sense disambiguation.

The main innovations of this paper are as follows:
1. This paper proposes a new attribute extraction method that extracts attributes from HowNet and then extends them using a World Wide Web corpus.
2. This paper proposes a new method of word sense disambiguation that uses attribute knowledge.

The experimental results show that the attribute knowledge base built by the proposed attribute extraction method has a higher accuracy rate, and that combining a machine learning classification algorithm with attribute knowledge can effectively improve the accuracy of word sense disambiguation.
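The first innovation listed above (seed an attribute library from a knowledge source, then extend it from a corpus) can be sketched as follows. The abstract does not say how the extension step works, so this example swaps in a simple lexical-pattern technique ("the ATTRIBUTE of the CONCEPT") with a frequency threshold. The seed dictionary `SEED_ATTRIBUTES` is a hypothetical stand-in for entries derived from HowNet, no HowNet API is called, and the corpus sentences are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical seed library standing in for HowNet-derived entries
# (an assumption; the real library is not shown in the abstract).
SEED_ATTRIBUTES = {"car": {"color", "price"}}

# Toy stand-in for a web corpus (invented sentences).
CORPUS = [
    "The price of the car dropped last year.",
    "Buyers compare the mileage of a car before paying.",
    "The mileage of the car was low.",
    "He liked the smell of the car.",
]

# Assumed extraction pattern: "the <attribute> of [the|a|an] <concept>".
PATTERN = re.compile(r"\bthe (\w+) of (?:the |a |an )?(\w+)", re.IGNORECASE)

def extend_attributes(seed, corpus, min_count=2):
    """Add candidate attributes seen at least min_count times for a seed concept."""
    counts = Counter()
    for sentence in corpus:
        for attr, concept in PATTERN.findall(sentence):
            attr, concept = attr.lower(), concept.lower()
            if concept in seed:
                counts[(concept, attr)] += 1
    # Copy the seed so the original library is left unchanged.
    extended = {concept: set(attrs) for concept, attrs in seed.items()}
    for (concept, attr), n in counts.items():
        if n >= min_count:
            extended[concept].add(attr)
    return extended

EXTENDED = extend_attributes(SEED_ATTRIBUTES, CORPUS)
print(sorted(EXTENDED["car"]))
```

The threshold `min_count` is a crude noise filter: a candidate seen only once, such as "smell" here, is not added. A real system would likely need stronger filtering and patterns suited to the corpus language.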
Keywords/Search Tags: attribute extraction, HowNet, information extraction, word sense disambiguation