Unsupervised Acquisition of Word Relevance Knowledge and Construction of a Balanced Classifier in Natural Language Processing

Posted on: 2002-10-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Lu
Full Text: PDF
GTID: 1118360185995629
Subject: Computer Science and Technology
Abstract/Summary:
There is a well-known saying in the field of expert systems: 'An expert's knowledge is the key to an expert system.' We believe the same holds for computational linguistics: one key to major progress is to formalize knowledge of natural language completely, consistently, and in fine detail. This is borne out by the failure of traditional manual methods and the success of emerging methods from statistics, pattern recognition, and machine learning.

In view of the above, this dissertation studies two problems in depth: unsupervised acquisition of relevance knowledge between words, and construction of a balanced classifier by machine learning. Although the two differ, both belong to the same field, knowledge acquisition for natural language, with different emphases.
[1] Unsupervised acquisition of relevance knowledge between words aims to construct a basic knowledge base that is general, fine-grained, and provides a quantitative measure of distance.
[2] Construction of a balanced classifier by machine learning proposes a method for disambiguation in natural language processing.
It is based on the hypothesis that knowledge of natural language contains both rules and exceptions, so the representation, acquisition, and reasoning of natural language knowledge should handle the two differently.

For unsupervised acquisition of relevance knowledge between words, this thesis addresses the following problems on the basis of the Vector Space Model from information retrieval:
(a) First, compute the information carried by each context position using information gain from information theory; then fit a function relating information to context position; finally, integrate that function to set the boundary of the effective context range.
(b) Based on the concept of the X-matrix, propose an improved weighting scheme, tf.idf.IG, to weight context terms effectively.
(c) Define three kinds of 'noise' and propose a series of effective methods to eliminate them.
(d) Two-dimensional visualization of words from the high-dimensional vector space plays two important roles: evaluating the result of noise elimination, and setting the information ratio in Principal Component Analysis (PCA). At the same time, using the relevant...
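The first part of step (a), measuring how much information each context position carries, can be illustrated with a small sketch. The abstract does not give the dissertation's exact formulation, so the version below is an assumption: it treats the information gain of a position as H(target) - H(target | word at that position), estimated from raw co-occurrence counts. The function names `entropy`, `information_gain`, and `gain_by_position` are hypothetical, and the curve-fitting and integration steps are not shown.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(pairs):
    """IG = H(target) - H(target | context) over (target, context) pairs."""
    target_counts = Counter(target for target, _ in pairs)
    by_context = defaultdict(Counter)
    for target, context in pairs:
        by_context[context][target] += 1
    n = len(pairs)
    # Conditional entropy: entropy of targets per context word,
    # weighted by how often that context word occurs.
    conditional = sum(
        sum(c.values()) / n * entropy(c.values()) for c in by_context.values()
    )
    return entropy(target_counts.values()) - conditional


def gain_by_position(sentences, targets, max_offset):
    """Information gain for each context offset in [-max_offset, max_offset], skipping 0."""
    gains = {}
    for offset in range(-max_offset, max_offset + 1):
        if offset == 0:
            continue
        pairs = []
        for tokens in sentences:
            for i, token in enumerate(tokens):
                j = i + offset
                if token in targets and 0 <= j < len(tokens):
                    pairs.append((token, tokens[j]))
        # Assumption: a position with no observations carries no information.
        gains[offset] = information_gain(pairs) if pairs else 0.0
    return gains
```

On a toy corpus such as `[["drink", "tea", "now"], ["drink", "coffee", "now"], ["eat", "bread", "now"], ["eat", "rice", "now"]]` with targets `{"drink", "eat"}`, offset +1 separates the two targets while offset +2 (always "now") does not. With counts this small the estimate is inflated by rare context words, so a real corpus would likely need smoothing or frequency cutoffs.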
Keywords/Search Tags: Natural Language Processing, unsupervised acquisition of relevant relation knowledge between words, tf.idf.IG, X-matrix, definition and elimination of noise, weight of irrelevant context words, word sense disambiguation, 2-dimension visualization