Research On Word Sense Disambiguation Based On Dependency And Domain Knowledge

Posted on: 2015-10-31  Degree: Doctor  Type: Dissertation
Country: China  Candidate: W P Lu  Full Text: PDF
GTID: 1228330422993439  Subject: Computer application technology
Abstract/Summary:
Natural language contains many ambiguous words, which take on different meanings depending on their context. Word sense disambiguation (WSD) is the task of identifying the meaning of a word from its context. WSD is a basic research field in natural language processing, and its effectiveness directly affects machine translation, information retrieval, information extraction, and other applications. WSD is regarded as an AI-complete problem and remains one of the most difficult problems facing computational linguists.

The key issues of WSD are acquiring enough disambiguation knowledge and building a suitable disambiguation model; in the final analysis, these are problems of knowledge acquisition and utilization. Existing WSD work does not pay enough attention to mining and utilizing dependency knowledge and domain knowledge. By focusing on the mining and utilization of these two kinds of knowledge, this dissertation aims to improve the effectiveness of WSD. To address the problems of feature word selection, knowledge acquisition, and utilization of domain knowledge, it proposes three WSD methods based on dependency knowledge and domain knowledge. It then proposes a multi-classifier combination method that combines the three methods and further improves effectiveness. The main work and contributions of the dissertation are as follows.

Traditional WSD with a similarity measure is deficient because it cannot accurately select the feature words of an ambiguous word. To solve this problem, the dissertation proposes WSD based on a dependency tree with a similarity measure. There are two traditional ways to select feature words: the context window and the direct dependency relation. The former tends to select unrelated noise words that are close to the ambiguous word and to miss related feature words that are far from it; the latter often obtains only a few feature words, or fails to obtain any valid feature words at all.
The proposed dependency-tree method proceeds as follows. First, the sentence containing the ambiguous word is parsed with dependency grammar to build its dependency tree. Second, feature words of the ambiguous word are selected and assigned appropriate weights according to the shortest paths between words on the dependency tree. Third, the semantic similarity between each feature word and each sense of the ambiguous word is computed, and the similarities are combined in a weighted sum. Finally, the sense with the maximum weighted sum of semantic similarity is chosen as the correct sense. Experimental results show that the method selects feature words more accurately and improves the effectiveness of WSD, reaching a recall of 39.52% on the BNC subset of the Koeling dataset.

WSD also suffers from the knowledge acquisition bottleneck. To solve this problem, the dissertation proposes WSD based on dependency fitness with automatic knowledge acquisition. The method acquires knowledge automatically by taking full advantage of dependency parsing. First, a large-scale corpus is parsed to obtain dependency cells, whose statistics are used to build a dependency knowledge base (DKB). Then, the ambiguous sentence is parsed to obtain the dependency constraint set (DCS) of the ambiguous word. For each sense of the ambiguous word, sense representative words (SRW) are obtained through WordNet. Finally, based on the DKB, the dependency fitness of each sense's SRW on the DCS is computed to determine the correct sense. The method offers a complete solution for mining and utilizing dependency knowledge in WSD. Experimental results show that, compared with unsupervised and knowledge-based methods that do not use any sense-annotated corpus, the proposed method yields state-of-the-art performance, reaching a recall of 74.53% on the Task #7 dataset of SemEval 2007.

Knowledge-based WSD fails to make full use of domain knowledge.
To solve this problem, the dissertation proposes WSD with a graph model based on domain knowledge. The method divides domain knowledge into text domain knowledge and sense domain knowledge. Keywords related to the domain of the target text are collected with the log-likelihood ratio as text domain knowledge, and domain annotations for each sense of the target ambiguous word are obtained from WordNet Domains as sense domain knowledge. To use this domain knowledge in WSD, a disambiguation graph is constructed from the text domain knowledge and the context words of the sentence, and the graph is then adjusted based on the sense domain knowledge. To avoid the deficiencies of existing graph evaluation methods, several improved evaluation methods based on weighted edges and bidirectional paths are proposed. The sense nodes in the graph are scored with the improved evaluation methods to determine the correct sense. Experimental results show that the method achieves the best performance among similar methods on the Koeling dataset.

Traditional multi-classifier combination methods are one-sided. To solve this problem, the dissertation proposes a combination method of probability weighted voting with dynamic self-adaption. Existing methods combine multiple classifiers based on either the overall performance of each classifier or the individual differences among disambiguation samples. Such one-sided methods attend to one aspect and neglect the other, which makes it hard for them to achieve the best performance. The dissertation introduces a parameter for the overall performance of a classifier and a parameter for the confidence of a sample into probability weighted voting and sample dynamic self-adaptive weighted voting, and thereby proposes the method of probability weighted voting with dynamic self-adaption. Experimental results show that the method is more effective than other existing multi-classifier combination methods, reaching a recall of 83.08% on the Task #7 dataset of SemEval 2007.
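
The combination idea in the last contribution can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so everything specific here is an assumption made for illustration: the function names `sample_confidence` and `combine`, the use of the margin between the top two probabilities as the per-sample confidence, the product of overall performance and confidence as the vote weight, and all numbers in the example. It should not be read as the dissertation's own formulation.

```python
from typing import Dict, List

# One classifier's output for one sample: sense label -> probability.
Distribution = Dict[str, float]


def sample_confidence(dist: Distribution) -> float:
    """Per-sample confidence (assumed here): margin between the two top probabilities."""
    ranked = sorted(dist.values(), reverse=True)
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return ranked[0]
    return ranked[0] - ranked[1]


def combine(distributions: List[Distribution], overall_scores: List[float]) -> str:
    """Return the sense with the largest weighted sum of classifier probabilities.

    Each classifier's vote is weighted by (assumed) overall performance times
    its confidence on this particular sample.
    """
    if not distributions or len(distributions) != len(overall_scores):
        raise ValueError("need one overall score per non-empty list of distributions")
    totals: Dict[str, float] = {}
    for dist, overall in zip(distributions, overall_scores):
        weight = overall * sample_confidence(dist)
        for sense, prob in dist.items():
            totals[sense] = totals.get(sense, 0.0) + weight * prob
    if not totals:
        raise ValueError("classifiers produced no sense probabilities")
    return max(totals, key=lambda sense: totals[sense])


# Hypothetical outputs of three classifiers for the word "bank" in one sentence.
outputs = [
    {"bank/finance": 0.60, "bank/river": 0.40},  # weak preference
    {"bank/finance": 0.55, "bank/river": 0.45},  # weak preference
    {"bank/finance": 0.10, "bank/river": 0.90},  # strong preference
]
overall = [0.70, 0.70, 0.60]  # hypothetical overall performance of each classifier

chosen = combine(outputs, overall)
print(chosen)  # bank/river
```

In this toy example a plain majority vote would pick `bank/finance`, because two of the three classifiers lean that way. The weighted vote picks `bank/river` instead, since the third classifier is far more confident on this sample. This is the kind of interaction between overall performance and per-sample confidence that the paragraph above describes.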
Keywords/Search Tags: Word Sense Disambiguation, Dependency Knowledge, Domain Knowledge, Dependency Tree, Dependency Fitness, Graph Model, Multi-Classifier Combination