Research On Statistical Word-level Semantic Relatedness Computation

Posted on: 2015-05-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Q Sun
GTID: 1228330422990659
Subject: Computer application technology
Abstract/Summary:
In many cases, natural language objects are semantically related to one another. Such relationships benefit a variety of research areas, including natural language processing, information retrieval, machine translation, and automatic question answering. Semantic relatedness computation quantifies these relationships, and its broad range of applications raises both theoretical and practical challenges: the semantic representation of the objects, the design of the relatedness computation model, the analysis of the quality of information sources and features, application-specific semantic relatedness, and the cross-language commonality of the computation algorithm.

This dissertation first clarifies the definition of "semantic relatedness", formulates its computation in a universal mathematical form, and identifies three key aspects of the research: the selection of the semantic links, the design of the feature mapping, and the design of the comparison mapping. With respect to these aspects, the dissertation studies semantic relatedness computation using statistical methods, focusing on word-level linguistic objects: words, named entities, and Web search queries. The research falls into four parts, as follows.

In word semantic relatedness calculation, we investigate the design of the calculation function under heterogeneous evidence. The similarity of usage (i.e., context) and the connectivity of semantic relations are both important clues to word semantic relatedness. To handle these two different forms of evidence, we designed a semantic relation-enhanced distributional similarity algorithm that uses distributional similarity to quantify semantic relation connectivity, thereby unifying the two types of evidence.
Experiments showed that the fusion of semantic relations and distributional similarity effectively improves the correlation between the computed semantic relatedness and human annotations, and that our evidence fusion method outperforms simple aggregation of the heterogeneous features.

In related named entity mining on normal text, we investigate how to represent the semantic relationship between linguistic objects under specific application scenarios. We argue in this dissertation that a semantic relationship between named entities cannot exist without an association between the corresponding real-world objects. We quantify the strength of semantic interaction between named entities by examining their discourse co-occurrence. Augmented with mention similarity and co-occurrence proximity, discourse co-occurrence-based related named entity mining yields better results than the relation extraction-based approach.

In knowledge base-supported named entity semantic relatedness calculation, we investigate the feature weighting strategy when the linguistic objects have weak statistical characteristics. As a first attempt, we propose a semantic relatedness calculation for entry entities based on subject-property-object records. Because of the design principles of subject-property-object records, statistical methods that are typically effective on normal text are not applicable. To address this issue, we propose weighting the semantic features of named entities using the user demand information in query logs. Compared with a statistical method that exploits the knowledge base's internal statistics, our approach achieves better precision on the related entity recommendation problem.
Additionally, we analyzed the effectiveness of the weighting strategy across knowledge base entries of varied scale, quality, and domain, as well as how the calculation function should exploit the weighting results.

In query semantic relatedness judgment, we investigate the analysis and refinement of the semantic relatedness computation strategy. Query semantic relatedness judgment determines whether two Web search queries target the same information need. Different users have different habits, so the effectiveness of the features used in the judgment model varies, and the model must adapt to the individual variability of users. We first designed a range of classification features based on two typical kinds of evidence of relatedness, time proximity and content similarity, and built effective judgment models. We then analyzed the intrinsic discriminative power of the features using ROC curve analysis, which is independent of any specific model. Based on the findings, we further proposed user-specific individual judgment models and improved the effectiveness of the global model by fusing the individual variations into it.
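
The model-independent ROC analysis of individual features mentioned above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name feature_auc, the two feature names, and the toy query pairs are all hypothetical, chosen only to show how a single feature can be scored by its area under the ROC curve (AUC) without training any classifier. The AUC is computed here through its rank interpretation, the probability that a related pair receives a higher feature value than an unrelated pair.

```python
from typing import Sequence


def feature_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve of one feature used directly as a score.

    Equals the probability that a randomly chosen positive example has a
    higher feature value than a randomly chosen negative one (ties count 0.5).
    No classifier is trained, so the result does not depend on any model.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up query pairs: label 1 = same information need, 0 = different.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical features (illustrative names, not taken from the dissertation).
content_similarity = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1]
time_gap_seconds = [10, 45, 300, 60, 900, 1200]

auc_content = feature_auc(content_similarity, labels)
# A smaller time gap suggests relatedness, so negate it to keep
# the convention "higher score = more likely related".
auc_time = feature_auc([-t for t in time_gap_seconds], labels)

print(f"content similarity AUC: {auc_content:.3f}")
print(f"time proximity AUC:     {auc_time:.3f}")
```

An AUC near 0.5 would indicate a feature with little discriminative power on its own, while values near 1.0 indicate a feature that separates related from unrelated pairs well. Running the same computation on each user's data separately is one way to observe the per-user variability that motivates individual judgment models.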
Keywords/Search Tags: semantic relatedness, word-level, context similarity, named entity, query semantics