Computing And Evaluation Of Relatedness Between Short Text And Terms

Posted on: 2015-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wang
Full Text: PDF
GTID: 2298330422976229
Subject: Computational Mathematics
Abstract/Summary:
In natural language processing (NLP), research on lexical relatedness is mainly concerned with the relatedness between two lexical units, such as words or terms, and the methods of calculation and evaluation are independent of the text to be analyzed. Relatedness between a pair of lexical units is a topic that attracts much attention. Because relatedness depends heavily on the semantic meaning of the text involved and on human intuition, obtaining a precise relatedness score is a difficult task. Furthermore, semantic research should not be confined to the relatedness between two words. If the problem of semantic relatedness among linguistic units more complex than a single word can be resolved more precisely, some NLP tasks will be improved.

Unlike research on semantic relatedness between two words, this thesis proposes a new task: obtaining a semantic relatedness sequence. In this task, computing the relatedness between a short text and a language unit within that short text is an elementary issue. However, we place less emphasis on the relatedness value between a short text and a word, and more emphasis on the method of ordering the units of the short text according to the relatedness between each unit and the short text. This ordering is called the Relatedness Sequence for short. Four computing methods for obtaining the relatedness sequence are proposed: rising weight sort, optimal path sort, maximal sort, and total relationship declining weights sort.

Evaluation of the computing methods for obtaining the relatedness sequence is another important topic of this thesis. To address the evaluation problem, we obtained psychological data on relatedness sequences by organizing a psychological experiment based on 100 short texts. The evaluation problem is formulated as computing the similarity between two sorted sequences. Four measures are used for evaluation: the Pearson correlation coefficient, the sorted vector correlation coefficient, the Kendall rank correlation coefficient, and the relevance correlation coefficient.

In summary, this thesis proposes a novel research topic, obtaining the relatedness sequence, and is mainly concerned with the computing methods for obtaining the sequence and the methods for evaluating them. Our idea of computing the similarity between two sorted sequences is a mathematical problem that deserves more attention. The work in this thesis is of some significance for research on semantic relatedness.
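
As an illustration of comparing two sorted sequences, the following minimal sketch computes two of the named measures in their standard textbook forms: the Kendall rank correlation coefficient (tau-a) and the Pearson correlation coefficient applied to rank positions. It assumes both sequences contain the same units with no ties. The function names and the sample sequences are hypothetical, and the abstract does not state whether the thesis uses exactly these definitions.

```python
from itertools import combinations
from math import sqrt


def ranks(sequence):
    """Map each unit to its 1-based position in the sequence."""
    return {unit: pos for pos, unit in enumerate(sequence, start=1)}


def kendall_tau(seq_a, seq_b):
    """Kendall tau-a between two orderings of the same units (no ties assumed)."""
    ra, rb = ranks(seq_a), ranks(seq_b)
    concordant = discordant = 0
    for u, v in combinations(seq_a, 2):
        sign = (ra[u] - ra[v]) * (rb[u] - rb[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(seq_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


def rank_pearson(seq_a, seq_b):
    """Pearson correlation of the rank positions of each unit in the two orderings."""
    ra, rb = ranks(seq_a), ranks(seq_b)
    xs = [ra[u] for u in seq_a]
    ys = [rb[u] for u in seq_a]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: an ordering from human judgments and one from a computing method.
human = ["relatedness", "semantic", "text", "short"]
computed = ["semantic", "relatedness", "text", "short"]

print(kendall_tau(human, computed))   # one swapped pair out of six
print(rank_pearson(human, computed))
```

Both values lie in [-1, 1]; identical orderings give 1 and fully reversed orderings give -1, so a computing method can be scored by how close its sequence comes to the human-derived one.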
Keywords/Search Tags: Semantic, Semantic relevancy, Vocabulary, Short text, Evaluation Method