
Study Of Chinese Text Similarity Based On Number Difference Gene

Posted on: 2012-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Chen
GTID: 2178330332995812
Subject: Computer application technology
Abstract/Summary:
Text similarity calculation is fundamental work in Chinese information processing. A high-quality text similarity method must be both accurate and efficient. That is, it should compare texts at the level of natural-language meaning, based on a full understanding of the semantics of the author or source, so that its similarity judgments approach those of a human reader. At the same time, it must compute efficiently in order to save time when facing large amounts of information. The dissemination of micro information is a new characteristic of the development of information technology. Taking the characteristics of micro information into account, and aiming to address the semantic deviation found in long texts and the scattered writing found in short texts, this thesis presents a Chinese text similarity calculation based on the number difference. After reviewing the related domestic and foreign literature and analyzing the current state of similarity calculation, it puts forward a new method that improves the similarity function by combining traditional statistical methods with narrow semantic methods, pairing the efficiency of statistics with the accuracy of semantics while overcoming the disadvantages of each.
This thesis explores similarity calculation within a text, starting from the number difference and drawing on the number diversity of Chinese words, word frequency, and the semantics of word-and-number combinations; it also builds on network-based word similarity calculation. Finally, it uses a small self-built text collection as the test object and compares the similarity calculations of different methods in a laboratory environment. The results indicate that the similarity method based on word difference performs better than traditional statistical and semantic methods. By comparing accuracy and word segmentation speed, the research offers a new way of thinking about Chinese text similarity calculation.
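The general idea of combining a statistical score with a semantic score can be illustrated with a minimal sketch. It is not the thesis's number difference method, whose details the abstract does not give. It is a generic hybrid that assumes the texts are already segmented into words, and it uses a hypothetical word similarity table (`WORD_SIM`) and a hypothetical weighting parameter (`alpha`), both introduced here only for illustration.

```python
import math
from collections import Counter

# Hypothetical word-to-word similarity table (an assumption for this sketch).
# A real system would take these scores from a lexicon or another resource.
WORD_SIM = {frozenset(("电脑", "计算机")): 0.9}


def word_sim(a, b):
    """Similarity of two words: 1.0 if identical, else a table lookup."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset((a, b)), 0.0)


def cosine_tf(tokens_a, tokens_b):
    """Statistical part: cosine similarity of raw term-frequency vectors."""
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(fa[w] * fb[w] for w in fa.keys() & fb.keys())
    norm_a = math.sqrt(sum(v * v for v in fa.values()))
    norm_b = math.sqrt(sum(v * v for v in fb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_sim(tokens_a, tokens_b):
    """Semantic part: average best word match, taken in both directions."""
    if not tokens_a or not tokens_b:
        return 0.0

    def directed(xs, ys):
        return sum(max(word_sim(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a)) / 2


def hybrid_sim(tokens_a, tokens_b, alpha=0.5):
    """Weighted mix of the statistical and semantic scores (alpha is assumed)."""
    return alpha * cosine_tf(tokens_a, tokens_b) + (1 - alpha) * semantic_sim(tokens_a, tokens_b)


if __name__ == "__main__":
    a = ["我", "买", "电脑"]
    b = ["我", "买", "计算机"]
    print(hybrid_sim(a, b))
```

In this sketch the two sentences differ only in a pair of near-synonyms, so the semantic part scores them higher than the term-frequency part does; the weighted mix falls between the two.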
Keywords/Search Tags: text similarity, Chinese word segmentation, number difference gene