Study Of Chinese Text Similarity Based On Number Difference Gene

Posted on:2012-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Chen

Full Text:PDF

GTID:2178330332995812

Subject:Computer application technology

Abstract/Summary:

Text similarity calculation is fundamental work in Chinese information processing. A high-quality method must be both accurate and efficient. For accuracy, it should compare texts by their natural-language meaning, based on a full understanding of the semantics of the author or source, so that its similarity judgments approximate those of a human reader. For efficiency, it must compute quickly enough to save time when facing a large amount of information. The spread of micro-information is a new characteristic of the development of information technology. Taking the characteristics of micro-information into account, and to address the semantic deviation that arises between long-text and short-text language material, this paper presents a Chinese text similarity calculation based on number difference. After reviewing the related domestic and international literature and analyzing the current state of similarity calculation, it proposes an improved similarity function that combines traditional statistical methods with narrow semantic methods, uniting the efficiency of the former with the accuracy of the latter. Where necessary, it also has to overcome the disadvantages of both methods.
This thesis explores text similarity calculation starting from number difference, drawing on the number diversity of Chinese words, word frequency, and the semantics of word and number combinations; it also builds on network-based word similarity calculation. Finally, using a small self-built text collection as the test object, it compares the similarity calculations of different methods in a laboratory environment. The results indicate that the similarity method based on word difference performs better than traditional statistical and semantic methods. By comparing accuracy and word segmentation speed, the work offers a new way of thinking about Chinese text similarity calculation.
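
The general idea of blending a statistical score with a semantic score can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's number-difference method, whose details the abstract does not give. The function names, the weighting parameter `alpha`, and the toy synonym table are all hypothetical, and the input is assumed to be already segmented into words.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon: similarity scores for a few word pairs.
TOY_SYNONYMS = {frozenset(("电脑", "计算机")): 0.9}


def toy_word_sim(a, b):
    """Word-level similarity: 1.0 for identical words, else a table lookup."""
    if a == b:
        return 1.0
    return TOY_SYNONYMS.get(frozenset((a, b)), 0.0)


def cosine_similarity(tokens_a, tokens_b):
    """Statistical score: cosine between term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = sqrt(sum(c * c for c in freq_a.values()))
    norm_b = sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_similarity(tokens_a, tokens_b, word_sim):
    """Semantic score: symmetric average of best-match word similarities."""
    def directed(src, dst):
        if not src or not dst:
            return 0.0
        return sum(max(word_sim(s, d) for d in dst) for s in src) / len(src)

    return (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a)) / 2


def combined_similarity(tokens_a, tokens_b, word_sim, alpha=0.5):
    """Weighted blend: alpha weights the statistical score (assumed scheme)."""
    stat = cosine_similarity(tokens_a, tokens_b)
    sem = semantic_similarity(tokens_a, tokens_b, word_sim)
    return alpha * stat + (1 - alpha) * sem


if __name__ == "__main__":
    a = ["我", "买", "电脑"]
    b = ["我", "买", "计算机"]
    print(cosine_similarity(a, b))
    print(combined_similarity(a, b, toy_word_sim))
```

In this toy case the two sentences differ only in a synonym pair, so the blended score is higher than the purely statistical cosine score, which treats the two words as unrelated.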

Related items

1	Study Of Chinese Text Similarity Research Based On Markov Word Order Gene
2	Research And Application Of Word Similarity Based On Context
3	Research For Chinese New Word Identification Based On Context-aware
4	Research And Implementation Of Subjective Question Scoring System Based On Chinese Word Segmentation And Text Similarity
5	Chinese Words Segmentation Based On Context And Stopwords
6	Indication Similarity Of Drugs Based On Chinese Word Segmentation Technology
7	Research On Chinese Text Similarity Detection Technology Based On Word Weight Analysis
8	Research And Implementation On Corresponding Method Between Chinese Name Of Transportation Data And Standard Terminology
9	Research On A Chinese Word Sense Disambiguation
10	Chinese Word Semantic Similarity Measure And Its Application In Cross-language Information Retrieval