Study On Chinese Text Similarity Computing Based On Word Segmentation

Posted on: 2007-10-26

Degree: Master

Type: Thesis

Country: China

Candidate: B Shen

Full Text: PDF

GTID: 2178360182485964

Subject: Business management

Abstract/Summary:


In Chinese information processing, text similarity computing is widely used in information retrieval, machine translation, automatic question answering, text mining, and related areas. It is a fundamental and important problem that has long been both a research hotspot and a difficulty. Chinese is harder for computers to process than Western alphabetic text because of the word segmentation step. Word segmentation is the foundation and precondition of Chinese text similarity computing, and the accuracy of the result can be greatly improved by adopting a more effective segmentation algorithm.

Based on an analysis and comparison of common Chinese word segmentation algorithms, this paper proposes an improved Maximum Matching Method together with a strategy for eliminating ambiguity. The new method aims to improve the completeness and accuracy of word segmentation by improving the construction of the segmentation dictionary, the segmentation steps, and the handling of ambiguity. Then, based on an analysis and comparison of existing text similarity computing methods, the paper presents a computer implementation of Chinese text word segmentation and similarity computing that uses the TF-IDF method based on the vector space model (VSM), combined with the segmentation algorithm described above. Technological texts are used as test examples to validate the method.

The research and its results offer a useful reference and have good application prospects in many areas of Chinese information processing, especially technological text similarity computing.
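The pipeline the abstract describes (dictionary-based segmentation followed by TF-IDF weighting in a vector space model) can be illustrated with a minimal Python sketch. This is not the thesis's improved method: the abstract does not specify the dictionary construction or the ambiguity strategy, so the code below uses plain forward maximum matching with a single-character fallback. The function names, the tiny dictionary, the `max_len` parameter, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Plain forward maximum matching (assumed baseline, not the thesis's variant).

    At each position, take the longest dictionary word of up to max_len
    characters; if nothing matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def tfidf_vectors(token_lists):
    """Build one sparse {term: weight} vector per document.

    Uses tf * (log((1 + N) / (1 + df)) + 1), a smoothed IDF chosen here
    (an assumption) so that terms shared by every document keep a
    nonzero weight.
    """
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy dictionary and documents, for illustration only.
DICTIONARY = {"中文", "文本", "相似度", "计算", "分词", "机器", "翻译"}
DOCS = ["中文文本相似度计算", "中文文本分词", "机器翻译"]

segmented = [forward_max_match(doc, DICTIONARY) for doc in DOCS]
vectors = tfidf_vectors(segmented)
print(segmented[0])                     # ['中文', '文本', '相似度', '计算']
print(cosine(vectors[0], vectors[1]))   # shares 中文 and 文本, so > 0
print(cosine(vectors[0], vectors[2]))   # no shared terms, so 0.0
```

Greedy longest-match is where segmentation ambiguity arises, which is presumably what the paper's disambiguation strategy addresses. A segmentation error changes the term set and therefore the similarity score directly.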


Related items

1	Research On Text Similarity Algorithm Based On VSM Combined With Word Semantics
2	Research And Implement Of An Optimal Approximate Matching System Of Structureless Text
3	Research On Key Techniques Of Cross-Language Text Similarity Detection Based On Word Vector
4	Research And Application On Chinese Automatic Word Segmentation In Full Text Retrieval
5	Research And Implementation Of Text Mining Technology Based On Public Security Information
6	A Study On Similarity Of Student's Homework Text Under Certain Condition
7	Key Techniques Of Text Mining On Criminal Cases
8	Research On The Technology Of Automatic Segmentation For Text-To-Speech System
9	Study On Text Category Oriented Chinese Text Mining And Its Implementation
10	Research On Chinese Text Categorization Algorithms Based On Technology Text