Research On Text Similarity Algorithm Based On VSM Combined With Word Semantics

Posted on:2019-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:G L Feng

GTID:2438330545479148

Subject:Industrial engineering

Abstract/Summary:

As a carrier of information, text is the most common way of presenting information in daily life and work. Text similarity computation is widely used in fields such as information retrieval, text classification, knowledge mining, and information filtering, and it is a fundamental and key problem in information processing. Because of the complexity and particular characteristics of the Chinese language, computing the similarity of Chinese texts is more difficult than for texts in other languages, so it has long been a popular and challenging topic in the field. Many scholars have studied Chinese text similarity, made progress, and proposed calculation methods.

This paper studies Chinese word segmentation technology and the existing Chinese text similarity algorithms in depth, focusing on the commonly used vector space model (VSM) methods and the semantic similarity algorithm based on How-net. It summarizes the two kinds of algorithms, analyzes their advantages and disadvantages, and proposes a new, improved method for calculating Chinese text similarity. The vector space model ignores word semantics and the structural relations between words, and does not consider the actual meaning that words express. To address this, the proposed method first extends word-level semantic similarity calculation to the paragraph level, then introduces this semantic similarity into the vector space model with reasonably chosen weighting parameters, and finally obtains the text similarity as a weighted combination of the semantic similarity and the vector space model similarity.

Based on this improvement, a Chinese text similarity comparison system is designed. The basic process and architecture of the system are described in detail, and the system is implemented. Finally, the improved algorithm is compared experimentally with the existing Chinese text similarity algorithms, and the results are analyzed and summarized. The experimental results show that the recall rate of the proposed algorithm is improved to some extent over both the vector space model method and the existing semantic similarity algorithm, which demonstrates the usability and effectiveness of the proposed algorithm.
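
The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the formulas, so the best-match averaging used to lift word similarity to text level, the weight `alpha`, and the toy lookup table standing in for a How-net word similarity are all assumptions made for illustration. Inputs are assumed to be already segmented into word lists.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """VSM similarity: cosine of raw term-frequency vectors."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_similarity(tokens_a, tokens_b, word_sim):
    """Text-level semantic similarity (assumed scheme): for each word,
    take its best match in the other text, average, and symmetrize."""
    if not tokens_a or not tokens_b:
        return 0.0

    def directed(src, dst):
        return sum(max(word_sim(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a))


def combined_similarity(tokens_a, tokens_b, word_sim, alpha=0.5):
    """Weighted sum of semantic and VSM similarity.
    alpha is a hypothetical weight; the thesis's value is not stated."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sem = semantic_similarity(tokens_a, tokens_b, word_sim)
    vsm = cosine_similarity(tokens_a, tokens_b)
    return alpha * sem + (1.0 - alpha) * vsm


# Toy stand-in for a How-net based word similarity (values are invented).
TOY_SIM = {frozenset(("电脑", "计算机")): 0.9}


def toy_word_sim(w1, w2):
    if w1 == w2:
        return 1.0
    return TOY_SIM.get(frozenset((w1, w2)), 0.0)


doc_a = ["我", "买", "电脑"]    # pre-segmented text A
doc_b = ["我", "买", "计算机"]  # pre-segmented text B
vsm_score = cosine_similarity(doc_a, doc_b)
mixed_score = combined_similarity(doc_a, doc_b, toy_word_sim, alpha=0.5)
print(vsm_score, mixed_score)
```

In this toy case the two texts differ only in a pair of synonyms, so the plain VSM score treats them as partly unrelated, while the combined score is higher because the semantic term credits the synonym pair.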

Keywords/Search Tags:

Chinese text, similarity, word segmentation, vector space model, How-net

Related items

1	Study On Chinese Text Similarity Computing Based On Word Segmentation
2	Chinese-Old Bilingual Text And Sentence Similarity Calculation Research
3	Study On Text Category Oriented Chinese Text Mining And Its Implementation
4	Research And Application Of Key Technology In Intelligent Search Engine
5	Chinese Text Data Classification
6	Research And Implementation Of Chinese Text Clustering Algorithms
7	Research On Chinese Text Categorization Algorithms Based On Technology Text
8	Research And Implementation Of Subjective Question Scoring System Based On Chinese Word Segmentation And Text Similarity
9	Improved Vector Space Model And Its Application To Document Classification System
10	Based Segmentation Of Chinese Text Automatic Classification And Implementation