Research On Text Similarity Algorithm Based On VSM Combined With Word Semantics

Posted on: 2019-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: G L Feng
Full Text: PDF
GTID: 2438330545479148
Subject: Industrial engineering
Abstract/Summary:
As a carrier of information, text is the most common way of presenting information in people's life and work. Text similarity computation is widely used in fields such as information retrieval, text classification, knowledge mining, and information filtering, and it is a fundamental and key problem in information processing. Because of the complexity and particularity of the Chinese language, computing the similarity of Chinese texts is more difficult than for other languages, so it has long been a hot and difficult topic in the field. Many scholars have studied Chinese text similarity calculation, made some progress, and proposed calculation methods.

This paper studies Chinese word segmentation technology and existing Chinese text similarity algorithms in depth, focusing on the commonly used vector space model (VSM) method and the semantic similarity algorithm based on HowNet. It summarizes the two kinds of algorithms, analyzes their advantages and disadvantages, and proposes improvements that lead to a new method for calculating Chinese text similarity. The vector space model method ignores word semantics and the structural relations between words, and does not consider the actual meaning that words express. To address this, the proposed method first extends word-level semantic similarity calculation to the paragraph level, then incorporates this semantic similarity into the vector space model with reasonable weighting parameters, and finally obtains the text similarity as a weighted combination of the semantic similarity and the vector space model similarity.

Based on this improved approach, a Chinese text similarity comparison system is designed. The basic process and architecture model of the system are described in detail, and the system is implemented. Finally, the improved algorithm is compared experimentally with existing Chinese text similarity algorithms, and the results are analyzed and summarized. The experimental results show that the recall rate of the proposed algorithm is improved to some extent compared with the vector space model method and the existing semantic similarity algorithm, which demonstrates the usability and effectiveness of the proposed algorithm.
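The weighted combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the VSM part uses plain term-frequency cosine similarity, the paragraph-level semantic part averages each word's best match in the other text, `alpha` is a hypothetical weighting parameter, and `toy_word_sim` is a stand-in for a HowNet-based word similarity. Inputs are assumed to be already segmented into word tokens.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Hypothetical signature for a word-level semantic similarity in [0, 1].
WordSim = Callable[[str, str], float]


def cosine_similarity(tokens_a: Sequence[str], tokens_b: Sequence[str]) -> float:
    """VSM similarity: cosine of the angle between term-frequency vectors."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_similarity(tokens_a: Sequence[str], tokens_b: Sequence[str],
                        word_sim: WordSim) -> float:
    """Assumed paragraph-level extension: symmetric average of best matches."""
    if not tokens_a or not tokens_b:
        return 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        # For each source word, take its most similar word in the other text.
        return sum(max(word_sim(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a))


def combined_similarity(tokens_a: Sequence[str], tokens_b: Sequence[str],
                        word_sim: WordSim, alpha: float = 0.5) -> float:
    """Weighted sum of VSM and semantic similarity; alpha is an assumed weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    vsm = cosine_similarity(tokens_a, tokens_b)
    sem = semantic_similarity(tokens_a, tokens_b, word_sim)
    return alpha * vsm + (1.0 - alpha) * sem


def toy_word_sim(a: str, b: str) -> float:
    """Toy stand-in for a HowNet-based word similarity (not the real resource)."""
    if a == b:
        return 1.0
    synonyms = {frozenset(("car", "automobile"))}
    return 0.8 if frozenset((a, b)) in synonyms else 0.0


doc_a = ["car", "engine", "repair"]
doc_b = ["automobile", "engine", "repair"]
vsm_score = cosine_similarity(doc_a, doc_b)
sem_score = semantic_similarity(doc_a, doc_b, toy_word_sim)
mixed_score = combined_similarity(doc_a, doc_b, toy_word_sim, alpha=0.5)
print(vsm_score, sem_score, mixed_score)
```

In this toy case the two texts differ only by a synonym pair, so the pure VSM score treats "car" and "automobile" as unrelated, while the combined score is higher because the semantic term credits the near match. This mirrors the motivation stated in the abstract, though the actual weighting and aggregation used in the thesis may differ.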
Keywords/Search Tags: Chinese text, similarity, word segmentation, vector space model, HowNet