
Research On Algorithm Of Chinese Text Similarity Based On Semantics

Posted on: 2020-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Liu
Full Text: PDF
GTID: 2428330602967994
Subject: Engineering
Abstract/Summary:
In an era of explosive information growth, the demand for personalized information acquisition keeps growing. Obtaining the valuable information one needs from a huge information base is extremely important, so the demand for information classification and retrieval technology has increased, and applying this technology to classification and mining has become a key issue. In Chinese text processing, text similarity calculation is an active research topic; it is often used in text retrieval, artificial intelligence services, similarity checking, and so on. How to improve the accuracy of text similarity detection is a problem worth in-depth research and of practical value.

At present, several popular algorithms address this problem. The vector space model (VSM) algorithm does not consider semantically similar words in the text data and cannot prevent such words from interfering with the algorithm, so the accuracy of its text similarity calculation is insufficient. The latent semantic indexing (LSI) algorithm, which mainly uses singular value decomposition (SVD) to decompose the text data, cannot solve the problem of semantic relevance. The KNN algorithm is a non-parametric text classification method that is widely used in machine learning because it is simple and effective, but its time cost and operational efficiency are relatively poor. The maximum phrase combination algorithm performs word segmentation quickly and efficiently, but it cannot accurately classify phrases that are semantically similar yet differ greatly in wording, and it also cannot solve the problem of semantic relevance.

Therefore, this paper proposes a new semantics-based text similarity algorithm that adds the semantic similarity of words to the text similarity calculation, and analyzes its complexity, in order to improve accuracy. The paper reviews existing research results on text similarity and adds the effect of word semantic similarity on top of them, using HowNet as the reference for Chinese semantic similarity. The traditional VSM algorithm, the phrase similarity algorithm, and the text similarity algorithm with added semantic similarity are compared on a data set drawn from college papers, and the accuracy is further tested and analyzed to verify the theory. The experiments show that, compared with the traditional VSM algorithm, the text similarity algorithm with added semantic similarity performs well in verifying text similarity: it accurately judges the similarity of articles without misjudging dissimilar articles. Compared with the existing maximum phrase combination algorithm, it is superior in calculation time and more stable in calculation performance. Compared with the traditional algorithm and the maximum phrase combination algorithm, the new algorithm proposed in this paper is effective, stable, and accurate.
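
The contrast between plain VSM and a similarity measure that accounts for word semantics can be illustrated with a small sketch. It is not the algorithm proposed in the thesis, whose details the abstract does not give. It swaps in the well-known soft cosine measure, and the `WORD_SIM` table is a hypothetical stand-in for word similarity scores that would, under this assumption, come from a HowNet lookup. English tokens are used only for readability; Chinese text would first need word segmentation.

```python
import math
from collections import Counter

# Hypothetical word-to-word similarity scores (assumption: a stand-in
# for values that a HowNet-based lookup would provide).
WORD_SIM = {
    frozenset(["car", "automobile"]): 0.9,
    frozenset(["fast", "quick"]): 0.8,
}


def word_sim(a, b):
    """Similarity of two words: 1.0 if identical, table value otherwise."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset([a, b]), 0.0)


def vsm_cosine(tokens_a, tokens_b):
    """Plain VSM: cosine of term-frequency vectors, exact word matches only."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def soft_cosine(tokens_a, tokens_b):
    """Soft cosine: like cosine, but every word pair contributes
    in proportion to its word-level similarity."""
    va, vb = Counter(tokens_a), Counter(tokens_b)

    def inner(x, y):
        return sum(x[i] * y[j] * word_sim(i, j) for i in x for j in y)

    denom = math.sqrt(inner(va, va) * inner(vb, vb))
    return inner(va, vb) / denom if denom else 0.0


doc_a = ["fast", "car"]
doc_b = ["quick", "automobile"]
print(vsm_cosine(doc_a, doc_b))   # no shared words, so plain VSM sees no similarity
print(soft_cosine(doc_a, doc_b))  # synonym pairs raise the score
```

With these two toy documents, plain VSM finds no overlap because no word is shared, while the soft cosine score is high because both word pairs are near-synonyms in the assumed table. This is the kind of gap the abstract attributes to VSM.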
Keywords/Search Tags:text processing, text similarity, HowNet, VSM