
Research On Calculation Of Semantic Similarity Of Short Text Based On Feature Fusion

Posted on: 2022-10-14  Degree: Master  Type: Thesis
Country: China  Candidate: L X Zhao  Full Text: PDF
GTID: 2518306554971039  Subject: Software engineering
Abstract/Summary:
Determining the semantic similarity of short texts is one of the most important tasks in many natural language processing applications. Existing modeling methods fall mainly into two groups: mathematical operator models based on the degree of character matching, and neural network models based on word vectors. Both then calculate the cosine distance between two short texts to obtain their similarity, and both rely on single-feature modeling. As a result, the quality of the similarity calculation depends on the amount of labeled data and on a single feature. Single-feature modeling struggles to capture all the features of a text, which reduces model accuracy. Multi-feature combination data, in turn, carries redundancy and noise that are difficult to handle; both degrade the accuracy of the results, so the calculated similarity of short texts deviates from reality. Obtaining the full semantics of a complex sentence requires considering both the structural information and the association information between texts.

To address the difficulty of expressing text features comprehensively with a single feature, multi-feature combination data is constructed and feature extraction is carried out on it. A syntax tree embedding algorithm is designed to obtain dependency vectors, and a syntax tree autoencoder is built to obtain dependency information. The dependency information is combined with a semantic information vector and a position information vector to form the multi-feature combination data. This multi-feature input strengthens the sentence input layer of the model. To address the decline in model performance caused by data redundancy in the multi-feature model, a feature fusion and screening mechanism is designed. The combined features are split and filtered, then fused into global features, which can further improve the similarity result.

The research work of this thesis covers three main aspects. First, a syntax tree embedding algorithm is designed, and dependency matrix blocks are generated from dependency annotation symbols. To reduce their internal data sparsity, the dependency matrix blocks are reduced in dimension and de-noised. The reduced dependency matrix preserves the dependency information between the words of a sentence and provides structural features for the subsequent calculation steps. Used as an auxiliary feature, the dependency matrix improves the accuracy of the system. Second, a feature extraction encoder is constructed, in which semantic, dependency, and position features are combined to form an image-style data input. To address the data sparsity caused by combining multiple features, a feature extraction autoencoder is built. Third, a feature screening network is designed and constructed; fragmented local features are obtained by screening and computing over the refined data features. Feature fusion is then used to address the local distribution of global features. At the end of the model, the global feature vector is used to calculate the similarity of the short texts.

To validate the feasibility of this scheme, experiments were conducted on the SemEval data sets. The experimental results show that, after the dependency tree features are embedded, the result rises to 87.4%. The method achieves the best results on 6 of the 8 SemEval data sets.
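The final step described above, computing similarity from a fused feature vector, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `fuse_features` and `cosine_similarity`, the toy vectors, and above all the use of plain concatenation as the fusion step. The thesis uses a learned feature screening and fusion network whose details the abstract does not give, so this sketch only shows the general shape of "combine several feature vectors, then compare two sentences by cosine similarity".

```python
import math
from typing import List, Sequence


def fuse_features(
    semantic: Sequence[float],
    dependency: Sequence[float],
    position: Sequence[float],
) -> List[float]:
    """Concatenate three feature vectors into one.

    Assumption: plain concatenation stands in for the learned
    fusion/screening network described in the abstract.
    """
    return [*semantic, *dependency, *position]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero norm, to avoid dividing by zero.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, made-up feature vectors for two short texts (not from the thesis).
text_a = fuse_features([0.2, 0.7], [1.0, 0.0, 1.0], [0.1])
text_b = fuse_features([0.3, 0.6], [1.0, 1.0, 0.0], [0.2])

print(cosine_similarity(text_a, text_b))
```

Concatenation keeps every feature and therefore also keeps the redundancy and noise that the thesis sets out to remove; the screening step described in the abstract would sit between `fuse_features` and `cosine_similarity`.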
Keywords/Search Tags: Feature Fusion, Semantic Computing, Pretreatment, Semi-supervised System, Similarity, Data Fusion