Chinese Paragraph-Level Text Semantic Similarity Algorithm

Posted on: 2023-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Xiang
Full Text: PDF
GTID: 2558307070984209
Subject: Engineering
Abstract/Summary:
Semantic full-text comparison of NSFC proposal abstracts can help managers identify which research projects are similar, but comparing millions of NSFC proposals pair by pair would be very inefficient. This thesis proposes a sentence-level semantic feature extraction model that fuses syntactic structure, together with a paragraph-level semantic similarity algorithm. These enable efficient and accurate matching of similar research work in NSFC proposal applications and provide technical support for fine-grained grouping of projects in the fund evaluation process. The main contributions of the thesis are as follows.

(1) Semantic model with syntax model. Feature extraction with a single network is not comprehensive enough for Chinese text, so the thesis combines sequence features and syntactic features to build a semantic model with syntax model, and achieves semantic enhancement of Chinese sentence text by interactively fusing pre-trained model features with convolutional neural network features. Experiments show improvements of 5.19% in accuracy and 5.22% in F1 over the baseline on the NSCFP dataset.

(2) Paragraph-level semantic similarity model. To address the problem that semantic features are hidden in complex paragraph contexts, the thesis constructs a paragraph-level semantic similarity model based on sentence features. A Dense BiGRU network fuses the sentence semantic features, and semantic similarity detection is performed with a siamese network, which effectively learns features representing paragraph text and is used for similarity comparison of applications. Experiments on the CNSS and CNSE datasets show that the algorithm improves accuracy and F1 by 1.91% and 1.74% on average relative to the baseline.

(3) Semantic similarity detection system for NSFC proposals. To detect the similarity of Chinese paragraph texts, the thesis develops software for the Chinese abstract texts of NSFC proposals, using SMS and PSSM as the underlying technology. The software provides functions such as fine-grained grouping, which support project classification.
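
The siamese comparison in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the Dense BiGRU architecture, so mean pooling is swapped in as a stand-in for the sentence-fusion step, cosine similarity is assumed as the comparison function, and the names `encode_paragraph` and `paragraph_similarity` and the toy sentence vectors are hypothetical. What it shows is the siamese idea itself: one shared encoder maps each paragraph to a vector, and the two vectors are then compared.

```python
import math
from typing import List

Vector = List[float]


def encode_paragraph(sentence_vectors: List[Vector]) -> Vector:
    """Fuse sentence vectors into one paragraph vector by mean pooling.

    Assumption: mean pooling is a simple stand-in for the Dense BiGRU
    fusion named in the abstract, whose details are not given there.
    """
    if not sentence_vectors:
        raise ValueError("paragraph must contain at least one sentence vector")
    count = len(sentence_vectors)
    dim = len(sentence_vectors[0])
    return [sum(vec[i] for vec in sentence_vectors) / count for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def paragraph_similarity(p1: List[Vector], p2: List[Vector]) -> float:
    """Siamese comparison: the same encoder is applied to both paragraphs."""
    return cosine_similarity(encode_paragraph(p1), encode_paragraph(p2))


# Toy 2-dimensional sentence vectors (hypothetical data, for illustration only).
para_a = [[1.0, 0.0], [0.0, 1.0]]
para_b = [[2.0, 0.0], [0.0, 2.0]]
para_c = [[1.0, 0.0], [1.0, 0.0]]

sim_ab = paragraph_similarity(para_a, para_b)
sim_ac = paragraph_similarity(para_a, para_c)
```

Because both paragraphs pass through the same encoder, each paragraph vector can in principle be computed once and reused across many comparisons, which is one common reason for choosing a siamese design when the collection is large.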
Keywords/Search Tags: semantic feature, syntactic structure, paragraph, abstract, semantic similarity