Research On Text Semantic Feature Detection And Proofreading

Posted on: 2020-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Hai
Full Text: PDF
GTID: 2428330575971444
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of new-generation information technology and the explosive growth of data, requirements for data quality have reached an unprecedented level. Accuracy is one of the basic requirements of high-quality data, and both media news and government information demand highly standardized semantic expression. However, it is difficult to check and correct large volumes of text manually while still guaranteeing the quality of the inspection. How to extract the semantic features of text efficiently and accurately, and how to build text proofreading methods on that basis, therefore has important theoretical significance and practical value. By analyzing a large amount of text data, the paper finds that texts are composed of words, that each word has a corresponding meaning, and that whether a collocation between words is correct depends on the collocation relationship between the meanings of those words. On this basis, the paper studies the latent relevance between word meanings, constructs a representation learning model for semantic collocation relationships, and proposes a method for text semantic feature detection and proofreading. The main contents are as follows:

1) The HowNet knowledge base is combined with a corpus to analyze the relationship between words and semantics. A neural network is used to learn the latent mapping between words and sememes, which converts the structural expression of words in a sentence into an abstract, high-level sememe expression. This sememe expression enhances the semantic representation of the sentence and supplies the semantic collocation relationship prediction module with information for abstract analysis.

2) A two-layer LSTM (Long Short-Term Memory) network model with shared hidden information is proposed and used as a sub-model of an integrated algorithm to construct a semantic-level collocation relationship prediction model. The LSTM network model can reduce redundant information and improve training efficiency while preserving prediction capability. Because contextual correspondence in language is not one-to-one, an integrated algorithm is used to combine multiple sub-models in order to improve the overall predictive ability of the model. The differences between sub-models widen the range of context information that the integrated model can predict, and the prediction of semantic collocations is completed on this basis.

3) A PDI evaluation method that combines mutual information and degree of polymerization is proposed. To ensure proofreading accuracy and reduce the influence of noisy data on the results, the paper combines PDI with a fuzzy matching method to vote on the generated proofreading suggestions. The candidate suggestions are sorted in descending order of the voting result, semantic errors in the sequence are judged by the degree of matching between adjacent sememes and the candidate suggestion set, and the top-ranked suggestions are retained as output.
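
The abstract does not define PDI or its voting rule, so the following is only a rough Python sketch of the general idea behind step 3: score each candidate correction by how strongly it collocates with its left neighbor and by how closely it resembles the observed word, then sort in descending order. Everything here is an assumption made for illustration: plain pointwise mutual information stands in for PDI (the degree-of-polymerization term is omitted), a weighted sum stands in for the voting step, words stand in for sememes, and the toy corpus, function names, and `weight` parameter are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Toy corpus of tokenized sentences (hypothetical data, for illustration only).
CORPUS = [
    ["improve", "quality"],
    ["improve", "quality"],
    ["improve", "efficiency"],
    ["raise", "price"],
    ["raise", "quality"],
]


def build_counts(corpus):
    """Count single words and adjacent word pairs in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def pmi(left, right, unigrams, bigrams):
    """Pointwise mutual information of an adjacent pair; -inf if never seen."""
    joint = bigrams[(left, right)]
    if joint == 0:
        return float("-inf")
    p_joint = joint / sum(bigrams.values())
    p_left = unigrams[left] / sum(unigrams.values())
    p_right = unigrams[right] / sum(unigrams.values())
    return math.log2(p_joint / (p_left * p_right))


def fuzzy(a, b):
    """Surface similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def rank_candidates(left, observed, candidates, unigrams, bigrams, weight=0.5):
    """Rank candidates by a weighted sum of PMI and fuzzy similarity.

    Assumption: this weighted sum is a stand-in for the thesis's
    PDI-plus-fuzzy-matching vote, whose exact form the abstract does not give.
    """
    scored = []
    for candidate in candidates:
        association = pmi(left, candidate, unigrams, bigrams)
        if association == float("-inf"):
            continue  # drop candidates never observed after `left`
        score = weight * association + (1 - weight) * fuzzy(observed, candidate)
        scored.append((score, candidate))
    # Descending order, mirroring the sorted suggestion list in the abstract.
    return [candidate for _, candidate in sorted(scored, reverse=True)]


unigrams, bigrams = build_counts(CORPUS)
suggestions = rank_candidates(
    "improve", "qualty", ["quality", "efficiency", "price"], unigrams, bigrams
)
print(suggestions)
```

In this toy setup, a candidate that was never seen after the left context word is discarded outright, while the remaining candidates are ordered by the combined score. A real system would need smoothing for unseen pairs and a principled way to balance the two signals.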
Keywords/Search Tags: Feature Extraction, Feature Detection, Text Proofreading, Fuzzy Matching, Integrated Algorithm