Research And Implementations Of Text Classification Based On Semantic Enhancing

Posted on: 2022-01-31   Degree: Master   Type: Thesis
Country: China   Candidate: X Z Tang
GTID: 2518306572487474   Subject: Computer system architecture
Abstract/Summary:
The rapid development of natural language processing (NLP) methods has accelerated research on text classification, which includes manual, supervised, semi-supervised, and unsupervised classifiers. We focus on three issues in text classification. First, input texts contain many polysemous words, which makes it hard to capture accurate semantics with traditional contextual methods. Second, Chinese, as a special information carrier, conveys key semantics through several aspects, including pinyin for pronunciation, wubi for structure, and radicals for components. Third, entities play an important role in the semantics of a text.

For the polysemy issue, we present a novel way of injecting factual knowledge into the BERT model. We query an open-source knowledge base for the adjacent neighbors of each entity, treat them as potential meanings, and select the one with the highest cosine score against the average vector of the text. Finally, we evaluate our model on QQP, SQuAD, etc. The experimental results show that our model outperforms previous models on SQuAD and NER.

For the multiple expressions of text semantics in Chinese, we propose a semantic fusion framework based on multiple granularities. First, we use open-source analysis tools to generate the radical, pinyin, and wubi sequences. Second, we propose a novel classifier that combines Chinese character, pinyin, wubi, and radical representations. We evaluate our model on four widely used Chinese datasets and compare it in detail with other SOTA methods, including LSTM, BERT, and its successors. The experimental results indicate that the multi-granularity fusion architecture outperforms other conventional classifiers in Chinese text classification.

For the importance of entities to the semantics of a text, we propose an entity-aware framework. Intuitively, entities play an important role in the semantics of a sentence, and the relationships among them can be organized as a non-Euclidean graph structure, so we propose an entity-aware GCN that encodes entity information into the prediction model to improve the effectiveness of the text classifier. Finally, the experimental results show that the proposed entity-aware framework outperforms ordinary text classification methods.
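
The sense-selection step described for polysemous words (pick the knowledge-base neighbor whose vector has the highest cosine score with the average vector of the text) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy three-dimensional vectors, and the candidate labels are all hypothetical, and in the actual model the vectors would presumably come from BERT and an open-source knowledge base rather than being written by hand.

```python
import math


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_sense(token_vectors, candidates):
    """Return (name, score) of the candidate closest to the text average.

    token_vectors: list of vectors, one per token of the input text.
    candidates: dict mapping a candidate meaning (hypothetical label for a
        knowledge-base neighbor) to its vector.
    """
    if not candidates:
        raise ValueError("at least one candidate meaning is required")
    text_vec = mean_vector(token_vectors)
    scores = {name: cosine(text_vec, vec) for name, vec in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data (assumed for illustration only).
token_vectors = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
candidates = {
    "apple_company": [0.9, 0.1, 0.0],
    "apple_fruit": [0.0, 0.1, 0.9],
}
best, score = select_sense(token_vectors, candidates)
print(best, round(score, 3))
```

In this toy case the averaged text vector points in the same direction as the first candidate, so that candidate is selected. How the chosen meaning is then injected into BERT is not specified in the abstract and is left out of the sketch.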
Keywords/Search Tags: Semantic Correction, Text Classification, Entity-Aware Encoding