
Research On Chinese Text Classification Based On Semantic Analysis

Posted on: 2018-04-09  Degree: Master  Type: Thesis
Country: China  Candidate: X J Li  Full Text: PDF
GTID: 2348330521450020  Subject: Information Science
Abstract/Summary:
With the development of computing technologies such as big data processing and cloud computing, information and artificial intelligence technology is steadily changing how people live and think. Efficient, intelligent, semantics-aware natural language processing has therefore become a priority, and it can contribute to the rapid development of related fields. Chinese text classification based on semantic analysis is a key, foundational task in Chinese natural language processing. Traditional text mining and classification techniques consider only how often vocabulary recurs in a text and cannot accurately capture word context, which leads to missing syntactic and semantic relations, incorrect analysis results, and inefficiency. In recent years, with the rise of deep learning, Chinese information processing based on neural networks has received widespread attention and study.

To address the loss of semantic and syntactic information in traditional Chinese text processing methods, this paper introduces neural-network-based semantic techniques and related algorithms into text classification. The aim is to improve the semantic similarity calculation for Chinese text and to remedy the missing structural and semantic information in traditional text representation models. The paper has two main lines of research. The first explores and improves semantic similarity calculation based on word vectors in order to raise the accuracy of text semantic similarity. The second explores and improves the document vector model, drawing on research into neural network language models, in order to represent the semantic features of text more accurately and to advance text classification and text mining.

Concretely, this paper studies the semantic similarity of Chinese text based on word vectors and proposes a Chinese text classification method based on an improved document vector. It sets out the theory of semantic similarity calculation and gives a comprehensive account of the neural network language model and other models from the language-modeling perspective. It then derives the CBOW and Skip-gram word vector models in detail and, by taking into account the differing weights of different words, presents a text semantic similarity calculation method based on word vectors. Next, the paper introduces traditional text representation and its classification process in detail and presents an efficient document representation learning framework that ensures the learned representation captures the semantic meaning of the document. The new model introduces a data-dependent regularization that favors informative words while forcing the embeddings of common, non-discriminative words to be close to zero. Huffman coding and hierarchical softmax are then used to solve for the model parameters and obtain the document vector. Finally, the paper combines the document vector with a classification algorithm to complete the Chinese text classification task. The experimental results show a significant improvement in testing efficiency.
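
The abstract does not state how word weights enter the similarity calculation. As a rough illustration only, the sketch below assumes each text is represented by a weighted average of its word vectors and that two texts are compared by cosine similarity. The function names, the toy two-dimensional vectors, and the weight values are all hypothetical and are not taken from the thesis.

```python
import math


def weighted_text_vector(tokens, word_vectors, weights, default_weight=1.0):
    """Return the weighted average of the word vectors of `tokens`.

    Assumes `word_vectors` is a non-empty dict mapping a token to a list
    of floats, all of the same length. Tokens without a vector are skipped.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # out-of-vocabulary token: ignore it
        w = weights.get(tok, default_weight)
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: all-zero vector
    return [x / weight_sum for x in total]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_similarity(tokens_a, tokens_b, word_vectors, weights):
    """Cosine similarity between the weighted-average vectors of two texts."""
    return cosine_similarity(
        weighted_text_vector(tokens_a, word_vectors, weights),
        weighted_text_vector(tokens_b, word_vectors, weights),
    )


# Toy data (hypothetical): in practice the vectors would come from a trained
# CBOW or Skip-gram model, and the tokens from a Chinese word segmenter.
word_vectors = {
    "text": [1.0, 0.0],
    "class": [0.0, 1.0],
    "the": [1.0, 1.0],
}
# Assumed weighting: informative words get a high weight, common words a low one.
weights = {"text": 2.0, "class": 2.0, "the": 0.1}

score = text_similarity(["the", "text"], ["text"], word_vectors, weights)
```

With a low weight on the common token, the first text's vector stays close to that of the informative token, so `score` is close to 1. An unweighted average would pull the vector toward the common word and lower the similarity.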
Keywords/Search Tags: Text classification, Semantic similarity, Deep learning, Word embedding, Document vector