Research On Information Parsing Based On Text Classification

Posted on:2020-08-12

Degree:Master

Type:Thesis

Country:China

Candidate:L Fu

GTID:2428330575963024

Subject:Computer Science and Technology

Abstract/Summary:

Information parsing is an important and challenging task in natural language processing (NLP), and it plays a key role in applications such as public opinion monitoring, web search, and intelligent question answering. In recent years, with the continued development of deep learning, research on information parsing has produced rich results that are widely applied in NLP engineering. Some shortcomings remain, however. Supervised deep learning methods require a large amount of high-quality, manually labeled training data, which is time-consuming and labor-intensive to produce. In Chinese text, word segmentation can be ambiguous, while a single Chinese character on its own conveys meaning that is neither precise nor rich; furthermore, the relative importance of words and characters varies from one situation to another. These issues cause problems when information parsing is applied in engineering practice. To address them, this dissertation first proposes a new active learning method and combines it with deep learning. It then proposes a hybrid of character-level and word-level features, concatenated with different weights, so that the model's final result takes both levels of features into account. This dissertation studies information parsing based on text classification. The main work is as follows:

(1) A new active learning method is proposed and combined with deep learning methods to achieve information parsing. Supervised deep learning models typically require a large amount of high-quality labeled sample data for training. Obtaining such data by hand is cumbersome and unreliable, as well as time-consuming and labor-intensive. Active learning eases this problem by automatically selecting a small number of unlabeled samples for humans to label by hand. It repeatedly selects the samples that most need labeling and then iteratively trains the deep neural network on them until the expected experimental results are achieved. This dissertation proposes an active learning method with three probabilistic sample-selection strategies based on deterministic criteria, which effectively addresses the need of supervised deep learning for large amounts of manually labeled data. The experimental results show that, compared with a deep neural network alone, combining active learning with the deep neural network reduces the amount of labeled training data required by 45.79% while achieving a given extraction accuracy.

(2) Building on a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network with an attention mechanism, this dissertation proposes a hybrid of character-level and word-level features, concatenated with different weights, to improve the performance of information parsing. Chinese differs from Western languages in that there is no separator between words in the text, so word segmentation must be performed first. However, a Chinese sentence may admit several semantically plausible readings, so segmentation can yield several different results; that is, Chinese word segmentation is ambiguous. Chinese characters, by contrast, are already discrete units, so character-level segmentation is unambiguous, but the meaning of a single character is neither precise nor rich. Moreover, the relative importance of words and characters differs from one situation to another. This dissertation therefore concatenates character-level and word-level features with different weights, so that the model considers both levels of features at once and each compensates for the other's shortcomings. The experimental results show that, compared with using word-level features alone and character-level features alone, the proposed method improves by 1.20% and 1.69% respectively on the THU dataset, and by 2.28% and 5.13% respectively on the Enterprise announcement dataset.
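The two ideas in the abstract, selecting the samples to label and concatenating weighted character-level and word-level features, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the abstract does not name its three selection strategies, so least-confidence sampling is used as a common stand-in, and the function names, weights, and toy vectors are hypothetical rather than taken from the dissertation.

```python
def least_confidence(probs):
    """Uncertainty score: 1 minus the highest predicted class probability.

    Assumed stand-in; the dissertation's own strategies are not specified.
    """
    return 1.0 - max(probs)


def select_for_labeling(pool_probs, k):
    """Return indices of the k unlabeled samples the model is least sure about."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: least_confidence(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


def weighted_concat(char_feat, word_feat, char_weight, word_weight):
    """Scale each feature vector by its weight, then concatenate the two."""
    scaled_char = [char_weight * x for x in char_feat]
    scaled_word = [word_weight * x for x in word_feat]
    return scaled_char + scaled_word


# Hypothetical class probabilities predicted for three unlabeled samples.
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]]
print(select_for_labeling(pool_probs, 2))

# Hypothetical character-level and word-level feature vectors and weights.
print(weighted_concat([1.0, 2.0], [4.0], 0.5, 0.25))
```

In a full active learning loop, the selected samples would be labeled by hand, added to the training set, and the network retrained before the next selection round. How the two feature weights are chosen is not stated in the abstract; they are fixed constants here only to keep the sketch short.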

Keywords/Search Tags:

Information extraction, Text classification, Active learning, Deep learning, Natural language processing

Related items

1	Intelligent Device Text Classification Method Based On Natural Language Processing
2	Research On Deep Learning Methods For Text Classification Tasks
3	Research On Chinese Text Classification Algorithm Based On Active Learning Approach
4	Research On Machine Learning For Natural Language Processing And Transmission
5	Research And Analysis Of Text Classification Theory Based On Deep Learning
6	Research On Financial Text Classification Method Based On Deep Learning
7	Research On Network Text Sentiment Classification Based On Deep Learning
8	Modeling And Learning Of Representations For Natural Language Sentence-level Structures
9	Classification Of Sexual Harassment Dialogue Texts Based On BERT-CNN
10	Research On Text Classification Based On Deep Neural Network