Text Classification Method And Its Application In The Field Of Four Insurances And One Housing Fund

Posted on: 2022-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wu
Full Text: PDF
GTID: 2518306353977229
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the growth of the Internet has brought more and more people online and produced massive amounts of text data. To explore the value of this text, a growing number of researchers have turned to natural language processing. Although neural networks perform strongly in text classification, there is currently little research on Chinese text classification, and most traditional Chinese text classification models simply follow the processing used for English text and take words directly as the model input. Chinese text, however, must first be segmented into words, and if only the segmentation result is used as input, segmentation errors can propagate through the model. The thesis therefore first introduces the research background and current state of Chinese and English text classification, analyzes the main difficulties of Chinese text classification, proposes a text classification model for Chinese, and carries out experiments to demonstrate the model's effectiveness.

Four insurances and one fund is a policy that protects people's livelihood and affects everyone. Although deep learning and natural language processing have developed rapidly in recent years, there is little research on applying natural language processing to the four insurances and one fund domain. Building a knowledge graph for four insurances and one fund can improve work efficiency, help people obtain the knowledge they want, and reduce the burden on staff. Building the domain knowledge base often requires entity interpretation information. At present, most methods crawl this information directly from encyclopedia websites, so entities that those websites do not cover receive no interpretation. The thesis therefore also explores obtaining entity interpretations directly from the corpus.

The main contributions of the thesis are as follows:

(1) A neural network model based on joint training of words and characters is proposed for Chinese text classification. It takes both characters and words as input, uses a convolutional neural network with max pooling to extract key pattern information from Chinese text, uses a bidirectional long short-term memory (LSTM) network to extract structure information, and finally uses an attention mechanism to fuse the information. Characters and words thus complement each other, which further improves the performance of the Chinese text classification model. Experiments on three datasets show that the proposed model achieves higher classification accuracy than some existing models.

(2) The thesis explores using text classification to obtain entity explanations from corpora. By labeling sentences as either containing or not containing an entity explanation, it constructs a sentence classification dataset for the four insurances and one fund domain. Based on the characteristics of this dataset, a bidirectional LSTM model based on joint word-character training is proposed, and the trained model is applied to the four insurances and one fund domain corpus to obtain entity explanations. Finally, to address limitations of this method, such as its inability to handle an explanation that spans multiple sentences, an explanation expansion algorithm based on semantic similarity is proposed to further enrich the entity explanation information in the four insurances and one fund domain knowledge base.
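
The abstract does not describe how the explanation expansion algorithm works beyond saying that it is based on semantic similarity. The following is only a minimal sketch of one way such an expansion could look, under loud assumptions: similarity is approximated by cosine similarity over character bigram counts (a stand-in for whatever semantic representation the thesis actually uses), and the names `char_bigrams`, `cosine`, `expand_explanation`, and the `threshold` value are all hypothetical. Starting from the sentence the classifier flagged as an explanation, it keeps appending the following sentences for as long as each one stays similar to the previously accepted sentence.

```python
import math
from collections import Counter


def char_bigrams(sentence):
    """Bag of character bigrams; an assumed stand-in for a real sentence embedding."""
    text = sentence.replace(" ", "")
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_explanation(sentences, seed_index, threshold=0.2):
    """Extend the explanation found at seed_index with the sentences that follow it.

    A following sentence is accepted while its similarity to the previously
    accepted sentence is at least `threshold` (a hypothetical value);
    expansion stops at the first sentence that falls below it.
    """
    selected = [sentences[seed_index]]
    for candidate in sentences[seed_index + 1:]:
        similarity = cosine(char_bigrams(selected[-1]), char_bigrams(candidate))
        if similarity < threshold:
            break
        selected.append(candidate)
    return selected


# Illustrative data only; these sentences are not taken from the thesis corpus.
sentences = [
    "养老保险是社会保险的一种",      # "Pension insurance is a kind of social insurance"
    "养老保险由单位和个人共同缴纳",  # "Pension insurance is paid jointly by employer and employee"
    "今天天气很好",                  # "The weather is nice today" (unrelated)
]
explanation = expand_explanation(sentences, seed_index=0)
```

With this toy input, the second sentence shares several bigrams with the seed and is kept, while the unrelated third sentence ends the expansion. A real system would presumably replace the bigram counts with learned sentence vectors and tune the threshold on held-out data.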
Keywords/Search Tags: Chinese Text Classification, Four Insurances and One Fund, Key Pattern Information, Word Structure Information, Entity Interpretation