
Knowledge-Driven Text Classification Method For Specific Domain

Posted on: 2021-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Huang
Full Text: PDF
GTID: 2428330614970111
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, all industries are becoming increasingly digitized, and a large volume of domain-specific text has emerged. Because of its specialized nature, domain-specific text differs from general-domain text in semantic understanding: it is hard to interpret from its literal meaning alone and must be read together with professional domain knowledge. In addition, although a specific domain may contain a large amount of text data, manual annotation costs more than in the general domain, so few corpora are available for training. This makes large-scale training hard to support and semantics even harder to learn.

(1) To address the problems of few labeled samples, many unlabeled samples, and hard-to-understand semantics in specific domains, this paper proposes a semi-supervised text classification algorithm based on Internet knowledge. First, the samples are expanded with text drawn from Internet knowledge, to overcome the difficulty of understanding domain-specific samples literally. Then, a two-view semi-supervised classification algorithm based on shallow learning and deep learning (called the co-dsl algorithm) is proposed, which trains the classification model in a semi-supervised way using a small number of labeled samples and a large number of unlabeled samples. Finally, the method is applied to App classification, and the experimental results show that it clearly outperforms traditional algorithms in classification performance.

(2) In domain-specific text classification, besides the scarcity of samples, the text is generally highly specialized and contains a lot of professional domain knowledge that is difficult to learn from the samples. To solve this problem, this paper proposes a deep learning text classification method based on a knowledge graph. First, a knowledge graph is constructed semi-automatically from Internet sources. Then, through entity recognition, the features of entities in the input text that are linked to the knowledge graph are added to the original text to expand its features, and a Bi-LSTM model is used for text classification. Finally, the method is applied to environmental text classification. The experiments show that introducing the knowledge graph effectively improves classification accuracy.

(3) This paper designs and implements a multi-intent hybrid intelligent question answering system. The system first identifies the user's intent through multi-domain text classification, and then calls the corresponding question answering module to answer the user's question according to that intent.
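
The two-view semi-supervised idea in (1) can be illustrated with a minimal co-training sketch. This is not the thesis's co-dsl implementation. The thesis pairs a shallow model with a deep model, whereas this sketch substitutes one toy naive Bayes classifier applied to two hypothetical feature views (word tokens and character bigrams). It also uses a simplified variant in which both views share one labeled set and the more confident view supplies the pseudo-label. The names `TokenNB`, `co_train`, `word_view`, `char_bigram_view`, the confidence threshold, and the sample app descriptions are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def word_view(text):
    """View A (assumed): whitespace-separated word tokens."""
    return text.lower().split()


def char_bigram_view(text):
    """View B (assumed): overlapping character bigrams."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


class TokenNB:
    """Toy multinomial naive Bayes over the tokens produced by one view."""

    def __init__(self, view):
        self.view = view

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(self.view(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        """Return (label, confidence); confidence is the normalized posterior."""
        total = sum(self.class_counts.values())
        tokens = [t for t in self.view(text) if t in self.vocab]  # skip unseen tokens
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            logp = math.log(n / total)
            for tok in tokens:
                # Laplace (add-one) smoothing
                logp += math.log((self.token_counts[label][tok] + 1) / denom)
            scores[label] = logp
        top = max(scores.values())
        probs = {k: math.exp(v - top) for k, v in scores.items()}
        best = max(probs, key=probs.get)
        return best, probs[best] / sum(probs.values())


def co_train(view_a, view_b, labeled, unlabeled, rounds=5, threshold=0.7):
    """Grow the labeled set with confident pseudo-labels from either view."""
    labeled, pool = list(labeled), list(unlabeled)

    def fit_both():
        texts = [t for t, _ in labeled]
        labels = [y for _, y in labeled]
        return TokenNB(view_a).fit(texts, labels), TokenNB(view_b).fit(texts, labels)

    for _ in range(rounds):
        if not pool:
            break
        clf_a, clf_b = fit_both()
        remaining = []
        for text in pool:
            label_a, conf_a = clf_a.predict(text)
            label_b, conf_b = clf_b.predict(text)
            # The more confident view proposes the pseudo-label.
            label, conf = (label_a, conf_a) if conf_a >= conf_b else (label_b, conf_b)
            if conf >= threshold:
                labeled.append((text, label))
            else:
                remaining.append(text)
        if len(remaining) == len(pool):  # nothing was confident enough; stop early
            break
        pool = remaining
    clf_a, clf_b = fit_both()  # final models trained on the grown labeled set
    return clf_a, clf_b, labeled


# Hypothetical App descriptions: two labeled samples, two unlabeled ones.
seed = [("chess puzzle game", "game"), ("bank payment wallet", "finance")]
unlabeled = ["puzzle game arcade", "wallet bank loan"]
clf_a, clf_b, grown = co_train(word_view, char_bigram_view, seed, unlabeled)
print(grown)
print(clf_a.predict("arcade puzzle"))
```

The value of two views is that a sample one view is unsure about may be labeled confidently by the other, so the labeled set can grow beyond what either model would reach alone. In a setup closer to the thesis, the two `TokenNB` instances would be replaced by a shallow classifier and a deep one.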
Keywords/Search Tags: knowledge-driven, deep learning, text classification, semi-supervised learning, knowledge graph