
Short Text Classification Based On Integration Of Ontology And BTM Feature Extension

Posted on: 2019-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: C M Peng
GTID: 2428330566976005
Subject: Software engineering

Abstract/Summary:
In recent years, the rapid development of the Internet and social communication has brought technological innovation to many companies. Through complaint feedback channels built on Internet social platforms, enterprises can quickly collect complaint feedback covering product evaluations and service quality. Analyzing and using this feedback helps companies quickly understand customer needs, improve product quality, and increase sales, and it provides strong technical support for business decisions. Such complaint feedback texts are usually short and are characterized by strong real-time nature, massive volume, sparseness, non-standard expression, and unbalanced sample distribution. They not only inherit the characteristics of short texts but also contain more domain-specific vocabulary, so traditional feature extension methods for alleviating short-text sparseness do not perform well on complaint short texts. This thesis studies how to alleviate feature sparseness in domain short text classification. Taking complaint short texts about the Windows 10 operating system as an example, it proposes a short text classification method that integrates ontology and BTM (biterm topic model) feature extension, providing a way to improve the classification of short texts in this domain. Experiments show that the method effectively improves short text classification. The main work of this thesis is as follows:

(1) Construction of a domain vocabulary ontology. The task of an ontology is to collect knowledge in a given field, provide a consistent vocabulary for that field, and clearly define the relationships among these terms at different levels of a formal structure, which is consistent with the goal of alleviating short-text feature sparseness. Therefore, to alleviate the feature sparseness of Windows 10 complaint short texts, Windows 10 domain vocabulary is used as the knowledge base, and a Windows 10 domain ontology suitable for short text classification is constructed in OWL using the Protégé software. The ontology captures the semantic relations between words, and the resulting ontology lexicon serves as the domain feature vocabulary extension set for short text classification.

(2) A short text classification algorithm that combines ontology and BTM feature extension. The method trains the BTM topic model on the short text corpus and uses it for prediction to obtain topic feature words. It then fuses the domain ontology feature vocabulary with the topic feature vocabulary into an extended set, and uses matching rules to add words from this set to the original short text as additional features. Finally, the SVM classification algorithm classifies the extended short texts, and the classification result of each extended short text is taken as the result for the original short text. Experiments show that the proposed method effectively alleviates the feature sparseness of domain short texts during classification and thereby improves classification performance.

(3) Design and implementation of a short text information processing platform. The platform is built on software engineering principles combined with the research of this thesis. It implements the basic functions of short text information processing through three modules: a preprocessing module, a feature extension module, and a classification module. Testing shows that the platform meets the classification requirements of short texts in this domain.
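
The feature extension step in item (2) can be illustrated with a minimal Python sketch. The abstract does not specify the matching rules, the ontology contents, or the BTM output format, so everything below is an assumption made for illustration: the toy `ONTOLOGY_LEXICON` and `TOPIC_WORDS` data, the function names, and the rule that a token found in the lexicon pulls in its related terms. The topic of the text is assumed to have already been predicted by a trained BTM model and is passed in as `topic_id`. The `extract_biterms` helper only shows the unordered word pairs that BTM models; it does not train a topic model.

```python
from itertools import combinations

# Hypothetical toy resources. The thesis's actual ontology lexicon and
# BTM topic words are not given in the abstract.
ONTOLOGY_LEXICON = {
    "update": ["patch", "upgrade"],
    "bluescreen": ["crash", "bsod"],
}
TOPIC_WORDS = {
    0: ["update", "install", "restart"],
    1: ["crash", "driver", "bluescreen"],
}


def extract_biterms(tokens):
    """Return every unordered word pair in one short text (BTM's modelling unit)."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def extend_features(tokens, lexicon, topic_words, topic_id):
    """Append ontology terms and topic words that the text does not already contain."""
    extended = list(tokens)
    seen = set(tokens)

    # Assumed matching rule: a token that is a lexicon entry adds its related terms.
    for token in tokens:
        for related in lexicon.get(token, []):
            if related not in seen:
                extended.append(related)
                seen.add(related)

    # Assumed rule: add the top words of the topic predicted for this text.
    for word in topic_words.get(topic_id, []):
        if word not in seen:
            extended.append(word)
            seen.add(word)

    return extended


if __name__ == "__main__":
    text = ["update", "failed"]
    print(extract_biterms(text))
    print(extend_features(text, ONTOLOGY_LEXICON, TOPIC_WORDS, topic_id=0))
```

In this sketch the extended token list, rather than the original one, would be vectorized and passed to the SVM classifier, and the label predicted for it would be taken as the label of the original short text, as the abstract describes.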
Keywords/Search Tags:Short text classification, Domain ontology, BTM topic model, Feature extension, Short text information processing platform