Research And Implementation On Text Classification In Vertical Domain

Posted on: 2019-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2428330572963630
Subject: Computer technology
Abstract/Summary:
Vertical-domain text classification is text classification restricted to a specific professional field or need. Compared with text classification in the general sense, it focuses more on the domain itself in text representation, feature extraction and classification methods. As a result, vertical search engines, information classification and other applications derived from vertical-domain text classification can provide professional, refined and in-depth information and services for specific users or needs.

Vertical-domain text has typical domain-specific attributes, which ties text representation closely to the application scenario. On the one hand, the unstructured nature of text makes the classification task difficult. On the other hand, in a specific scenario the texts that carry domain features are more similar to one another, and the application services require more accurate classification, so vertical-domain tasks demand more of general-purpose text classification methods. The focus of this thesis is therefore how to build a more fine-grained classification method on top of general classification methods.

Multi-label classification is a common approach to text classification. In vertical-domain tasks, however, the high similarity of text features means that labels are semantically close to one another and their logical relationships are deep. Finding a text classification method with high discriminative power in a specific domain is therefore a challenge.

This thesis takes military-domain text collected from the Internet as its research object and makes a thorough study of how to improve text classification performance in specific fields, based on the Doc2VecC model. The main work is as follows:

1. A text feature filtering strategy based on inverse document frequency is proposed to achieve high-quality text feature filtering. The inverse document frequency of a word reflects the theme of a document. Retaining the words with higher inverse document frequency increases the proportion of feature information used for identification and improves text feature expression.

2. A PV-IDF model with enhanced text features is constructed to achieve fine-grained classification in the vertical domain. Representing text with enhanced features, replacing traditional text markers with feature markers, and composing input vectors with context information significantly enhances text features and yields high-quality fine-grained classification.

3. A multi-label classification framework is built on the enhanced-feature representation model, and two label selection methods are proposed: one based on a static threshold and one based on a dynamic threshold obtained by the least squares method. In addition, on top of an LSTM classification model, label similarity is quantified by constructing a label similarity matrix, and this prior knowledge about labels is fused into the model to make multi-label classification in professional fields more reasonable.
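
The filtering idea in item 1 and the static-threshold label selection in item 3 can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the `min_idf` cutoff, the `threshold` value, the toy documents and the label scores are all assumptions made for illustration, and the common definition idf = log(N / df) is assumed because the abstract does not state a formula.

```python
import math
from collections import Counter


def inverse_document_frequency(docs):
    """Return {word: idf} using idf = log(N / df) (assumed definition).

    N is the number of documents and df is the number of documents
    that contain the word at least once.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def filter_by_idf(docs, min_idf):
    """Keep only words whose IDF is at least min_idf (hypothetical cutoff)."""
    idf = inverse_document_frequency(docs)
    return [[w for w in doc if idf[w] >= min_idf] for doc in docs]


def select_labels(scores, threshold=0.5):
    """Static-threshold selection: keep every label scoring >= threshold."""
    return [label for label, score in scores.items() if score >= threshold]


# Toy tokenized corpus (made up for illustration).
docs = [
    ["the", "fleet", "deployed", "a", "destroyer"],
    ["the", "army", "deployed", "a", "tank"],
    ["the", "destroyer", "fired", "a", "missile"],
]

# "the" and "a" occur in every document, so their IDF is 0 and they are dropped.
filtered = filter_by_idf(docs, min_idf=0.4)

# Hypothetical per-label scores, as a classifier might output them.
labels = select_labels({"navy": 0.82, "army": 0.10, "weapons": 0.55}, threshold=0.5)
```

Words that appear in every document carry no topical signal under this definition, which is why raising `min_idf` removes them first. A dynamic threshold, such as the least-squares variant mentioned in item 3, would replace the fixed `threshold` argument with a value predicted per document; that part is not sketched here because the abstract gives no details.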
Keywords/Search Tags:Text Classification, Text Representation, Inverse Document Frequency, Multi-label Classification Learning, Label Correlation