
Research And Implementation Of Personal Text Classification Based On Skewed Data

Posted on: 2018-07-20 | Degree: Master | Type: Thesis
Country: China | Candidate: L Q Gao | Full Text: PDF
GTID: 2428330569498872 | Subject: Computer technology
Abstract/Summary:
With the advent of the era of intelligence, data management has shifted from manual to intelligent methods, and personal information management has become an active research area. How to manage a growing number of personal documents effectively and improve work efficiency is an important research topic in data analysis and mining. As text data grows rapidly and text classification technology continues to improve, text classification has become an effective means of making personal information management more efficient. One difficulty in classifying personal text documents, however, is that each user's documents concentrate on different topics, so the data are skewed. With skewed data, classification results are biased toward classes with many samples, while classes with few samples are neglected, which leads to biased or even wrong results.

To address these problems, this thesis completes the following work. First, it proposes T-DA, a fast and effective feature selection method for personal documents with skewed data. Given the high-dimensional feature space and the skewed class distribution, the TextRank algorithm is used to reduce the dimensionality of the text quickly. On this basis, the strongest feature words of each class are selected to form a "class feature vector". Finally, based on the class feature vector and the characteristics of the T-DA algorithm, a word-matching classification method is proposed. The method can present several candidate classes, and accurate classification is achieved through interaction with the user. The performance and feasibility of the algorithm were evaluated on the Fudan news corpus and on Liberation Army Daily data, and the results indicate that the classification method is practical for personal text data.

Second, the thesis designs and implements a prototype text document classification system based on the algorithm. The system consists of a text document judgment module, a preprocessing module, a T-DA feature selection module, a class feature vector map construction module, a classification module, and the interface design and implementation. The thesis also studies the engineering implementation of the prototype system and provides a solution for the automatic classification of personal text documents.
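The abstract gives no formulas for T-DA, so the following is only an illustrative sketch of the three steps it names: TextRank keyword reduction, per-class feature vectors, and word matching that returns several candidates. Every function name, parameter, and scoring rule here is an assumption and not the thesis's method. In particular, "count how many of a class's documents yield the keyword" stands in for the unspecified strongest-feature selection, and the input is assumed to be already tokenized (Chinese corpora would need word segmentation first).

```python
from collections import Counter, defaultdict
from itertools import combinations


def textrank_keywords(tokens, top_k=5, window=3, damping=0.85, iterations=30):
    """Return up to top_k words ranked by a simplified, unweighted TextRank."""
    # Link words that co-occur inside a sliding window.
    neighbors = defaultdict(set)
    for i in range(len(tokens)):
        for a, b in combinations(tokens[i:i + window], 2):
            if a != b:
                neighbors[a].add(b)
                neighbors[b].add(a)
    # Fixed number of PageRank-style score updates over the word graph.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def build_class_features(labeled_docs, per_class=5):
    """Map each label to a set of at most per_class feature words.

    Assumed stand-in for T-DA's selection step: a word's strength is the
    number of documents in the class whose keywords include it. Each class
    gets the same feature budget, however few documents it has.
    """
    features = {}
    for label, docs in labeled_docs.items():
        counts = Counter()
        for tokens in docs:
            counts.update(set(textrank_keywords(tokens)))
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        features[label] = set(ranked[:per_class])
    return features


def rank_classes(tokens, features, candidates=2):
    """Return the best-matching labels so a user can confirm the final one."""
    keywords = set(textrank_keywords(tokens))
    overlap = {label: len(keywords & words) for label, words in features.items()}
    return sorted(overlap, key=lambda label: (-overlap[label], label))[:candidates]


# Toy, deliberately skewed training data: two sports documents, one finance.
labeled_docs = {
    "sports": [
        ["team", "match", "goal", "score", "team", "match"],
        ["team", "goal", "coach", "match"],
    ],
    "finance": [["stock", "market", "bank", "price", "stock", "market"]],
}
features = build_class_features(labeled_docs)
print(rank_classes(["team", "goal", "match", "win"], features))
```

Returning a short ranked list rather than a single label mirrors the interactive step described in the abstract, where the user settles the final class among the suggested candidates.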
Keywords/Search Tags: Personal Text Data Management, Data Skew, Text Classification, Feature Selection