
Gender Classification Based On Micro-blog Text And Social Information

Posted on: 2018-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: B Dai
GTID: 2348330542465189
Subject: Software engineering
Abstract/Summary:
With the development of Internet technology and the arrival of the big data era, the explosive growth of data volume and the maturing of big data analysis techniques have made user profiling (user portraits) an important research topic in natural language processing and data mining. Because gender is one of the basic attributes of a user profile, predicting a user's gender from user-generated data has become a fundamental research task. Although gender classification on Chinese micro-blogs has been studied in previous work, few studies leverage both micro-blog text and social information. Moreover, most previous gender classification methods rely on supervised learning, which requires a large amount of labeled data. To address these problems, this thesis makes the following three contributions.

First, this thesis presents a multi-type text gender classification method based on LSTM. The core idea is to distinguish the different types of text in a micro-blog and to use an ensemble of long short-term memory (LSTM) networks to learn jointly from them in order to predict the user's gender. Training on each type of text separately avoids the interference that arises when all types of text are mixed together. Experimental results show that the proposed ensemble LSTM method significantly outperforms both using a single type of text and other methods for jointly learning from multi-type text.

Second, this thesis presents a semi-supervised gender classification method based on multi-type text. The core idea is to distinguish the different types of micro-blog text while reducing the classifier's dependence on large amounts of labeled data. Using co-training, the different types of text are treated as different views, with an LSTM model as the base classifier for each view. Unlabeled samples that are predicted with high confidence in each view are selected and added to the labeled samples to enlarge the training set. Experimental results show that the method obtains better classification results with only a small number of labeled samples, and that the co-training method also outperforms some traditional semi-supervised learning methods.

Finally, this thesis presents a semi-supervised gender classification method with joint textual and social modeling. The core idea is to combine micro-blog text feature learning with social relations in a semi-supervised gender classification model. Specifically, we define a social feature for micro-blog users who follow the same accounts, and build a textual and social factor graph (TSFG) model to learn textual features and social features jointly. Experimental results show that the method can effectively use social-relation information to help learn the classifier and obtains better classification performance.
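
The co-training procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the LSTM base classifier is replaced by a tiny word-count naive Bayes model so that the example runs with the standard library only, and the two views ("original posts" and "reposted text"), the toy data, and the names TinyNaiveBayes, co_train, per_round, and threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Stand-in base classifier (assumption: the thesis uses an LSTM here)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, doc):
        # Laplace-smoothed log-likelihood per class, then normalize.
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab) + 1
            lp = math.log(self.prior[c])
            for w in doc.split():
                lp += math.log((self.counts[c][w] + 1) / total)
            logp[c] = lp
        top = max(logp.values())
        z = sum(math.exp(v - top) for v in logp.values())
        return {c: math.exp(v - top) / z for c, v in logp.items()}


def train_views(labeled):
    """Train one classifier per view on the current labeled set."""
    ys = [y for _, y in labeled]
    return [TinyNaiveBayes().fit([x[v] for x, _ in labeled], ys) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=3, per_round=1, threshold=0.6):
    """labeled: [((view_a, view_b), label)]; unlabeled: [(view_a, view_b)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}  # pool index -> pseudo-label
        for v, model in enumerate(train_views(labeled)):
            scored = []
            for i, x in enumerate(pool):
                proba = model.predict_proba(x[v])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            scored.sort(reverse=True)
            # Each view contributes its most confident unlabeled samples.
            for conf, i, label in scored[:per_round]:
                if conf >= threshold and i not in picked:
                    picked[i] = label
        if not picked:
            break
        labeled += [(pool[i], y) for i, y in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return train_views(labeled), labeled


# Toy data (hypothetical): view_a = original posts, view_b = reposted text.
labeled = [
    (("football match goal", "repost sports news"), "male"),
    (("makeup lipstick dress", "repost fashion show"), "female"),
]
unlabeled = [
    ("football goal tonight", "repost sports highlights"),
    ("lipstick sale dress", "repost fashion week"),
]
models, final = co_train(labeled, unlabeled)
```

Here each view's classifier nominates the unlabeled users it is most confident about, and those users join the labeled set with their predicted labels before the next round. Any model exposing the same fit and predict_proba interface could be swapped in for the stand-in classifier.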
Keywords/Search Tags: Gender Classification, Multi-Type Text Classification, LSTM, Co-training, Factor Graph Model