A Study For Classifying Short Text In Social Network

Posted on: 2020-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
GTID: 2428330596476086
Subject: Information and Communication Engineering
Abstract/Summary:
The rich information in social networks has made them a research hotspot for data mining. Data mining can monitor disaster events, public opinion, and suspicious user accounts. This can improve the disaster-prevention and rescue capabilities of the relevant organizations and support more intelligent political decision-making. However, social network data has a low signal-to-noise ratio, and existing data mining techniques seldom account for noise. As a result, valuable information is buried under spam, which degrades the results of data mining. Classifying short social network texts, filtering out junk data, and retaining valuable data provides cleaner input for data mining and thereby improves its results. Social network texts are short, carry little content, and rely on a single type of feature with weak expressive power, so traditional text classification methods perform poorly on them. Moreover, most existing text classification methods are supervised, and a good supervised model depends heavily on the size and quality of the annotated data set. In practice, annotated data are often insufficient because labeling is difficult and expensive. This makes it hard for existing methods to classify social network text accurately. This thesis addresses these two problems, and its main contributions are summarized as follows:

1. A short text classification method for social networks based on multi-attribute features is proposed. In the feature extraction stage, the characteristics of the data are analyzed. On top of the traditional text semantic features, social attributes and structural attributes are then extracted as important supplements to the semantic attributes. This takes advantage of the information that social networks provide and solves the problem of weak feature expression in traditional methods. In the feature learning stage, a separate regression model learns each kind of attribute feature, which improves the learning ability of each model. In the multi-model fusion stage, classification uses a soft weighted average of the regression outputs. This reduces the noise introduced, strengthens the robustness of the model, and achieves efficient classification. In tests on real data, the method was compared with commonly used methods. It shows strong feature expression, and its regression fusion strategy proves effective. Classification performance improves significantly and meets application needs.

2. A short text classification method for social networks based on active learning is proposed. An active learning framework is added on top of the multi-attribute feature classification method. A query function selects batches of data for experts to label. This greatly improves the training efficiency of the algorithm, brings in the knowledge of external experts, and reduces both the introduction of noise and the propagation of errors. Using the number of iterations as the termination condition simplifies parameter setting. Together, these measures reduce the amount of training data required, lower the cost of classification, and make training efficient. In tests on real data, the method was compared with the multi-attribute feature method. It reduces the training data required by a factor of 20 while maintaining classification performance, which addresses the problem of insufficient labeled data.
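The two contributions can be illustrated with a minimal Python sketch. The abstract does not specify the regression models, the fusion weights, or the query function, so every detail below is an assumption and not the thesis's actual method. Each attribute group (semantic, social, structural) gets its own least-squares linear regressor fitted by gradient descent. The group outputs are clamped to [0, 1] and combined by a fixed weighted average, with weights chosen arbitrarily. The active-learning query function is plain uncertainty sampling, which picks the batch whose fused scores lie closest to the 0.5 threshold. All names (`fit_linear_regression`, `MultiAttributeClassifier`, `active_learning`, `oracle`) and the toy one-dimensional features are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, List[float]]  # attribute group name -> feature vector
Regressor = Callable[[Sequence[float]], float]


def fit_linear_regression(X: List[List[float]], y: List[float],
                          lr: float = 0.5, epochs: int = 500) -> Regressor:
    """Least-squares linear regression fitted by full-batch gradient descent (assumed model)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b


class MultiAttributeClassifier:
    """One regressor per attribute group; outputs fused by a weighted average."""

    def __init__(self, weights: Dict[str, float]):
        self.weights = weights  # hypothetical fixed fusion weight per attribute group
        self.models: Dict[str, Regressor] = {}

    def fit(self, samples: List[Features], labels: List[int]) -> None:
        targets = [float(lab) for lab in labels]
        for group in self.weights:
            self.models[group] = fit_linear_regression([s[group] for s in samples], targets)

    def score(self, sample: Features) -> float:
        # Soft fusion: clamp each regression output to [0, 1], then take the weighted average.
        fused = sum(wt * min(1.0, max(0.0, self.models[g](sample[g])))
                    for g, wt in self.weights.items())
        return fused / sum(self.weights.values())

    def predict(self, sample: Features, threshold: float = 0.5) -> int:
        return int(self.score(sample) >= threshold)


def active_learning(pool: List[Features], oracle: Callable[[Features], int],
                    seed_idx: List[int], weights: Dict[str, float],
                    batch_size: int = 2, iterations: int = 3
                    ) -> Tuple[MultiAttributeClassifier, Dict[int, int]]:
    labeled = {i: oracle(pool[i]) for i in seed_idx}  # small expert-labeled seed set
    clf = MultiAttributeClassifier(weights)

    def refit() -> None:
        idx = list(labeled)
        clf.fit([pool[i] for i in idx], [labeled[i] for i in idx])

    for _ in range(iterations):  # a fixed iteration count is the only stopping rule
        refit()
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        # Assumed query function: uncertainty sampling (fused score nearest 0.5).
        batch = sorted(unlabeled, key=lambda i: abs(clf.score(pool[i]) - 0.5))[:batch_size]
        for i in batch:
            labeled[i] = oracle(pool[i])  # the expert labels the queried batch
    refit()
    return clf, labeled


def oracle(s: Features) -> int:
    """Simulated expert for the toy data: label 1 when the feature exceeds 0.5."""
    return int(s["semantic"][0] > 0.5)


# Toy usage: one synthetic feature shared by all three attribute groups.
pool = [{"semantic": [x], "social": [x], "structural": [x]}
        for x in (i / 20 for i in range(1, 20))]
clf, labeled = active_learning(pool, oracle, seed_idx=[0, 1, 17, 18],
                               weights={"semantic": 0.5, "social": 0.25, "structural": 0.25})
print(len(labeled))  # → 10 (4 seed labels + 3 iterations × 2 queries)
```

Because the loop stops after a fixed number of iterations, as the abstract describes, the labeling budget is known in advance (seed size plus iterations × batch size) and needs no convergence threshold to tune.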
Keywords/Search Tags: social network, short text classification, multi-attribute feature, regression fusion, active learning