
Consumption Intent Recognition in Micro-blog

Posted on: 2014-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: Y Jiao
GTID: 2268330422450609  Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet software and hardware, people's lives have changed greatly. The Internet has already reached thousands of households. Everyone will have a personal computer or PDA, and everyone will carry a mobile phone that accesses the Internet through a telecom company. At this stage of rapid Internet development, one of the main focuses is social networking. Various social network sites have developed rapidly and attracted a large number of users. People use social networks to share their lives and express their views, and these posts also contain a great deal of information about their consumption intent. Micro-blog, as a kind of social media and social network application, therefore has enormous commercial value.

In this paper, we conduct a series of studies on consumption intent in micro-blog. We try two methods, template extraction and classification, to study consumption intent, and we also present an engineering implementation. This paper identifies micro-blog consumption intent through the following steps: acquiring Sina micro-blog data and filtering zombie users based on classification, and automatically extracting templates to discriminate consumption intent. The main research content of each point is briefly introduced below.

(1) Sina micro-blog provides an API for obtaining data, and we also use simulated web login to crawl Sina micro-blog pages. However, Sina imposes limits: the API can be called ten thousand times per hour per IP, and web crawling triggers a verification code after a period of time. This paper therefore builds a Hadoop cloud computing platform and crawls data from more than one IP. Removing garbage data is also very important for consumption intent recognition. The garbage data in this paper comes from zombie users, who publish micro-blog posts with consumption intent but have no commercial value, so this paper filters out zombie users first.
User authority is computed with the HITS algorithm. This paper uses HITS to compute each user's value, computes the user's VF value, and combines these with other commonly used features to train a model. We manually annotate the training and test sets to train the classifier, and as a result we filter zombie users effectively.

(2) Automatic template extraction for consumption intent recognition. In this paper, we construct templates automatically using Chinese event information extraction and natural language processing techniques. We use natural language processing for word segmentation, part-of-speech tagging, named entity recognition and dependency parsing. We also define a three-tuple template consisting of a trigger word, a product name and a dependency relation. Candidate templates are extracted from the training set and ranked by information gain. To generalize the templates we use a product category library. Through iterative template extraction, we further improve accuracy and recall. The templates can also be used to extract consumption objects, which is left for future work.

(3) Finally, we use text classification based on SVM and Logistic Regression to recognize consumption intent, and we compare the two methods. We also do more detailed preprocessing of the micro-blog text: Sina micro-blog text is free-form and irregular, which poses new challenges for traditional text classification methods, so the text must be preprocessed before it is classified. To obtain a reasonable feature dimension and classification model, we also compare different numbers of selected features and different training set sizes.
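The information-gain ranking of candidate templates mentioned in (2) can be illustrated with a small sketch. It is not the thesis implementation: it assumes each candidate template is reduced to a binary feature (whether the template matched a post), that each post has a binary intent label, and the template names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(matches, labels):
    """Information gain of a binary 'template matched' feature.

    matches: list of bool, whether the template fired on each post.
    labels:  list of class labels (1 = consumption intent, 0 = none).
    """
    total = len(labels)
    if total == 0:
        return 0.0
    matched = [y for m, y in zip(matches, labels) if m]
    unmatched = [y for m, y in zip(matches, labels) if not m]
    # Entropy of the labels after splitting on the template feature.
    conditional = (len(matched) / total) * entropy(matched) \
        + (len(unmatched) / total) * entropy(unmatched)
    return entropy(labels) - conditional


def rank_templates(template_matches, labels):
    """Return (template, gain) pairs sorted by information gain, highest first."""
    scores = {name: information_gain(m, labels)
              for name, m in template_matches.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: four posts, two of which express consumption intent.
labels = [1, 1, 0, 0]
template_matches = {
    "want_to_buy+PRODUCT": [True, True, False, False],  # separates the classes
    "like+PRODUCT": [True, False, True, False],         # carries no information
}
ranking = rank_templates(template_matches, labels)
```

In this toy setting the template that fires only on intent posts ranks first, while the one that fires equally on both classes gets a gain of zero and would be a candidate for pruning.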
Keywords/Search Tags:Micro-blog consumption intent, consumption intent mining, consumption intent classification, template extraction, zombie recognition