Support Vector Machine Based Mining Software For Blogger’s Information

Posted on: 2013-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: M P Lu
Full Text: PDF
GTID: 2248330374975962
Subject: Software engineering
Abstract/Summary:
Blog mining, a specific application of data mining, has received widespread attention in recent years. Current research focuses on opinion mining, community detection, and blog search rather than on interest mining and personality mining. Personalized service is now widely emphasized, and access to a user's personal information, such as interests and personality, can provide decision support for such services. Mining bloggers' personal information is therefore becoming increasingly important.

The purpose of this research is to mine bloggers' personal information, mainly by identifying each blogger's interests and personality, so that service providers can understand what interests their users and how to communicate with them, to the benefit of both users and providers.

The study is based mainly on natural language processing, text classification, and machine learning techniques, and is conducted from three aspects: classification of blog post topics, of bloggers' interests, and of bloggers' personalities. A blogger's interest set and personality type are recognized by analyzing all of that blogger's posts. A prototype system for classifying bloggers' interests and personalities is implemented. The main innovations include:

(1) A method based on blog tag statistics is proposed to recognize new words and common phrases, and a simple method is proposed to remove noisy posts in order to ensure the reliability of the post samples. Blog content has a structure consisting of title, tag, label, first paragraph, last paragraph, and the remaining text; the G1 method from comprehensive evaluation modeling is used to calculate the weights of these features, and it yields better results in blog post subject classification.

(2) Compared with existing studies, mining bloggers' interests requires less human intervention in steps such as data collection and performance prediction, which greatly reduces the human cost of labeling samples. In addition, the paper introduces a new evaluation criterion, the non-empty intersection rate. By slightly expanding the size of the predicted interest set, the non-empty intersection rate in interest prediction can be improved significantly. The accuracy reaches 77.0% when the predicted interest set has size two.

(3) The paper studies the classification of Net-Ease bloggers as extraverts or introverts on the basis of the standard Big Five personality model, focusing mainly on the features used for personality classification and on feature selection methods. The proposed method significantly improves the classification of bloggers' personalities. The best accuracy reaches 77.7%, which is 25.8% and 16.4% higher than the baseline and the information gain method, respectively. The method can serve as a reference for other personality classification tasks on Chinese bloggers.
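
The non-empty intersection criterion described in innovation (2) can be illustrated with a minimal Python sketch. The abstract does not give a formal definition, so the sketch assumes the rate is the fraction of bloggers whose predicted interest set shares at least one label with their true interest set. The function names, interest labels, and scores below are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Sequence, Set


def top_k_interests(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring interest labels (ties broken by label name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {label for label, _ in ranked[:k]}


def non_empty_intersection_rate(
    predicted: Sequence[Set[str]], actual: Sequence[Set[str]]
) -> float:
    """Fraction of bloggers whose predicted and true interest sets overlap.

    Assumed definition: a blogger counts as a hit when the intersection of
    the two sets is non-empty.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(1 for p, a in zip(predicted, actual) if p & a)
    return hits / len(predicted)


# Hypothetical classifier scores for three bloggers (illustrative only).
blogger_scores = [
    {"sports": 0.6, "travel": 0.3, "food": 0.1},
    {"sports": 0.5, "travel": 0.4, "food": 0.1},
    {"sports": 0.2, "travel": 0.3, "food": 0.5},
]
# Hypothetical true interest sets for the same bloggers.
true_interests = [{"sports"}, {"travel"}, {"travel", "sports"}]

# Compare the rate for predicted sets of size one and size two.
rates = {}
for k in (1, 2):
    predicted_sets = [top_k_interests(s, k) for s in blogger_scores]
    rates[k] = non_empty_intersection_rate(predicted_sets, true_interests)
    print(f"k={k}: non-empty intersection rate = {rates[k]:.3f}")
```

In this toy data, enlarging the predicted set from one label to two lets every blogger's prediction overlap the true set, which mirrors the effect the abstract reports without reproducing its figures.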
Keywords/Search Tags: Machine Learning, Blog Mining, Text Classification, Subject Classification, Interest Classification, Personality Classification