Research Of Text Mining Technologies For Interests Of Micro-Blog Users

Posted on: 2016-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
Full Text: PDF
GTID: 2298330467977353
Subject: Computer Science and Technology
Abstract/Summary:
Micro-blog, one of the most popular social networking platforms, plays an increasingly important role in information transmission and user communication. Users can follow and share information, access various network resources, and express their personal opinions through computers or mobile terminals. Micro-blog has thus become a platform for collecting, sharing, communicating and disseminating real-time information. Furthermore, the internet produces tens of billions of micro-blog posts, which hide immense commercial value. Therefore, it is important to analyze users’ interests in order to discover the value of micro-blog data and improve user experience (UE/UX).

This thesis crawls massive micro-blog data by simulating browser behavior. Using several Natural Language Processing approaches, such as Chinese Word Segmentation, Classification and Keyword Extraction, the micro-blog data is processed to analyze and discover users’ interests. The main research work is as follows.

First, HttpWatch 9.1 is used to capture and analyze the Web data stream. The massive data is then collected by simulating browser behavior and cleaned automatically with pattern rules.

Second, considering the characteristics of micro-blog content, a user-exclusive IDF (Inverse Document Frequency) dictionary is proposed, which combines the BaseIDF dictionary with a dynamic joint-interest-based IDF dictionary. The keywords of interest to each user are then extracted from that user’s micro-blog data with the modified TF-IDF algorithm.

Third, the definition of General Zombie User (GZU) is given. A classification model based on the AdaBoost.M1 algorithm is then used to clean the micro-blog relationships of the target users. Meanwhile, a new algorithm named RelationRank, based on the original PageRank algorithm, is proposed to rank the related users. Finally, the micro-blogs of the selected users are used to describe the interests of the target users.

Last, this thesis designs and implements an interest mining platform for micro-blog users, which adopts a modularized hierarchical design. A comparative experiment is then conducted to evaluate the validity and accuracy of the platform.
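
As a rough illustration of the keyword-extraction step, the sketch below scores a user's tokens with TF-IDF using an IDF dictionary built by merging a base dictionary with an interest-specific one. The abstract does not state how the two dictionaries are combined or how TF-IDF is modified, so the weighted average in `merge_idf`, the `weight` and `default_idf` parameters, and both function names are assumptions made only for this example. The input is assumed to be already segmented into words.

```python
from collections import Counter


def merge_idf(base_idf, interest_idf, weight=0.5):
    """Merge two IDF dictionaries (hypothetical rule, not the thesis's).

    Terms present in both get a weighted average; terms present in only
    one dictionary keep their single value.
    """
    merged = dict(base_idf)
    for term, value in interest_idf.items():
        if term in merged:
            merged[term] = (1 - weight) * merged[term] + weight * value
        else:
            merged[term] = value
    return merged


def top_keywords(tokens, idf, k=3, default_idf=1.0):
    """Return the k tokens with the highest TF-IDF score.

    TF is the relative frequency of a token in `tokens`; tokens missing
    from `idf` fall back to `default_idf` (an assumed default).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy data, invented for illustration only.
base = {"the": 0.1, "football": 2.0}
interest = {"football": 4.0, "league": 3.0}
user_idf = merge_idf(base, interest)
posts = ["the", "the", "the", "football", "football", "league"]
print(top_keywords(posts, user_idf, k=2))
```

In this toy run the frequent but uninformative token "the" is pushed down by its low IDF, while terms boosted by the interest dictionary rise to the top, which is the general effect a user-specific IDF dictionary is meant to have.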
Keywords/Search Tags: Micro-blog data collection, IDF dictionary, General zombie user recognition, RelationRank algorithm, Users’ interest mining