Research And Application Of Micro-blog Acquisition Method Based On Feature Words

Posted on: 2018-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: D J Zhang
GTID: 2348330518998523
Subject: Software engineering
Abstract/Summary:
With the accelerating development of the Internet, the ways in which information spreads have become more frequent and diverse. The emergence of microblogging has subtly influenced our way of life and habits of thought. Because microblog content takes varied forms and carries a large amount of information, how to accurately find content of interest among the large volume of daily updates has become a focus of attention. At present, most research on microblog acquisition methods concentrates on microblog friends and social networks, and comparatively little treats microblog content itself as the focus. The official push mechanism mixes in too much commercial content, and its results are often unsatisfactory. How to effectively obtain the content a user cares about, and push relevant microblogs to that user, is the focus of this study.

This dissertation proposes a TF-IDF optimization algorithm that combines part of speech and word length, introduces it into the word2vec text vector representation to improve classification accuracy, and develops a microblog acquisition system based on feature words. The main contents of this dissertation are as follows:

1) An optimized TF-IDF algorithm is proposed that combines word-length and part-of-speech factors with the traditional TF-IDF algorithm to improve the accuracy of feature-word selection.

2) Word vectors trained with the word2vec tool provide a semantic representation but cannot distinguish the importance of individual words in a text. To address this, the dissertation uses the optimized TF-IDF algorithm to weight the original word2vec word vectors. This weighted word2vec feature-word selection method improves the accuracy of the semantic representation of feature words selected from short texts.

3) Based on the feature-word selection algorithms above, the dissertation designs and implements a microblog acquisition system based on a classification algorithm, and presents the main models and implementation process from the system analysis and outline design stages. The system supports publishing microblogs and retrieving microblogs that interest the user.

Algorithm validation and performance experiments are first carried out on a microblog data set, comparing the traditional algorithm with the improved scheme. The experiments show that the proposed scheme effectively improves the classification accuracy of short texts. In addition, the algorithm is applied in the implemented microblog acquisition system, and this application shows that the proposed optimization algorithm is effective and feasible.
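
The abstract does not give the formulas behind the optimized TF-IDF or the weighted word vectors, so the following Python sketch is only one plausible reading of the idea: a TF-IDF score multiplied by a part-of-speech factor and a word-length factor, then used to weight word vectors into a single text vector. The factor values in POS_WEIGHT, the length_factor function, and the toy two-dimensional vectors are all assumptions made for illustration and are not taken from the dissertation. In practice the vectors would come from a trained word2vec model.

```python
import math
from collections import Counter

# Hypothetical part-of-speech factors; the abstract does not state any values.
POS_WEIGHT = {"n": 1.0, "v": 0.8, "a": 0.6}
DEFAULT_POS_WEIGHT = 0.3


def length_factor(word, cap=4):
    """Hypothetical length factor: grows with word length, capped at `cap` characters."""
    return min(len(word), cap) / cap


def weighted_tfidf(docs):
    """docs is a list of documents, each a list of (word, pos_tag) pairs.

    Returns one {word: weight} dict per document, where
    weight = tf * smoothed idf * pos factor * length factor.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update({w for w, _ in doc})  # count each word once per document

    result = []
    for doc in docs:
        tf = Counter(w for w, _ in doc)
        pos_of = dict(doc)  # keeps the last tag seen for each word
        total = len(doc)
        weights = {}
        for w, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1  # smoothed IDF
            pos_w = POS_WEIGHT.get(pos_of[w], DEFAULT_POS_WEIGHT)
            weights[w] = (count / total) * idf * pos_w * length_factor(w)
        result.append(weights)
    return result


def doc_vector(weights, vectors):
    """Weighted average of word vectors; words without a vector are skipped."""
    if not vectors:
        return []
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for w, wt in weights.items():
        if w in vectors:
            total += wt
            for i, x in enumerate(vectors[w]):
                acc[i] += wt * x
    return [x / total for x in acc] if total else acc


# Toy data: two short "microblogs" with made-up tags and vectors.
docs = [
    [("football", "n"), ("match", "n"), ("win", "v")],
    [("phone", "n"), ("release", "v"), ("new", "a")],
]
vectors = {"football": [1.0, 0.0], "match": [0.8, 0.2], "phone": [0.0, 1.0]}

weights = weighted_tfidf(docs)
text_vec = doc_vector(weights[0], vectors)
```

With these assumed factors, a noun such as "football" receives a higher weight than the shorter verb "win" in the same text, and the resulting text vector could be passed to any standard classifier.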
Keywords/Search Tags: feature selection, word vector, classification algorithm, microblogging acquisition