
Research and Application of Web Crawling and Text Mining Technology

Posted on: 2015-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: S D Diao
Full Text: PDF
GTID: 2298330467963768
Subject: Electronics and Communications Engineering
Abstract/Summary:
The Internet is developing very quickly. As technology advances, video, audio, and text data accumulate sharply, forming a very large body of data, of which data stored as text makes up a substantial proportion. This accumulation, together with the rise of cloud computing and related technologies, has made people aware that knowledge can be mined from such huge volumes of data by a variety of means. This thesis concentrates on two areas of text mining, web text crawling and text classification, and applies them in a practical system. The text sentiment analysis module uses the Naive Bayes method for the initial classification and an emotion dictionary method for the second-stage classification into positive and negative polarity. The domain dictionary for social media is based on the HowNet semantic lexicon, extended by manually adding new Internet words and emoticons. The specific results are as follows:
1. Built an efficient, multi-threaded, login-free crawler system for social networking and microblogging sites.
2. Constructed a system for judging the emotion and tendency of web text (mainly from Sina Weibo and Renren). The initial classification uses a Bayesian algorithm, and the secondary classification uses a dictionary-based emotion method.
3. Based on the Chinese HowNet dictionary files, constructed an emotion dictionary extended specifically for the social media and microblogging domain. In actual use, it improves the recall rate in short text classification.
4. Finally, the thesis also describes some of the work the author did during postgraduate fieldwork.
Keywords/Search Tags: text categorization, text crawling, emotional dictionary, text mining, Bayesian classifier
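
The two-stage classification described in the abstract (Naive Bayes first, then a dictionary lookup for positive or negative polarity) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say what the initial classes are, so the sketch assumes the first stage separates subjective from objective text; the toy English lexicon stands in for the HowNet-based dictionary; and whitespace splitting stands in for Chinese word segmentation. All names (`NaiveBayes`, `LEXICON`, `polarity`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words unseen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy lexicon (assumption): word or emoticon -> polarity weight.
LEXICON = {"great": 1, "love": 1, ":)": 1, "awful": -1, "hate": -1, ":(": -1}


def polarity(tokens, lexicon=LEXICON):
    """Stage 2: sum lexicon weights and map the sign to a polarity label."""
    score = sum(lexicon.get(w, 0) for w in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify(tokens, nb):
    """Stage 1 filters objective text; stage 2 assigns polarity to the rest."""
    if nb.predict(tokens) == "objective":
        return "objective"
    return polarity(tokens)


# Tiny made-up training set (assumption), used for illustration only.
train = [
    ("i love this phone :)", "subjective"),
    ("this movie is awful :(", "subjective"),
    ("i hate waiting in line", "subjective"),
    ("what a great day", "subjective"),
    ("the train leaves at nine", "objective"),
    ("the store opens on monday", "objective"),
    ("the meeting is at noon", "objective"),
    ("the report covers march sales", "objective"),
]
nb = NaiveBayes().fit([text.split() for text, _ in train],
                      [label for _, label in train])

print(classify("i love this great movie :)".split(), nb))  # positive
print(classify("i hate this awful line :(".split(), nb))   # negative
print(classify("the train leaves at noon".split(), nb))    # objective
```

Putting emoticons in the lexicon mirrors the abstract's point that a general-purpose dictionary is extended with social-media vocabulary; in a sketch like this, a short post containing only such tokens would otherwise come out as neutral.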