
Research On Web Acquisition And Automatic Classification Of Massive Text Information

Posted on: 2016-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: R Li
GTID: 2298330467993153
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of science and technology, the volume of information on the Internet has grown exponentially in recent years, overwhelming users. How to better discover, access, and use text information on the network is becoming increasingly important. In the era of Big Data, the acquisition and automatic categorization of huge amounts of text is a key technology for obtaining, organizing, and processing massive text information. A good acquisition and categorization system can efficiently gather relevant web pages from the network according to given needs, analyze the pages and extract their information, and then categorize the text information in a defined way. This is of great help for fast discovery, research, and problem solving.

Applying the method of word-pool evolution features, this thesis studies the web acquisition and automatic categorization of massive text information in depth. It combines web acquisition technology, information processing technology, and automatic text categorization technology in order to acquire and automatically categorize the massive text information of the Big Data era effectively.

The main work completed in this thesis is as follows:

First, this thesis analyzes the key techniques and algorithms in the fields of information acquisition and automatic text categorization. For information acquisition, it focuses on source code collection, link analysis and matching, and web page processing; for text categorization, it focuses on text representation, feature selection, and categorization algorithms.

Second, a web acquisition and processing model based on user-defined conditions is proposed. Building on traditional acquisition technology, this model improves the acquisition process through link analysis and matching, and it increases the efficiency and accuracy of acquiring massive text information.

Third, building on the feature extraction algorithms of traditional categorization, an improved algorithm based on word-pool evolution features is proposed. With the improved feature-optimization categorization model, this algorithm increases the size of the feature word set and also improves the accuracy of automatic text categorization.

Finally, a stable and efficient acquisition and categorization system is implemented, and the proposed web acquisition and categorization model is applied to actual research work. The related algorithms and models in this thesis are tested and evaluated. The results show that the acquisition and categorization perform well.
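
The abstract does not give the details of the acquisition model, so the following is only a minimal sketch of what link analysis and matching under user-defined conditions could look like. It assumes the conditions are regular expressions applied to absolute URLs; the names `LinkCollector`, `match_links`, `include_patterns`, and `exclude_patterns` are hypothetical and are not taken from the thesis.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def match_links(html, base_url, include_patterns, exclude_patterns=()):
    """Return absolute links that satisfy the user-defined conditions.

    Assumption: a link is kept when it matches at least one include
    pattern and no exclude pattern. Results keep page order and
    contain no duplicates.
    """
    collector = LinkCollector()
    collector.feed(html)
    seen = set()
    matched = []
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment part.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url in seen:
            continue
        seen.add(url)
        if any(re.search(p, url) for p in exclude_patterns):
            continue
        if any(re.search(p, url) for p in include_patterns):
            matched.append(url)
    return matched
```

A crawler built this way would fetch only the URLs returned by `match_links`, which is one plausible way to narrow acquisition to the pages a user cares about; the matching rules actually used in the thesis may differ.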
Keywords/Search Tags: text information, web acquisition, word-pool evolution, automatic categorization