Research On Web Acquisition And Automatic Classification Of Massive Text Information

Posted on:2016-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:R Li

Full Text:PDF

GTID:2298330467993153

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of science and technology, information on the Internet has grown exponentially in recent years, overwhelming users. How to better discover, access and use text information on the network is becoming increasingly important. In the era of Big Data, the acquisition and automatic categorization of huge amounts of text are key technologies for obtaining, organizing and processing massive text information. A good acquisition and categorization system can efficiently retrieve relevant web pages from the network according to given needs, analyze the pages and extract their information, and then categorize the extracted text in a defined way. Such a system is of great help for fast discovery, research and problem solving.

Applying the method of word-pool evolution features, this thesis studies in depth the web acquisition and automatic categorization of massive text information. It combines web acquisition technology, information processing technology and automatic text categorization technology in order to acquire and automatically categorize massive text information effectively in the Big Data era.

The main work completed in this thesis is as follows.

Firstly, the thesis analyzes the key techniques and algorithms in the fields of information acquisition and automatic text categorization. For information acquisition, it focuses on source code collection, link analysis and matching, and web page processing. For text categorization, it focuses on text representation, feature selection and categorization algorithms.

Secondly, a web acquisition and processing model based on user-defined conditions is proposed. Building on traditional acquisition technology, this model improves the acquisition process through link analysis and matching, and it increases the efficiency and accuracy of acquiring massive text information.

Thirdly, building on the feature extraction algorithms of traditional categorization, an improved algorithm based on word-pool evolution features is proposed. With the improved feature-optimization categorization model, this algorithm increases the size of the feature word set and improves the accuracy of automatic text categorization.

Finally, a stable and efficient acquisition and categorization system is implemented, and the proposed web acquisition and categorization model is applied to actual research work. The algorithms and models in this thesis are tested and evaluated. The results show that both acquisition and categorization perform well.
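The abstract does not spell out how link analysis and matching under user-defined conditions works. As a rough illustration only, the following Python sketch filters the links found on a page against two assumed conditions: a set of allowed domains and a URL regular expression. The names `LinkExtractor`, `match_links`, `allowed_domains` and `url_pattern` are hypothetical and are not taken from the thesis.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def match_links(html, base_url, allowed_domains, url_pattern):
    """Return absolute links that satisfy the assumed user-defined conditions.

    allowed_domains and url_pattern are illustrative stand-ins for whatever
    conditions a user might define; the thesis may use different ones.
    """
    parser = LinkExtractor()
    parser.feed(html)
    pattern = re.compile(url_pattern)
    matched, seen = [], set()
    for href in parser.hrefs:
        url = urljoin(base_url, href)        # resolve relative links
        host = urlparse(url).netloc
        if host in allowed_domains and pattern.search(url) and url not in seen:
            seen.add(url)                    # skip duplicate links
            matched.append(url)
    return matched
```

Only links that pass both conditions would be queued for download, which is one plausible way that matching could cut down on irrelevant pages during acquisition.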

Keywords/Search Tags:

text information, web acquisition, word-pool evolution, automatic categorization

Related items

1	Research Of Automatic Categorization System For Chinese Text About Complaining Information
2	Research And Implementation Of The Automatic Chinese Text Categorization
3	Word Frequency Extraction And Automatic Text Classification Methods In The Digital Library
4	Research Of Chinese Text Categorization Algorithms Based On Information Entropy
5	Research And Implementation Of Text Categorization System Based On VSM
6	The Design And Implementation Of Text Data Acquisition System Focused On News Field
7	Research On Chinese Text Categorization Algorithms Based On Technology Text
8	Design And Implementation Of Web Automatic Text Categorization
9	Multi-class Scientific Literature Automatic Categorization System
10	Chinese Text Data Classification