Research Of Network Information Collection And Intelligent Processing Technology

Posted on:2013-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:L N Zou

Full Text:PDF

GTID:2248330371981317

Subject:Computer software and theory

Abstract/Summary:

Whether for scientific research or for study, we all need to find the latest professional information, news, and trends on the Internet, but the explosion of information also makes it increasingly difficult to find what we need in this ocean of information. On the one hand, information on the Internet grows every day and is updated quickly, so searching for it takes a great deal of time. On the other hand, much of the information on the Internet is duplicated and its format is not standardized, which makes searching harder for users. The technology for network information collection and intelligent processing has emerged in response.

Users can retrieve a large amount of information through a search engine, but the results come without information extraction, organization, or processing. As informatization has progressed and users have demanded more of information acquisition, information search has moved from "general" toward "personalized and intelligent". Many information collection tools now on the market meet users' information acquisition needs to a certain extent, but their information processing is poor. Because text accounts for a large part of the information on the Internet, automatically classifying Internet text has become a key technology of information processing.

First, based on an analysis of existing information collection and information processing technology, this paper introduces the web crawler and analyzes the principles of web information acquisition, duplicate webpage removal, and information extraction. It then studies in depth the key technology of text classification for intelligent information processing, and improves the existing feature selection method and text classification algorithm.
An automatic text classifier is then constructed with the improved KNN algorithm, using the Sogou corpus as the training corpus for the classification model. Experiments determine the best K value and feature dimension for this corpus and verify that the improved KNN algorithm achieves better classification results.

The innovations of this paper are as follows:

(1) The feature selection method in text information processing is improved. The paper proposes merging synonyms by introducing TongYiCi CiLin: synonyms are replaced and counted together before feature selection, which reduces the dimension of the feature space.

(2) An improved KNN algorithm is presented. Using the cluster center vector, the distance between the text to be classified and each text category is incorporated into the similarity formula, and the ratio of the number of features common to two texts to the larger of the two texts' feature counts is used as an adjustment factor in the formula.

(3) An automatic text classifier is constructed with the improved KNN algorithm. In the classification stage, the relationship between the text to be classified and each category is considered first; when that relationship is ambiguous, the text is compared with all training texts and its category is determined from the result of the comparison.
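
The three innovations can be illustrated with a small Python sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the tiny SYNONYMS table stands in for TongYiCi CiLin, the adjusted similarity is taken to be cosine similarity multiplied by the adjustment factor, the category center is a simple mean vector, and the "ambiguous" test is a hypothetical margin between the two best category scores. None of the function names come from the thesis.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym table standing in for TongYiCi CiLin:
# each word maps to one canonical representative.
SYNONYMS = {"automobile": "car", "vehicle": "car", "soccer": "football"}

def merge_synonyms(tokens, table=SYNONYMS):
    """Replace each token by its canonical synonym before feature selection."""
    return [table.get(t, t) for t in tokens]

def to_vector(tokens):
    """Term-frequency vector built after synonym merging."""
    return Counter(merge_synonyms(tokens))

def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def adjust_factor(a, b):
    """Common features divided by the larger of the two feature counts."""
    largest = max(len(a), len(b))
    return len(a.keys() & b.keys()) / largest if largest else 0.0

def adjusted_similarity(a, b):
    # Assumed combination: cosine similarity scaled by the adjustment factor.
    return cosine(a, b) * adjust_factor(a, b)

def centroids(training):
    """Mean term-frequency vector per category (assumed category center)."""
    sums, counts = defaultdict(Counter), Counter()
    for vec, label in training:
        sums[label].update(vec)
        counts[label] += 1
    return {label: Counter({t: w / counts[label] for t, w in s.items()})
            for label, s in sums.items()}

def classify(doc, training, k=3, margin=0.1):
    """Try category centers first; fall back to KNN when the result is ambiguous."""
    ranked = sorted(((cosine(doc, c), label)
                     for label, c in centroids(training).items()), reverse=True)
    if len(ranked) == 1 or ranked[0][0] - ranked[1][0] >= margin:
        return ranked[0][1]
    # Ambiguous case: compare with all training texts and vote among the k nearest.
    nearest = sorted(((adjusted_similarity(doc, vec), label)
                      for vec, label in training), reverse=True)[:k]
    votes = defaultdict(float)
    for sim, label in nearest:
        votes[label] += sim
    return max(votes, key=votes.get)

training = [
    (to_vector(["car", "engine", "wheel"]), "auto"),
    (to_vector(["automobile", "engine", "road"]), "auto"),
    (to_vector(["football", "goal", "match"]), "sport"),
    (to_vector(["soccer", "goal", "league"]), "sport"),
]
doc = to_vector(["vehicle", "engine"])
label = classify(doc, training)
```

In this toy data the category centers alone separate the two classes, so the KNN fallback runs only if margin is raised. The k and margin parameters are placeholders; the thesis reports tuning K and the feature dimension experimentally on the Sogou corpus.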

Keywords/Search Tags:

Network information collection, KNN algorithm, Feature selection, Vector space model, Text classification

Related items

1	The Research And Implementation Of Chinese Text Classification Based On Feature Selection And LDA
2	Research And Improvement Of Feature Selection Algorithm In Text Classification
3	Research And Improvement Of Feature Selection Algorithm In Chinese Text Classification
4	Research On Text Classification Method Based On Improved Feature Selection Algorithm
5	On Research For Chinese Automatic Text Categorization Technology Based On VSM Model And Feature Selection
6	Improvement On Feature Selection And Classification Algorithm For Text Classification
7	Research On Feature Selection Algorithm For Text
8	Research On Feature Selection Of Text Classification
9	A Text Automatic Classification System Of Class-Based Feature Selection Algorithm
10	Research On Text Classification Based On Feature Selection And Feature Weighting Algorithm