
Research On Web Text Categorization Technology Oriented To Information Service

Posted on: 2011-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: F N Sui
Full Text: PDF
GTID: 2178330338990094
Subject: Management Science and Engineering
Abstract/Summary:
The rapid development of information technology has made users' requirements for resources more complex and more individual. The main task of an information service is to find the needed data accurately and quickly in the mass of information on the Internet. The scope of information services keeps expanding, and the services themselves are becoming more sophisticated. Two main factors affect the quality of an information service: the precision with which the user's requirements are described and the effectiveness of data mining. This thesis focuses on these two factors and studies user requirement modeling and text categorization technology.

This thesis summarizes the key technologies of user modeling and text categorization, including information filtering, interest description, word segmentation, text representation, feature selection and extraction, and classification algorithms. The differences between two word segmentation methods (statistics-based and rule-based segmentation) are discussed. The impact of information gain (IG), the chi-square statistic (CHI), mutual information (MI), latent semantic indexing (LSI), etc. is studied, and their differences are discussed in depth. The Naïve Bayes, KNN and SVM text categorization algorithms are discussed, along with their strengths and weaknesses. The theory of building computer corpora and the current state of Chinese corpora are briefly introduced.

Based on an analysis of the traditional feature extraction methods, a new model for feature selection is proposed. By generating absolute correlation and applying a method for clearing disturbing features, this algorithm can effectively remove the disturbing features created by IM. Drawing on foreign methods that generate features from a knowledge base, this thesis attempts to use Chinese knowledge to guide the process of feature selection. By analyzing the links between web pages, a new algorithm called term-rank is proposed, based on PageRank, the well-known search engine ranking method. The traditional feature selection methods and term-rank are compared using the widely used SVM classifier.

This thesis also studies in depth the technology of user description and model creation for information services. A set of methods for describing users and for creating and updating user models is proposed. Finally, this thesis studies the principles of classifiers for Chinese text, especially the neural network classifier. By combining min-max modular theory with a neural network, a new classifier is built and tested in MATLAB simulation.
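
As a point of reference for the traditional feature selection methods named in the abstract, the following is a minimal sketch of chi-square (CHI) term scoring for a single category. It is not the thesis's proposed model or its term-rank algorithm, neither of which is specified here. The function names `chi_square_scores` and `top_k_terms` and the toy corpus are assumptions made for illustration, and documents are assumed to be already segmented into tokens.

```python
from collections import Counter


def chi_square_scores(docs, labels, category):
    """Score each term by its chi-square statistic against one category.

    docs: list of token lists (already segmented); labels: one label per doc.
    Uses the standard 2x2 document-frequency contingency table.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == category)
    n_neg = n - n_pos

    # Document frequencies of each term inside and outside the category.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == category else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in category, contains term
        b = df_neg[term]   # not in category, contains term
        c = n_pos - a      # in category, lacks term
        d = n_neg - b      # not in category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_k_terms(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, for illustration only.
docs = [
    ["stock", "market", "rise"],
    ["stock", "fund"],
    ["match", "goal"],
    ["team", "goal", "market"],
]
labels = ["finance", "finance", "sports", "sports"]

scores = chi_square_scores(docs, labels, "finance")
selected = top_k_terms(scores, 2)
```

In this toy corpus, "market" appears equally often in both categories and scores zero, while "stock" and "goal" each occur in only one category and score highest. The selected terms would then serve as the feature set passed to a classifier such as SVM.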
Keywords/Search Tags: Information Service, Feature Selection, User Modeling, Text Categorization, Neural Network