G2C-Oriented Cross-Platform Web Text Mining Models and Methods Study

Posted on: 2010-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Kuang
GTID: 2178330338482218
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, government departments at all levels have established G2C (government-to-citizen) e-government platforms, which have strengthened interaction between the government and the public. The large volume of messages posted on these platforms contains many signs and clues of impending major events. Applying web text mining to the G2C e-government platform therefore makes it possible to collect and analyze the relevant information further and to uncover clues to important events, which is of great significance for preventing such events. This thesis focuses on feature extraction from citizens' messages, text classification, the term weighting of feature words, and the sensitive thesaurus. The main work on the G2C e-government platform is as follows.

First, based on an analysis of citizens' messages, the thesis introduces the design rationale of GWTMS for G2C e-government and then presents the GWTMS system architecture. Its core is a G2C text mining model based on web text mining, divided into five parts: the web text pre-processing module, the automatic text classification module, the text message level-processing module, the statistical analysis module, and the performance analysis module.

Second, building on the traditional TF×IDF feature weighting method, the thesis proposes a new text feature weighting method, TF×IDF×Ci. It improves on the original by adding a weight component that reflects the distinction between classes, which strengthens the ability to discriminate between classes. Experiments show that the improved TF×IDF×Ci weighting method not only significantly improves classification accuracy but also, to some extent, reduces sensitivity to the feature dimension, which is particularly useful for classifiers that are sensitive to feature dimensionality.

Finally, the thesis designs and implements a classification-level algorithm: through feature extraction and classification, each original message is assigned a level of importance, so that the system can automatically forward it to the appropriate department or leader for handling. This algorithm is the third component of G2C text mining post-processing. The first two components, sensitive thesaurus design and term weight assignment, are the basis for level classification: only after a message's terms have been extracted and matched against the weighted entries of the sensitive thesaurus can its total weight be computed, and this total weight is the foundation for the final level division. Experimental results show that the classification-level algorithm is correct and feasible.
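The abstract does not define the Ci factor, so the following Python sketch only illustrates one way a class-distinction factor can be multiplied onto a standard TF×IDF weight. It assumes pre-segmented messages (lists of tokens) with class labels, and uses a hypothetical Ci: the share of a term's class-normalized document frequency that falls in its most frequent class. That value is 1/K for a term spread evenly over K classes and 1.0 for a term confined to one class. The function name `tf_idf_ci` and this Ci formula are illustrative assumptions, not the thesis's definitions.

```python
import math
from collections import Counter


def tf_idf_ci(docs, labels):
    """Return one {term: weight} dict per document.

    docs: list of token lists; labels: one class label per document.
    Ci below is a HYPOTHETICAL class-concentration factor, not the thesis's own.
    """
    n_docs = len(docs)
    classes = sorted(set(labels))
    class_sizes = Counter(labels)
    df = Counter()                                   # documents containing each term
    df_by_class = {c: Counter() for c in classes}    # same, split by class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_class[label][term] += 1

    def ci(term):
        # Fraction of the term's class-normalized document frequency found in
        # its most frequent class: 1/K if spread evenly, 1.0 if in one class only.
        rates = [df_by_class[c][term] / class_sizes[c] for c in classes]
        return max(rates) / sum(rates)

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term]) * ci(term)
            for term, count in tf.items()
        })
    return weights


docs = [["road", "repair", "road"], ["road", "noise"], ["tax", "refund"], ["tax", "noise"]]
labels = ["infra", "infra", "finance", "finance"]
w = tf_idf_ci(docs, labels)
print(w[1])
```

With these toy labels, `road` (seen only in `infra` messages) gets Ci = 1.0 while `noise` (seen equally in both classes) gets Ci = 0.5, so in the second message `road` receives twice the weight of `noise` even though their TF and IDF are identical. This is the kind of inter-class emphasis the abstract attributes to TF×IDF×Ci.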
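The level-classification step (sum the weights of matched sensitive-thesaurus entries, then map the total to an importance level and a recipient) can be sketched in Python as follows. This assumes messages are already segmented into tokens, with English words standing in for segmented Chinese terms. The thesaurus entries, their weights, the level thresholds, and the routing table are all hypothetical placeholders, and summing matched weights is an assumed aggregation rule; the abstract gives none of these specifics.

```python
from bisect import bisect_right

# HYPOTHETICAL sensitive thesaurus: term -> weight (illustrative values only).
SENSITIVE_THESAURUS = {"protest": 5.0, "petition": 3.0, "pollution": 2.0, "delay": 1.0}

# HYPOTHETICAL level boundaries: total < 2.0 -> level 0, 2.0 to < 5.0 -> level 1, >= 5.0 -> level 2.
LEVEL_THRESHOLDS = [2.0, 5.0]

# HYPOTHETICAL routing table: importance level -> recipient.
ROUTES = {0: "responsible department", 1: "department head", 2: "senior leadership"}


def total_weight(tokens, thesaurus=SENSITIVE_THESAURUS):
    """Sum thesaurus weights over every token occurrence; unknown terms count as 0."""
    return sum(thesaurus.get(token, 0.0) for token in tokens)


def importance_level(tokens, thresholds=LEVEL_THRESHOLDS):
    """Map total weight to a level: the number of thresholds <= the total weight."""
    return bisect_right(thresholds, total_weight(tokens))


def route(tokens):
    """Return (level, recipient) for a tokenized message."""
    level = importance_level(tokens)
    return level, ROUTES[level]


for msg in (["road", "repair", "delay"], ["pollution", "delay"], ["pollution", "petition", "factory"]):
    print(msg, route(msg))
```

Keeping the boundaries in a sorted list makes level assignment a single `bisect_right` lookup, so adding a level only requires one more threshold and one more routing entry.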
Keywords/Search Tags: E-government, G2C platform, web text mining, mining model, feature weighting method, classification level