Research And Application Of Key Technology On Web Text Classification

Posted on: 2016-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: L P Chen
Full Text: PDF
GTID: 2348330488474567
Subject: Engineering
Abstract/Summary:
With the development of network technology and the spread of the Internet, our life and work are filled with information, and the volume of information being produced has reached an unprecedented level. However, most Web data is unstructured or semi-structured, so organizing and processing it is a major challenge. Web text classification offers a more effective way to organize and classify Web texts, which improves the speed and accuracy of information retrieval. Web text classification is also a part of Web data mining, so improvements to it make it possible to discover more previously unknown and valuable information on the Web.

This thesis studies Web text data and examines in depth the problems and techniques related to Web text classification. It describes the background of Web text classification and reviews the current state of research in the area. On that basis, it also describes the evaluation functions used to measure the effect of these methods, providing theoretical support for the subsequent work. The work of this thesis can be summarized as follows:

First, we analyze the measures used in the term extraction (feature extraction) step and study the relations between different text terms. We then propose a new method for extracting Web text terms that is based on the relations between terms, and carry out an experiment to verify its usefulness. Motivated by the advantages of term extraction based on information gain (IG) and on the CHI (chi-square) measure, we investigate further and propose a second term extraction method, called the IG-CHI method.
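As a rough illustration of the two measures named above, the sketch below computes the standard information gain and chi-square scores of a single term for a single class from a 2x2 table of document counts. The abstract does not say how the IG-CHI method combines the two scores, so no combination is shown; the function names and the toy counts are assumptions made for this example only.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(a, b, c, d):
    """Information gain of a term with respect to one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    h_class = entropy([a + c, b + d])      # H(C)
    h_with_term = entropy([a, b])          # H(C | term present)
    h_without_term = entropy([c, d])       # H(C | term absent)
    return (h_class
            - (a + b) / n * h_with_term
            - (c + d) / n * h_without_term)


def chi_square(a, b, c, d):
    """Chi-square statistic of the same 2x2 term/class table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


if __name__ == "__main__":
    # Hypothetical counts: the term appears in 40 of 50 in-class documents
    # and in 5 of 50 out-of-class documents.
    print("IG :", information_gain(40, 5, 10, 45))
    print("CHI:", chi_square(40, 5, 10, 45))
```

Both scores are zero when the term is distributed identically inside and outside the class, and both grow as the term becomes more discriminative, which is why either can be used to rank candidate terms.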
In addition, the traditional feature extraction methods and the new methods are compared in terms of efficiency and accuracy.

Second, by comparing current text classification models, we find that Web text classification models based on a single classifier cannot meet the needs of practical work, whereas models based on multiple classifiers show clear improvements in efficiency and accuracy. We therefore propose a Web text classification model based on Multi-layer Ensemble Learning.

Finally, building on the research above, we design a Web text classification system. The system can fetch Web text data and classify it. The evidence indicates that the system achieves higher efficiency and accuracy.
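The idea of combining several classifiers instead of relying on one can be sketched with a plain majority vote. This is a generic single-layer ensemble for illustration only, not the Multi-layer Ensemble Learning model of the thesis, whose layers the abstract does not describe. The keyword-based base classifiers, the labels, and all function names are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the earliest such label."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def make_keyword_classifier(keywords, label, default):
    """Build a toy base classifier (hypothetical, for illustration).

    It predicts `label` if the text shares a word with `keywords`,
    and `default` otherwise.
    """
    keywords = set(keywords)

    def classify(text):
        words = set(text.lower().split())
        return label if words & keywords else default

    return classify


def ensemble_classify(classifiers, text):
    """Classify `text` by majority vote over all base classifiers."""
    return majority_vote([clf(text) for clf in classifiers])


# Three deliberately weak base classifiers with different vocabularies.
CLASSIFIERS = [
    make_keyword_classifier({"match", "goal", "team"}, "sports", "tech"),
    make_keyword_classifier({"software", "network", "data"}, "tech", "sports"),
    make_keyword_classifier({"league", "player"}, "sports", "tech"),
]

if __name__ == "__main__":
    for text in ("The team scored a late goal",
                 "New network software released"):
        print(text, "->", ensemble_classify(CLASSIFIERS, text))
```

In the first example the third classifier votes "tech" because none of its keywords appear, but the other two outvote it; tolerating individual mistakes in this way is the usual motivation for moving from a single classifier to an ensemble.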
Keywords/Search Tags: Web, Text Classification, Term Extraction, Text Mining, Term Relation