
Design And Implementation Of Web Classification And Stored Query System

Posted on: 2011-10-10 | Degree: Master | Type: Thesis
Country: China | Candidate: P P Han
GTID: 2178360302994938 | Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of network and information technology, the number of Web pages on the Internet has grown exponentially. Organizing and processing this vast amount of information effectively, and searching, filtering and managing these resources better, have become urgent problems. Web page classification makes it possible to build a categorized web page database that organizes and manages network resources effectively and improves the efficiency of information retrieval. Web page classification can also be applied to information filtering: the stored URL classification results can serve a URL filtering system, and the web page classification model can be used for content filtering. Studying how to classify web pages efficiently and accurately, and how to preserve the classification results permanently, is therefore of great significance.

First, based on an analysis of the system requirements, we designed the overall structure of the system. We then discussed in detail the techniques and methods used at each step of the workflow, mainly the text representation model, Chinese word segmentation algorithms and feature extraction algorithms, and compared several commonly used feature extraction algorithms. To meet the requirement of preserving web page classification results permanently, we proposed an incremental storage and feedback query strategy, which effectively saves storage space; the feedback query strategy also compensates for the limitations of web page collection. To standardize URLs during storage and querying, we applied a new URL parsing method based on a nested finite state machine (FSM), which improves parsing efficiency and fault tolerance. Finally, building on this study of web page classification and storage techniques, we presented the design and implementation of a web page classification and storage management system.
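The abstract does not describe the nested FSM itself, so the following is only a minimal sketch, assuming a two-level design: an outer state machine splits a URL into scheme, authority, path, query and fragment, and an inner state machine parses the authority into host and port. The normalization rules (lower-casing scheme and host, dropping default ports and fragments, using "/" for an empty path, assuming "http" when no scheme is given) and all names such as `parse_url` and `normalize_url` are hypothetical choices for illustration, not the thesis's method. IPv6 literals and percent-encoding are not handled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical normalization rule: ports equal to these defaults are dropped.
DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass
class ParsedURL:
    scheme: str
    host: str
    port: Optional[int]
    path: str
    query: str


def parse_authority(authority: str) -> tuple[str, Optional[int]]:
    """Inner FSM over the authority with states HOST and PORT.

    An '@' in either state means everything read so far was userinfo,
    so it is discarded and the machine restarts in HOST.
    """
    state, host, port = "HOST", [], []
    for ch in authority:
        if ch == "@":
            state, host, port = "HOST", [], []
        elif state == "HOST" and ch == ":":
            state = "PORT"
        elif state == "HOST":
            host.append(ch)
        else:
            port.append(ch)
    port_text = "".join(port)
    if port_text and not port_text.isdigit():
        raise ValueError(f"invalid port: {port_text!r}")
    return "".join(host).lower().rstrip("."), int(port_text) if port_text else None


def parse_url(url: str) -> ParsedURL:
    """Outer FSM: SCHEME -> SLASHES -> AUTHORITY -> PATH -> QUERY -> FRAGMENT."""
    url = url.strip()
    head, sep, _ = url.partition("://")
    if not sep or not head.isalpha():  # fault tolerance: assume http if no scheme
        url = "http://" + url
    state = "SCHEME"
    parts = {"scheme": [], "authority": [], "path": [], "query": []}
    for ch in url:
        if state == "SCHEME":
            if ch == ":":
                state = "SLASHES"
            else:
                parts["scheme"].append(ch)
            continue
        if state == "SLASHES":
            if ch == "/":
                continue
            state = "AUTHORITY"  # fall through: ch is the first authority char
        if state == "FRAGMENT":
            break  # the fragment is dropped during normalization
        if ch == "#":
            state = "FRAGMENT"
        elif ch == "?" and state in ("AUTHORITY", "PATH"):
            state = "QUERY"
        elif ch == "/" and state == "AUTHORITY":
            state = "PATH"
            parts["path"].append(ch)
        else:
            parts[state.lower()].append(ch)
    # Nested step: the collected authority is handed to the inner FSM.
    host, port = parse_authority("".join(parts["authority"]))
    return ParsedURL(
        scheme="".join(parts["scheme"]).lower(),
        host=host,
        port=port,
        path="".join(parts["path"]) or "/",
        query="".join(parts["query"]),
    )


def normalize_url(url: str) -> str:
    """Rebuild a canonical URL string from the parsed components."""
    p = parse_url(url)
    netloc = p.host
    if p.port is not None and DEFAULT_PORTS.get(p.scheme) != p.port:
        netloc += f":{p.port}"
    query = f"?{p.query}" if p.query else ""
    return f"{p.scheme}://{netloc}{p.path}{query}"


print(normalize_url("HTTP://User:pw@WWW.Example.com:80/index.html?a=1#top"))
# → http://www.example.com/index.html?a=1
```

Because each state consumes one character and never backtracks, the parse is a single linear pass, and unusual input (a missing scheme, an empty port, stray userinfo) falls into a defined state instead of breaking the parser.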
We then tested the key performance aspects of the system, including information extraction, the feature extraction algorithms, the weight calculation algorithm and the storage and query functions. The test results show that the system meets its design goals.
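The abstract names feature extraction and weight calculation as tested components but does not say which algorithms were used. As a sketch under that caveat, the code below uses two common choices for Chinese text classification: the chi-square (CHI) statistic for selecting features per category, and TF-IDF for weighting terms in a vector space model. Both are assumptions rather than the thesis's confirmed algorithms, the function names are hypothetical, and the token lists are assumed to come from an earlier word segmentation step.

```python
import math
from collections import Counter


def chi_square(docs: list[list[str]], labels: list[str], term: str, category: str) -> float:
    """CHI statistic of `term` for `category` from a 2x2 contingency table.

    a: docs in category containing term   b: docs outside category containing term
    c: docs in category without term      d: docs outside category without term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cat = term in tokens, label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return len(docs) * (a * d - b * c) ** 2 / denom if denom else 0.0


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Per-document TF-IDF weights, L2-normalized (vector space model)."""
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))  # document frequency
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        w = {t: (cnt / len(tokens)) * math.log(n / df[t]) for t, cnt in tf.items()}
        norm = math.sqrt(sum(v * v for v in w.values()))
        vectors.append({t: v / norm for t, v in w.items()} if norm else w)
    return vectors


# Toy, already-segmented corpus (illustrative data only).
docs = [["football", "match", "match"], ["stock", "market"], ["football", "market"]]
labels = ["sports", "finance", "sports"]
print(chi_square(docs, labels, "football", "sports"))  # 3.0
print(chi_square(docs, labels, "market", "sports"))    # 0.75
```

A term that occurs only in one category ("football") scores high and would be kept as a feature, while a term spread across categories ("market") scores low. The selected terms can then be weighted with `tf_idf` before training a classifier.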
Keywords/Search Tags: Web Page Classification, Information Extraction, Word Segmentation, Feature Extraction, Storage and Query