Research On Scheme Of Topic-Specific Web Pages Filtering

Posted on: 2008-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: H B Zhang
GTID: 2178360215457162
Subject: Computer software and theory
Abstract/Summary:
With the spread and rapid development of the Internet, the way people look for information has changed greatly, and the Internet plays an increasingly important part in daily life. However, features of the Internet such as openness, equality, and unboundedness have also led to unrestricted abuse of the network: large amounts of noise and sensitive information flood it and reduce the density of useful information. How to filter out this unwanted content and eliminate its negative influence has therefore become one of the key questions in the field of Internet information services. Information filtering, the most effective method available, can solve this problem in practice. To facilitate filtering, machine learning techniques have recently been applied in many studies to classify documents automatically.

Based on a study of the general theory of information filtering and of common web page filtering technologies, this thesis proposes and constructs a topic-specific web page filtering architecture from a functional point of view, and studies its details in depth. The main work and original results are as follows:

Firstly, the thesis analyzes the different information streams currently transmitted over the Internet and classifies them according to filtering requirements; it then clearly defines the topic-specific domains of concern. It also designs a topic-specific information filtering system (TSIFS), which adopts a layered filtering strategy and introduces neural network categorization into the classification scheme of information filtering. The learning capacity and adaptability of neural network categorization compensate for the shortcomings of filtering alone, which improves filtering accuracy.

Secondly, the multiple types of data contained in web pages are converted into text format to simplify processing. In this process, the filtering features of topic information are taken into account, text is vectorized with a focused vocabulary, the problem of degraded classification efficiency is raised, and a new keyword weighting function is designed.

Thirdly, an information classification model based on a back-propagation (BP) network is constructed, and the design of the filtering engine, including normalization of the input vector and selection of network parameters, is discussed.

Finally, a simulation experiment on the proposed topic-specific filtering architecture is presented and analyzed to demonstrate its feasibility, efficiency, and accuracy.
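The pipeline summarized above (vectorizing text against a focused vocabulary, normalizing the input vector, and classifying with a back-propagation network) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the thesis: the vocabulary `VOCAB`, the toy training texts, raw term counts standing in for the thesis's keyword weighting function (which the abstract does not specify), unit-length scaling as the normalization, and the network size, learning rate, and epoch count.

```python
import math
import random

# Hypothetical focused vocabulary for one topic; the thesis does not list its terms.
VOCAB = ["casino", "bet", "poker", "jackpot"]

def vectorize(text, vocab=VOCAB):
    """Count focused-vocabulary terms, then scale the vector to unit length.

    Raw counts are a stand-in for the thesis's keyword weighting function.
    Tokenization is a plain whitespace split, so punctuation is not handled.
    """
    tokens = text.lower().split()
    counts = [float(tokens.count(term)) for term in vocab]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNetwork:
    """One-hidden-layer network trained by plain back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row acts as the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train(self, samples, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, target in samples:
                hidden, out = self.forward(x)
                xb, hb = x + [1.0], hidden + [1.0]
                # Error terms for squared error with sigmoid units.
                delta_out = (target - out) * out * (1.0 - out)
                delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                                for j, h in enumerate(hidden)]
                # Update output weights, then hidden weights.
                for j in range(len(hb)):
                    self.w_out[j] += lr * delta_out * hb[j]
                for j, row in enumerate(self.w_hidden):
                    for i in range(len(xb)):
                        row[i] += lr * delta_hidden[j] * xb[i]

    def predict(self, x):
        """Return a score in (0, 1); higher means more likely on-topic."""
        return self.forward(x)[1]

# Toy training set (hypothetical): 1.0 = on-topic page to filter, 0.0 = other.
TRAIN = [
    ("poker bet jackpot casino tonight", 1.0),
    ("casino jackpot bet now", 1.0),
    ("weather report for the weekend", 0.0),
    ("lecture notes on compiler design", 0.0),
]

net = BPNetwork(n_in=len(VOCAB), n_hidden=3)
net.train([(vectorize(text), label) for text, label in TRAIN])
```

A page would then be scored with `net.predict(vectorize(page_text))` and filtered when the score exceeds a chosen threshold such as 0.5. In a layered strategy like the one the abstract describes, a classifier of this kind would presumably sit behind cheaper checks, though the abstract does not say how the layers are arranged.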
Keywords/Search Tags: Web Pages, Topic Information, Filtering, Neural Network