
The Research And Implementation Of The Chinese Text Information Filtering Based On Web Content

Posted on: 2017-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: L T Yang
Full Text: PDF
GTID: 2308330485960459
Subject: Computer technology
Abstract/Summary:
With the rapid development of network and information technology, information resources on the Internet are growing exponentially. This wealth of resources makes it convenient for users to query and use information, but it also raises problems such as information loss, a low rate of acquiring correct information, and inundation by junk information. Obtaining the desired information from massive information resources in a timely and accurate manner has become an urgent problem for users.

Web text information filtering is the process of selecting, according to the user's information needs, the text information the user is interested in from a large-scale dynamic information flow, and masking out useless information. Web Chinese text information filtering involves several main techniques: extraction of text content from HTML pages, Chinese word segmentation, feature extraction and weight calculation, text representation models, construction of user profiles, and text filtering algorithms. Building on research into Web text information filtering, this thesis implements a Web text information filtering system model. Experiments show that the system model improves filtering performance.

The main work of this thesis is as follows:

(1) Design and implement a multi-level Web text information filtering system model. The filtering method of the system model imitates the way people filter information when reading a newspaper. In multi-level filtering, a Web text is first filtered by its heading using keyword matching. If the heading is not filtered out, the text is submitted to the user directly; otherwise, the traditional VSM (Vector Space Model) filtering method is applied to the body content of the text.

(2) Extract 700 economic texts and 700 non-economic texts from the special news of Sohu, Sina, Netease and other portals to form the test texts of this thesis. An experiment with 100 economic texts and 100 non-economic texts is used to determine the threshold for filtering the text body.

(3) Design and conduct grouping experiments that compare the performance of the multi-level filtering method adopted in this thesis with the traditional VSM filtering method for the same number of filtered texts. The experimental results show that, compared with the traditional VSM filtering method, the multi-level filtering method used in the Web text information filtering system model does not change significantly in precision, recall and F-measure. However, as the number of filtered texts increases, the multi-level filtering method completes in less time than the traditional VSM filtering method.
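
The two-level decision described in point (1) can be illustrated with a minimal Python sketch. This is not the thesis's implementation; it rests on several assumptions: the text is already segmented into tokens, the body is represented by raw term frequencies rather than whatever weighting the thesis uses, the user profile is a plain term-to-weight dictionary, and a heading "passes" when it contains at least one profile keyword. The names `cosine_similarity`, `multi_level_filter`, `keywords`, `profile` and `threshold` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def multi_level_filter(heading_tokens, body_tokens, keywords, profile, threshold):
    """Return True if the text should be delivered to the user.

    Level 1 (assumed reading of the abstract): a keyword match on the
    heading accepts the text without looking at the body.
    Level 2: otherwise the body is compared with the user profile in a
    vector space model, using raw term frequencies as weights (an
    assumption; the thesis does not state its weighting here).
    """
    if any(token in keywords for token in heading_tokens):
        return True
    body_vector = Counter(body_tokens)
    return cosine_similarity(body_vector, profile) >= threshold


# Hypothetical profile for economic news; tokens are assumed pre-segmented.
keywords = {"经济", "股市"}
profile = {"经济": 1.0, "银行": 1.0}
threshold = 0.5  # illustrative value, not the threshold found in the thesis

accepted_by_heading = multi_level_filter(["股市", "上涨"], [], keywords, profile, threshold)
accepted_by_body = multi_level_filter(["体育"], ["银行", "经济", "银行"], keywords, profile, threshold)
rejected = multi_level_filter(["体育"], ["足球", "比赛"], keywords, profile, threshold)
```

Because the heading check is a cheap set lookup and skips vector construction whenever it succeeds, a design like this would be expected to save time as the number of texts grows, which is consistent with the completion-time result reported in point (3).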
Keywords/Search Tags: Information filtering, User profiles, Vector Space Model, Multi-level filtering