Research And Practice Of Chinese Text Filtering System Based On Internet

Posted on: 2005-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Sun
GTID: 2168360125951083
Subject: Communication and Information System
Abstract/Summary:
This thesis briefly describes the background of text filtering and systematically discusses how text filtering relates to text retrieval, machine learning, and other fields. Using a typical logical model of Chinese text filtering as an example, it studies in depth the theory and techniques needed to build a Chinese text filtering system, including concept expansion, Chinese text structure analysis and feature extraction, latent semantic indexing, and self-adaptive learning. Then, taking into account system recall, precision, operational efficiency, and feasibility, it proposes an improved Chinese text filtering architecture that adds a cluster-matching module and a user-interest feedback module. The hybrid approach to Chinese text filtering is explained in detail, and the main mathematical models and relevant algorithms of the system are presented. Some functions of the system have been implemented in a preliminary form using Java. In this implementation, the inverse term frequency database is constructed automatically, the keyword weighting technique is improved, a method for calculating topic-sentence weights is added, and the coefficients of the mathematical models are tuned. Functions such as synonym expansion and revision have also been added, with some positive results. Finally, because filtering precision is not yet ideal, the thesis systematically summarizes directions for further work on this subject and presents some of the author's own views.
Keywords/Search Tags: Text representation, Concept expansion, Weight, Vector Space Model, Latent semantic indexing, Machine learning, Text filtering, Chinese text
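
The abstract names keyword weighting and the Vector Space Model but does not give the formulas used in the thesis. As a rough illustration only, the following Java sketch shows how a vector-space filter is commonly built: documents and a user-interest profile are turned into weighted term vectors and compared by cosine similarity against a threshold. Everything here is an assumption rather than the author's method: the class and method names are hypothetical, the weights are standard smoothed TF-IDF (not the improved weighting or topic-sentence weighting the thesis describes), and the input is assumed to be already segmented into words, with English tokens standing in for Chinese terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of vector-space text filtering; not the thesis implementation.
class VsmFilterSketch {

    // Count, for each term, how many documents in the corpus contain it.
    static Map<String, Integer> documentFrequency(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Build a term-weight vector: raw term count times a smoothed IDF
    // (assumed formula: ln((N + 1) / (df + 1)) + 1, which is always positive).
    static Map<String, Double> tfidf(List<String> doc, Map<String, Integer> df, int n) {
        Map<String, Double> weights = new HashMap<>();
        for (String term : doc) {
            weights.merge(term, 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            int d = df.getOrDefault(e.getKey(), 0);
            double idf = Math.log((n + 1.0) / (d + 1.0)) + 1.0;
            e.setValue(e.getValue() * idf);
        }
        return weights;
    }

    // Cosine similarity between two sparse vectors; 0.0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // A document passes the filter when it is similar enough to the profile.
    static boolean accept(Map<String, Double> profile, Map<String, Double> doc, double threshold) {
        return cosine(profile, doc) >= threshold;
    }

    public static void main(String[] args) {
        // Pre-segmented placeholder documents (assumed input format).
        List<List<String>> corpus = List.of(
                List.of("text", "filtering", "system", "design"),
                List.of("football", "match", "result"),
                List.of("semantic", "indexing", "text", "retrieval"));
        Map<String, Integer> df = documentFrequency(corpus);
        Map<String, Double> profile = tfidf(List.of("text", "filtering", "retrieval"), df, corpus.size());

        double threshold = 0.3; // arbitrary illustrative cut-off
        for (List<String> doc : corpus) {
            Map<String, Double> vec = tfidf(doc, df, corpus.size());
            System.out.println(doc + " similarity=" + cosine(profile, vec)
                    + " accepted=" + accept(profile, vec, threshold));
        }
    }
}
```

In a sketch like this, the threshold is the coefficient that trades recall against precision: lowering it accepts more documents, raising it accepts fewer. Concept or synonym expansion would, under the same assumptions, amount to adding related terms to the profile vector before the comparison.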