The Web Application Of Text Mining Technology In The Web Page Recommendation

Posted on: 2014-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: H B Zhang
Full Text: PDF
GTID: 2248330395483105
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of science and technology, the Web has become the world's largest public data source. Information on Web pages is scattered and has no fixed structure, so it is increasingly difficult for people to extract useful knowledge from such a large and complex body of information, and equally difficult to satisfy each user's particular interests. Recommendation systems arose to address this problem: they recommend items to users according to the users' interests or browsing information.

This thesis studies text clustering, a text mining technique, as applied to the Web. It first proposes a recommendation model based on earlier research. Following the stages of that model, it then describes the extraction of main content from Web pages, the clustering of Web text, and an improved recommendation algorithm. The recommendation algorithm is the core of the model; content extraction supplies the input to both the text clustering algorithm and the recommendation algorithm; and text clustering serves as preprocessing for the recommendation algorithm. The contents are as follows:

1. A model based on classic recommendation algorithms. A model for recommending Web text is designed around the characteristics of Web text.

2. Research on extracting Web page structure and on locating the main content of a page in the resulting tree. The full learning corpus of recommendable content is collected by a Web spider, and a depth-first algorithm is used to build a DOM tree for each page. Pruning techniques remove unwanted nodes, after which the main content of the page is extracted.

3. Research on Web text clustering. To address the shortcomings of common distance measures such as Euclidean distance, an approximate EMD (Earth Mover's Distance) is proposed. The approximate EMD is used in place of the common distance so that the measured distance between entities is more accurate. The experimental data come from the Institute of Computing Technology, Chinese Academy of Sciences. The average accuracy of the DBSCAN algorithm based on Euclidean distance is 78.9%, while the average accuracy of the improved algorithm based on EMD is 84.4%. In terms of accuracy, this shows that clustering with EMD instead of the common distance is feasible.

4. Research on the recommendation algorithm. A purely content-based algorithm considers only the content of the pages themselves, so the recommended items tend to remain unchanged. The thesis therefore uses a collaborative filtering algorithm, combined with users' comments, to predict the preferences of a given user. Weighted items are also used to derive stepwise weighted association rules for the target. Combining these two methods makes it easier to discover users' interests and to build a recommendation list of items that users find more interesting.
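
The clustering step described in the abstract (DBSCAN with an EMD-based distance in place of Euclidean distance) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify how the approximate EMD is computed, so the sketch assumes the simplest case, an exact one-dimensional EMD between histograms over the same ordered bins with unit distance between adjacent bins. The function names (`emd_1d`, `euclidean`, `dbscan`) and the toy data are hypothetical and chosen only to show how the distance function can be swapped.

```python
from itertools import accumulate
from math import sqrt


def euclidean(p, q):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def emd_1d(p, q):
    """Earth mover's distance between two histograms over the same ordered bins.

    Assumption (not from the thesis): adjacent bins are one unit apart.
    Both histograms are normalised to total mass 1 before comparison.
    """
    sp, sq = sum(p), sum(q)
    diffs = (a / sp - b / sq for a, b in zip(p, q))
    # In 1D the EMD equals the summed absolute difference of the running totals.
    return sum(abs(c) for c in accumulate(diffs))


def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN with a pluggable distance; returns one label per point (-1 = noise)."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Brute-force region query; includes the point itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Toy histograms (hypothetical data): three concentrated on the left bins, three on the right.
docs = [
    [4, 1, 0, 0, 0],
    [3, 2, 0, 0, 0],
    [4, 0, 1, 0, 0],
    [0, 0, 0, 1, 4],
    [0, 0, 0, 2, 3],
    [0, 0, 1, 0, 4],
]

labels_emd = dbscan(docs, eps=0.5, min_pts=2, dist=emd_1d)
labels_euc = dbscan(docs, eps=1.5, min_pts=2, dist=euclidean)
print(labels_emd)  # [0, 0, 0, 1, 1, 1]
print(labels_euc)  # [0, 0, 0, 1, 1, 1]
```

Only the `dist` argument and the `eps` threshold change between the two runs, so the same clustering loop serves both measures. The toy data are too small to reproduce the accuracy difference reported in the abstract; the sketch shows the mechanism only.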
Keywords/Search Tags: Web text mining, Website recommendation, DOM tree, text clustering, approximate EMD, collaborative filtering, weighted association rules