
Research On Mining Algorithms Of Web Data Stream For User Interest Drift

Posted on: 2012-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: C H Xu
Full Text: PDF
GTID: 2178330332983125
Subject: Management Science and Engineering
Abstract/Summary:
With the significant increase in the amount of data on the Internet, people's requirements for real-time and accurate information discovery keep rising. This has driven the development of web data stream mining and made it a hot topic in the field of artificial intelligence. At the same time, users' data needs are becoming more and more specialized, and personalization technologies have, to some extent, eased the conflict between the diversity of information on the Internet and the specialization of users' requirements. As the core issue of personalized service, user modeling has received wide attention. The web data stream mining algorithms oriented to user interest drift that are studied in this thesis therefore have high theoretical and practical significance.

Building on research at home and abroad, this thesis studies web data stream mining algorithms oriented to user interest drift and applies them in a personalized recommendation system. The main contents are as follows:

First, to address the shortcomings of current association rule mining methods for data streams, and drawing on research into mining maximal frequent itemsets for association rules, this thesis proposes A-MFI, a web data stream algorithm for mining maximal frequent itemsets based on a self-adjusting orderly-compound strategy. The algorithm uses the sliding window technique and introduces a self-adjusting orderly-compound FP-tree strategy.

Second, existing research on clustering data streams at home and abroad falls into two classes: taking the data streams themselves as the clustering objects, and taking the data within a data stream as the clustering objects. To address the shortcomings of this work, this thesis proposes a new parallel web data stream clustering algorithm, JPStream, which takes the data stream itself as the object of study. The algorithm uses the damped window model, applies principal component analysis to reduce the dimensionality of the data stream, clusters by studying the coincidence between the characteristics of data streams, and designs a dynamic adaptive DK-means clustering algorithm.

Third, based on an in-depth study of the characteristics of user interest and of the treatment of user interest drift, this thesis proposes a comprehensive algorithm that handles both short-term and long-term drift in user interest. For short-term drift, it uses the time window method and proposes a double-window adaptive model that handles changes in the user's short-term interest effectively by dynamically adjusting the window size. For long-term drift, it proposes a normal forgetting function that better matches the actual situation and handles changes in long-term interest.

Fourth, several data sets are used to test the algorithms proposed in this thesis, and the MovieLens movie-rating data stream is used to comprehensively test and analyze the algorithm model. The results show that the web data stream algorithm for mining maximal frequent itemsets, the parallel web data stream clustering algorithm, and the user interest drift handling algorithm perform better, are effective when deployed and applied in a personalized recommendation system, and achieve higher accuracy. They provide an important reference for the design of personalized recommendation systems at Internet companies and strong support for their further business analysis, CRM, and so on.
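The abstract does not give the form of the normal forgetting function used for long-term interest drift. As a rough illustration only, the sketch below assumes a Gaussian-shaped weight, exp(-age^2 / (2 * sigma^2)), applied to each past rating. The function names, the `sigma` parameter, and the `(topic, timestamp, rating)` event layout are hypothetical and are not taken from the thesis.

```python
import math


def normal_forgetting_weight(age, sigma):
    """Gaussian-shaped forgetting weight (assumed form, not the thesis's).

    Returns 1.0 for age 0 and decays smoothly toward 0 as age grows.
    `sigma` controls how quickly old evidence is forgotten.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-(age ** 2) / (2 * sigma ** 2))


def decayed_interest(events, now, sigma):
    """Sum ratings per topic, weighting each one by how old it is.

    `events` is an iterable of (topic, timestamp, rating) tuples,
    a hypothetical layout chosen for this example.
    """
    scores = {}
    for topic, timestamp, rating in events:
        age = max(0.0, now - timestamp)  # treat future timestamps as age 0
        weight = normal_forgetting_weight(age, sigma)
        scores[topic] = scores.get(topic, 0.0) + rating * weight
    return scores


if __name__ == "__main__":
    history = [
        ("comedy", 0, 5.0),     # old rating
        ("thriller", 90, 4.0),  # recent rating
        ("comedy", 95, 2.0),    # recent rating
    ]
    for topic, score in decayed_interest(history, now=100, sigma=30).items():
        print(topic, round(score, 3))
```

Compared with an exponential decay, a Gaussian-shaped weight stays close to 1 for recent events and then falls off sharply, which is one plausible reading of a forgetting curve that "better matches the actual situation".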
Keywords/Search Tags:web data stream mining, orderly-compound, self-adjusting, coincidence, user interest drift