Research Of Sequential Patterns Mining Algorithm Based On Web Logs

Posted on: 2011-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: X X Wang
Full Text: PDF
GTID: 2178360305489528
Subject: Computer software and theory
Abstract/Summary:
With the advent of the information age, people increasingly rely on information found on the internet, and the demands on the accuracy of search technology keep rising. At the same time, the amount of information on the internet is growing explosively, and much of it is false or irrelevant and interferes with users' search results. How to find the information a user really needs within a reasonable time has therefore become a hot issue in web log mining research. Sequential pattern mining, one of the more important web log mining technologies, is attracting increasing attention from scholars.

In recent years, many scholars have designed more efficient sequential pattern mining algorithms that better meet user needs. Sequential pattern mining has broad practical applications: it can be applied to any data with sequential characteristics to find potential patterns that meet user needs. By analysing these patterns, business users can adjust their strategies or site structure to suit their purposes, for example to improve the service quality of a web site or to provide personalized service. In the commercial field, sequential pattern mining is used to mine user access patterns; supermarket managers use it to predict customer purchase behavior; biologists use it to find patterns in DNA sequences. Research on sequential pattern mining techniques is therefore of considerable significance.

The main work of this paper is to apply the storage strategy of the SPADE algorithm to the Apriori algorithm in order to simplify the join and testing processes. In addition, to increase the efficiency of the algorithm and make its results better match user needs, time constraints that reflect those needs are added to Apriori.
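The SPADE-style storage mentioned above can be illustrated with a minimal sketch. It assumes a vertical layout in which each page maps to a list of (session id, timestamp) pairs, so that the support of "A then B" is obtained by joining two lists instead of rescanning the log. The session data and the function names `build_idlists`, `temporal_join`, and `support` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def build_idlists(sessions):
    """Vertical layout: map each page to its (session_id, timestamp) pairs."""
    idlists = defaultdict(list)
    for sid, visits in sessions.items():
        for ts, page in visits:
            idlists[page].append((sid, ts))
    return idlists


def temporal_join(first, second):
    """Id-list for 'first then second'.

    Keeps every occurrence of the second page that comes after at least
    one occurrence of the first page in the same session.
    """
    earliest = {}
    for sid, ts in first:
        earliest[sid] = min(ts, earliest.get(sid, ts))
    return [(sid, ts) for sid, ts in second
            if sid in earliest and earliest[sid] < ts]


def support(idlist):
    """Number of distinct sessions that contain the pattern."""
    return len({sid for sid, _ in idlist})


# Hypothetical web log: session id -> list of (timestamp, page) visits.
sessions = {
    "u1": [(1, "home"), (3, "search"), (4, "cart")],
    "u2": [(2, "home"), (5, "cart")],
    "u3": [(1, "search"), (2, "home")],
}
idlists = build_idlists(sessions)
home_then_cart = temporal_join(idlists["home"], idlists["cart"])
print(support(home_then_cart))
```

Because the joined list keeps the timestamps of the pattern's last page, it can be joined again to grow longer patterns without returning to the original log.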
A well-known problem of the Apriori algorithm is that it scans the database many times, has a large search space, and produces a large number of candidate sets. This paper therefore improves the Apriori algorithm by adding time constraints, which reduce the search space and the number of candidate sets while matching user demand. To reduce the memory occupied during execution, the paper also presents a five-tuple storage strategy, which simplifies the search process as well. Although the added time constraints increase the complexity of the algorithm, the overall efficiency of the improved algorithm still rises. The resulting algorithm, MSPVF, draws on other algorithms that were improved with time constraints; the paper takes its efficiency fully into account, and the algorithm also achieves better accuracy and recall, with reasonable mining results.
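To make the effect of a time constraint concrete, here is a small sketch of level-wise mining with a maximum gap between consecutive page visits. It is not MSPVF: the abstract does not define the five-tuple or the exact constraints, so this sketch assumes plain (session id, timestamp) pairs, a single hypothetical `max_gap` parameter, and simple prefix extension in place of the Apriori join. The function name `mine` and the sample sessions are also illustrative.

```python
from collections import defaultdict


def mine(sessions, min_support, max_gap):
    """Return {pattern: support} for page sequences in which each visit
    follows the previous one by more than 0 and at most max_gap time units."""

    def sup(ends):
        # Support = number of distinct sessions with at least one occurrence.
        return len({sid for sid, _ in ends})

    # Occurrences of each single page as a set of (session_id, timestamp).
    items = defaultdict(set)
    for sid, visits in sessions.items():
        for ts, page in visits:
            items[page].add((sid, ts))
    frequent = {p: e for p, e in items.items() if sup(e) >= min_support}

    # Each level maps a pattern to the (sid, ts) pairs where it can end.
    level = {(p,): e for p, e in frequent.items()}
    result = {}
    while level:
        result.update({pat: sup(ends) for pat, ends in level.items()})
        next_level = {}
        for pat, ends in level.items():
            end_times = defaultdict(list)
            for sid, ts in ends:
                end_times[sid].append(ts)
            for page, occurrences in frequent.items():
                # Keep visits to `page` that follow an end of `pat` in time.
                new_ends = {
                    (sid, ts) for sid, ts in occurrences
                    if any(0 < ts - t <= max_gap
                           for t in end_times.get(sid, ()))
                }
                if sup(new_ends) >= min_support:
                    next_level[pat + (page,)] = new_ends
        level = next_level
    return result


# Hypothetical sessions: session id -> list of (timestamp, page) visits.
sessions = {
    "u1": [(0, "home"), (10, "search"), (20, "cart")],
    "u2": [(0, "home"), (15, "search"), (200, "cart")],
    "u3": [(5, "home"), (12, "search"), (30, "cart")],
}
constrained = mine(sessions, min_support=3, max_gap=30)
unconstrained = mine(sessions, min_support=3, max_gap=float("inf"))
print(len(constrained), len(unconstrained))
```

On this toy data, the gap limit discards patterns that depend on the late "cart" visit in session u2, so fewer patterns survive each level and fewer extensions are attempted at the next one. The price is an extra gap check per occurrence, which matches the trade-off the abstract describes.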
Keywords/Search Tags: Sequential pattern mining, Apriori-like algorithm, time constraints, user needs