Font Size: a A A

Research And Implementation Of Cycle Matching Algorithms Based On Machine Learning

Posted on:2014-08-28Degree:MasterType:Thesis
Country:ChinaCandidate:P P LiuFull Text:PDF
GTID:2348330473453895Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the development of Internet and delivery service, online exchange service has become more and more convenient and popular. Publish/subscribe system, as an important application of exchange service, has changed people's way to process information. In homogeneous symmetric Pub/Sub applications, both subscribers and publishers could give restriction about the subscription conditions and the publish conditions to decide whether they should agree with the matching results. The more exchanges traded successfully, the more profit users and the system could get. Besides one-to-one swap, an exchange can occur among more than two subscriptions, and such exchange is a cycle matching. Cycle matching not only improves the matching probability of a subscription, but also improves the profit of the system for more exchanges could be traded successfully. However, as the number of subscriptions increases, or as the number of the maximum length of the cycle matching increases, the number of cycle matchings and intermediate results increases exponentially, which brings a big problem of storage space. Approximate cycle matching strategy solves this problem by reserving intermediate links selectively to search more cycle matchings with limited space. Besides, how to recommend better cycle matchings with higher match degree quickly is becoming a key procedure in Pub/Sub system because the candidate cycle matching set becomes larger and larger as the number of subscriptions increases.Approximate cycle matching algorithm based on threshold prunes the intermediate link results with lower matching probability to save space. However, online exchange service is a kind of real-time system, and the data distribution of subscriptions will change as time goes on. The threshold strategy should set the threshold frequently to accommodate the changing data distribution, and it doesn't take the discretization about the matching probabilities respect to two kinds of predicates into account. So there could be such circumstance that the own predicate could be matched easily while the want predicate nearly couldn't be matched. To solve this problem, an approximate cycle matching algorithm based on dynamic-classified ELM(Extreme Learning Machine) is proposed in this work. The algorithm based on dynamic-classified ELM accommodates the data distribution by dynamic training work instead of setting threshold manually, and also takes the discretization into account. The proposed approximate cycle matching algorithm based on dynamic-classified ELM is compared with other three strategies in this work, and the result shows that the approximate cycle matching algorithm based on dynamic-classified ELM could always find the best threshold, get a higher classification precision, and compared with the threshold strategy, it costs the cycle matching time in the same order.Traditional top-k strategy filters all the cycle matchings to find the best k cycle matchings with highest score calculated by a score function based on match degree. Nevertheless, top-k strategy will be infeasible when the best k cycle matchings are very similar, for users may expect more kinds of results. On the other hand, traditional top-k strategy pruns data with lower score than the minimum value of the heap, so it is non-persistent, and users will issue a new top-k query to get more kinds of results. To solve this problem, this work presents a top-k clustering strategy to improve the diversification and avoid the imperfection of top-k strategy mentioned above. Three top-k clustering algorithms are proposed, top-k group query algorithm, top-k clustering algorithm based on histogram, and algorithm based on extended DBSCAN. The top-k clustering strategy are evaluated in a simulated environment and compared with traditional strategies. The result shows that top-k clustering strategy could provide best cycle matchings with more diversification, and the algorithm based on histogram and extended DBSCAN cost time shorter than the cycle matching process, and it needn't cost extra time caused by repeated top-k queries, so the strategy is feasible.
Keywords/Search Tags:publish/subscribe system, cycle matching, Extreme Learning Machine(ELM), top-k clustering
PDF Full Text Request
Related items