| Data mining is used ever more widely as information technology and the Internet develop rapidly. Frequent pattern mining is an important branch of data mining: it is mainly used to find frequent patterns in sequences and has important applications in text indexing, network security, data stream mining, and other fields. Mining frequent patterns with wildcards extends traditional pattern mining by allowing wildcards between the elements of a frequent pattern. Allowing wildcards makes the mining problem more complex, but patterns with wildcards are more flexible and more meaningful. Mining them therefore has both theoretical research value and great application potential in text indexing, data stream mining, the biological sciences, and other areas. By processing data sets such as web logs, intrusion activity records, and supermarket transaction data, this kind of mining aims to give domain users useful insights through more flexible pattern models. This dissertation makes three contributions: (1) An algorithm named OneOffMining is proposed to solve the problem of mining frequent patterns with gaps under the one-off condition. The one-off condition is theoretically reasonable, and compared with existing work on mining frequent patterns with wildcards and gaps, mining under this condition is more time-efficient. (2) Several experiments are reported on DNA sequences from the biological domain and on multiple sequences. These experiments demonstrate the application potential of mining frequent patterns with gap constraints, particularly in the biological sciences. (3) A web system is provided for our National Natural Science Foundation of China (NSFC) grant 60828005; it gives other researchers a platform for using our algorithms. |
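To make the one-off condition more concrete, the following Python sketch counts occurrences of a pattern under a gap constraint. It assumes the usual definitions from the gap-constrained mining literature, which the abstract does not spell out: a pattern is a string p1...pm, every pair of consecutive elements is separated by between `min_gap` and `max_gap` wildcards, and under the one-off condition each sequence position may belong to at most one counted occurrence. All function names are hypothetical, and the greedy left-to-right search is a simple heuristic that yields a valid one-off count (a lower bound on the best possible count); it is not the OneOffMining algorithm.

```python
from typing import List, Optional, Set

# Hypothetical helpers illustrating the one-off condition; this is NOT the
# OneOffMining algorithm. Gaps count wildcards between consecutive elements,
# so with a gap of g the next element sits at position prev + g + 1.


def count_all_occurrences(seq: str, pattern: str, min_gap: int, max_gap: int) -> int:
    """Count every matching position tuple, allowing positions to be reused."""
    def count_from(pos: int, j: int) -> int:
        if j == len(pattern):
            return 1
        total = 0
        for nxt in range(pos + min_gap + 1, min(pos + max_gap + 2, len(seq))):
            if seq[nxt] == pattern[j]:
                total += count_from(nxt, j + 1)
        return total

    if not pattern:
        return 0
    return sum(count_from(i, 1) for i in range(len(seq)) if seq[i] == pattern[0])


def find_occurrence(seq: str, pattern: str, start: int, min_gap: int,
                    max_gap: int, used: Set[int]) -> Optional[List[int]]:
    """Depth-first search for one occurrence starting at `start` that avoids
    positions already claimed; the leftmost candidate is tried first."""
    if start in used or seq[start] != pattern[0]:
        return None
    positions = [start]

    def extend(j: int) -> bool:
        if j == len(pattern):
            return True
        prev = positions[-1]
        for pos in range(prev + min_gap + 1, min(prev + max_gap + 2, len(seq))):
            if pos not in used and seq[pos] == pattern[j]:
                positions.append(pos)
                if extend(j + 1):
                    return True
                positions.pop()  # backtrack and try the next candidate
        return False

    return positions if extend(1) else None


def greedy_one_off_support(seq: str, pattern: str, min_gap: int, max_gap: int) -> int:
    """Greedy heuristic: scan start positions left to right and keep each
    occurrence found, so no position is used twice. The result is a valid
    one-off count (a lower bound), not necessarily the maximum."""
    if not pattern:
        return 0
    used: Set[int] = set()
    count = 0
    for start in range(len(seq)):
        occ = find_occurrence(seq, pattern, start, min_gap, max_gap, used)
        if occ is not None:
            used.update(occ)
            count += 1
    return count


if __name__ == "__main__":
    print(count_all_occurrences("AABB", "AB", 0, 1))   # 3
    print(greedy_one_off_support("AABB", "AB", 0, 1))  # 2
```

On `AABB` with pattern `AB` and a gap of 0 to 1 wildcards, three position pairs match: (0, 2), (1, 2), and (1, 3). Under the one-off condition the greedy search keeps only (0, 2) and (1, 3), because counting (1, 2) as well would reuse position 2.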