
The Research Of Multiple Pattern Matching Algorithm On Chinese/English Mixed Texts And GPU Parallelization

Posted on: 2014-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
Full Text: PDF
GTID: 2268330425484181
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computing, communication, and networking, information leakage and security have drawn increasing concern. Content auditing of network information helps ensure that information is not leaked and effectively blocks illegal information. The key technology for content auditing is multiple pattern matching. In China's network environment, multiple pattern matching faces the particular problem of mixed Chinese/English text: applied to such text, traditional multiple pattern matching algorithms suffer from space expansion, false matches, or missed matches. As the scale of network data grows, real-time content auditing also places higher demands on matching performance. This research therefore proposes the following new methods.

(1) A Trie-based algorithm for mixed multiple pattern matching that works by adding nodes is proposed. By adding a small number of nodes, the algorithm avoids staggered matching at the first byte of a Chinese character. It correctly handles pattern strings that contain both Chinese and English characters and effectively prevents false matches. It also simplifies the matching process by eliminating branch statements, which makes the algorithm easier to parallelize. An optimization based on storing the number of matches for each state is also proposed; it reduces the time cost of pattern matching and improves matching efficiency.

(2) A GPU parallel optimization algorithm for text matching based on splitting the text is proposed. After preprocessing the text data, the algorithm splits the text and improves matching efficiency through parallelization. A general-purpose GPU-based text matching platform is also proposed. The platform provides a unified function interface: researchers can parallelize a multi-pattern matching algorithm simply by embedding their core code into the interface functions. The platform simplifies coding and improves development efficiency.
Keywords/Search Tags: content audit, multiple pattern matching, Chinese/English mixed, graphics processing unit (GPU), parallel computing, compute unified device architecture (CUDA)