Insurance fraud is as old as insurance itself. It does serious harm to the insurance industry and has become a common problem in the development of insurance in many countries. Faced with ever more forms of insurance fraud, insurance practitioners can hardly identify them from experience alone. It is also recognized that an insurance company cannot uncover the real intentions of a cunning fraudster using only its own internal data. The insurance industry therefore urgently needs new technologies and new fraud prevention systems in the anti-fraud field. This paper designs an Industry-level Insurance Anti-fraud System based on Data Mining. The system consists mainly of three business models: the Repeated Insurance Selection Model, the Insurance Fraud Law Mining Model and the Insurance Fraud Identification Model. The paper then designs the logical model and the physical model that correspond to these business models. After presenting the overall system blueprint, the paper identifies data pretreatment and Data Mining algorithms as the two biggest difficulties facing the Industry-level Insurance Anti-fraud System, and gives a more detailed design for each. For data pretreatment, based on the characteristics of the source data, the paper proposes a Heterogeneous Data Integration Strategy and a Dirty Data Cleaning Strategy.
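The two pretreatment strategies are only named here. As a minimal sketch, assuming each insurer exports policy rows with its own column names and date formats (the mappings `FIELD_MAPS` and `DATE_FORMATS`, the `Record` schema and all field names below are hypothetical), heterogeneous integration can map every row onto one unified schema, and dirty data cleaning can normalize IDs and names, treat unparseable dates as missing, and drop rows without an ID as well as exact duplicates. The hypothetical `repeated_insured` helper illustrates how integrated industry-level data exposes a person insured with more than one company, the idea behind the Repeated Insurance Selection Model; it is not the paper's model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical column names used by each source insurer.
FIELD_MAPS = {
    "insurer_a": {"id_no": "IDNumber", "name": "HolderName", "policy_date": "StartDate"},
    "insurer_b": {"id_no": "cert_id", "name": "insured", "policy_date": "begin"},
}
# Hypothetical date formats used by each source insurer.
DATE_FORMATS = {"insurer_a": "%Y-%m-%d", "insurer_b": "%d/%m/%Y"}


@dataclass(frozen=True)
class Record:
    source: str
    id_no: str
    name: str
    policy_date: Optional[date]


def integrate(source: str, raw: dict) -> Record:
    """Heterogeneous integration: map one source row onto the unified schema."""
    fields = FIELD_MAPS[source]
    raw_date = (raw.get(fields["policy_date"]) or "").strip()
    try:
        policy_date = datetime.strptime(raw_date, DATE_FORMATS[source]).date()
    except ValueError:
        policy_date = None  # empty or malformed dates become missing, never guessed
    return Record(
        source=source,
        id_no=(raw.get(fields["id_no"]) or "").strip().upper(),  # normalize ID spaces/case
        name=" ".join((raw.get(fields["name"]) or "").split()),  # collapse repeated spaces
        policy_date=policy_date,
    )


def clean(records: list) -> list:
    """Dirty data cleaning: drop rows without an ID and exact duplicates."""
    seen, out = set(), []
    for r in records:
        key = (r.source, r.id_no, r.policy_date)
        if not r.id_no or key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out


def repeated_insured(records: list) -> set:
    """IDs that hold policies with more than one insurer."""
    insurers = defaultdict(set)
    for r in records:
        insurers[r.id_no].add(r.source)
    return {id_no for id_no, srcs in insurers.items() if len(srcs) > 1}


rows = [
    integrate("insurer_a", {"IDNumber": " 110101x ", "HolderName": "Li  Wei", "StartDate": "2020-03-01"}),
    integrate("insurer_a", {"IDNumber": "110101X", "HolderName": "Li Wei", "StartDate": "2020-03-01"}),
    integrate("insurer_b", {"cert_id": "110101x", "insured": "Li Wei", "begin": "15/04/2020"}),
    integrate("insurer_b", {"cert_id": "", "insured": "Unknown", "begin": "bad"}),
]
cleaned = clean(rows)
print(len(cleaned), repeated_insured(cleaned))  # 2 {'110101X'}
```

The first two rows collapse into one record once the ID is normalized, the fourth is dropped for lacking an ID, and the surviving ID appears at both insurers.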
For the Data Mining algorithms, building on the preceding business analysis and business model design, the paper first simulates and implements a centralized association rule mining algorithm (the Apriori algorithm) for mining the laws (patterns) of insurance fraud, and then designs a distributed outlier mining algorithm (a distributed Bay algorithm) for insurance fraud identification. The study leads to the following findings: an Industry-level Insurance Anti-fraud System, with its industry-level data volume, can effectively increase the success rate of insurance anti-fraud work; Data Mining makes insurance anti-fraud more scientific and better guided; and distributed Data Mining must give due weight to data pretreatment and to improving and adapting classical Data Mining algorithms.
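For reference, the following is a textbook Apriori sketch in Python, not the paper's simulated implementation. It assumes each claim is encoded as a set of discrete attributes; the attribute names and the function names `apriori` and `association_rules` are hypothetical. Frequent itemsets are grown level by level with the join and prune steps, and a rule X → Y is kept when its confidence, support(X ∪ Y) / support(X), meets a threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    candidates = {frozenset([item]) for t in sets for item in t}  # level-1 candidates
    frequent, k = {}, 1
    while candidates:
        # Count step: support = fraction of transactions containing the candidate.
        counts = {c: sum(1 for t in sets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Rules X -> Y with confidence support(X | Y) / support(X) >= min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical claim attributes; "fraud" marks claims later confirmed as fraudulent.
claims = [
    {"early_claim", "high_amount", "fraud"},
    {"early_claim", "high_amount", "fraud"},
    {"early_claim", "no_fraud"},
    {"high_amount", "no_fraud"},
]
frequent = apriori(claims, min_support=0.5)
for lhs, rhs, sup, conf in association_rules(frequent, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), sup, conf)
```

With these four claims, the rule {early_claim, high_amount} → {fraud} has support 0.5 and confidence 1.0; on real data both thresholds would need tuning.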
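The abstract does not define the "distributed Bay algorithm". Assuming it builds on Bay and Schwabacher's distance-based outlier algorithm (randomized nested loops with a simple pruning rule), its centralized core can be sketched as follows. Each point is scored by the distance to its k-th nearest neighbour and the top n scores are kept; because a point's partial k-NN distance can only shrink as more points are scanned, a point is abandoned as soon as that distance falls below the weakest score in the current top n. How the paper distributes this work across data sources is not shown, and the function name `bay_outliers` and its parameters are hypothetical.

```python
import heapq
import math
import random


def bay_outliers(points, k, n, seed=0):
    """Top-n outliers scored by distance to the k-th nearest neighbour (Bay-style pruning)."""
    if len(points) <= k:
        raise ValueError("need more than k points")
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random order makes early pruning likely
    top = []       # min-heap of (score, index): current top-n outliers
    cutoff = 0.0   # weakest score in `top` once it holds n entries
    for i in order:
        neigh = []  # max-heap via negated distances: k closest points seen so far
        pruned = False
        for j in order:
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            # The k-NN distance can only shrink as more points are scanned, so once
            # it drops below the cutoff, point i can never enter the top n.
            if len(neigh) == k and -neigh[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -neigh[0]  # exact k-th nearest-neighbour distance
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)  # [(score, index), ...], strongest first


# Hypothetical 2-D claim features: a tight cluster plus one distant claim.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(bay_outliers(points, k=2, n=1))  # the distant point (index 5) ranks first
```

Pruning never changes the result, only the amount of work: any point it discards provably scores below the final top n.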