A Sequence Clustering Algorithm Based On Edit Distance And Its Application In Clinical Anomaly Detection

Posted on: 2018-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Sun
GTID: 2334330533959278
Subject: Software engineering
Abstract/Summary:
The coverage and customer base of medical insurance are expanding rapidly, but medical fraud and other illegal behaviors are becoming more common at the same time. Because medicine is a highly specialized field and information is asymmetric among the three parties to a medical insurance transaction, abnormal behaviors are easily concealed from medical insurance institutions. Studying methods for detecting abnormal clinical behaviors therefore has significant theoretical and practical value for standardizing medical practice and preventing medical fraud. Drawing on domestic and international research, this thesis analyzes the temporal ordering and natural cohesion of clinical behaviors, uses the Bisecting K-Means algorithm to cluster a data set of normal clinical behaviors, takes the resulting clusters as a profile of normal clinical behavior, and studies distance-based anomaly detection techniques suited to the characteristics of abnormal clinical behaviors. Finally, a prototype clinical anomaly detection system is built to detect potential abnormal behaviors in clinical data. The main contributions of this thesis are as follows:

(1) A Bisecting K-Means clustering algorithm based on overall similarity matching (PSClu) is proposed. PSClu changes the way Bisecting K-Means computes distance and uses edit distance as the function for measuring inter-cluster sequence similarity. By combining upper and lower bounds on the edit distance, prefix-subsequence similarity, and an approximate solution for the cluster centroid, it filters out part of the edit distance computations and lowers the time complexity of Bisecting K-Means, so that clusters of normal clinical sequences are generated quickly.

(2) A method is proposed for calculating the similarity between a sequence and a cluster centroid. To find abnormal clinical behavior sequences more effectively, the thesis compares drug usage and price similarity on the basis of similarity in drug effect. Because medical behaviors differ in significance, a weighted edit distance algorithm (WED) is used to calculate the similarity between the cluster centroid and the sequences under test.

(3) An anomaly detection model is constructed. The PSClu algorithm generates normal clusters, which serve as normal profiles. The WED algorithm calculates the similarity between a sequence under test and the cluster centroid, and the degree of difference is used as the reference for deciding whether abnormal behavior is present. The resulting model comprises data preprocessing, clustering, and similarity judgment mechanisms.

(4) A prototype system is designed and implemented on the basis of the anomaly detection model, including its page layer, service layer, and data persistence layer. The performance of the anomaly detection system is analyzed and evaluated using clinical behavior data from a medical institution.
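To make the two distance ideas in the abstract concrete, here is a minimal Python sketch. It is not the thesis's PSClu or WED implementation, whose details the abstract does not give. It assumes that the lower bound used for filtering is the simple length-difference bound on edit distance, and that WED assigns each clinical action an insertion/deletion weight plus a pairwise substitution cost. The function names, the example actions, and all weights are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Plain Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def nearest_centroid(seq: Sequence[Hashable],
                     centroids: List[Sequence[Hashable]]) -> Tuple[int, int, int]:
    """Return (index, distance, number of full distance computations).

    Assumption: |len(seq) - len(c)| is used as a cheap lower bound on the
    edit distance. If that bound already reaches the best distance found
    so far, the centroid cannot be strictly closer and is skipped.
    """
    best_idx, best_dist, computed = -1, float("inf"), 0
    for idx, c in enumerate(centroids):
        if abs(len(seq) - len(c)) >= best_dist:
            continue  # filtered out without running the full computation
        d = edit_distance(seq, c)
        computed += 1
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist, computed


def weighted_edit_distance(a: Sequence[Hashable], b: Sequence[Hashable],
                           weight: Callable[[Hashable], float],
                           sub_cost: Callable[[Hashable, Hashable], float]) -> float:
    """Edit distance where each action has its own cost.

    weight(x) is the (hypothetical) cost of inserting or deleting action x;
    sub_cost(x, y) is the cost of replacing x with y when they differ.
    """
    prev = [0.0]
    for y in b:
        prev.append(prev[-1] + weight(y))
    for x in a:
        curr = [prev[0] + weight(x)]
        for j, y in enumerate(b, start=1):
            replace = 0.0 if x == y else sub_cost(x, y)
            curr.append(min(prev[j] + weight(x),      # delete x
                            curr[j - 1] + weight(y),  # insert y
                            prev[j - 1] + replace))   # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical weights: more significant actions cost more to add or drop.
WEIGHTS = {"exam": 1.0, "drugA": 2.0, "drugB": 2.0, "surgery": 5.0}


def action_weight(x: str) -> float:
    return WEIGHTS[x]


def substitution_cost(x: str, y: str) -> float:
    # Assumed: drugA and drugB have a similar effect, so swapping them is cheap.
    if {x, y} == {"drugA", "drugB"}:
        return 0.5
    return max(WEIGHTS[x], WEIGHTS[y])
```

Under these assumed weights, swapping one drug for a similar one changes the distance to the centroid only slightly, while an unexpected extra surgery adds a large cost, which is the kind of difference a threshold on the distance could flag. The threshold itself, and how drug effect, usage, and price enter the substitution cost, would have to come from the thesis.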
Keywords/Search Tags:Clinical behavior sequence, Sequence clustering, Similarity measurement, Anomaly detection