A Sequence Clustering Algorithm Based On Edit Distance And Its Application In Clinical Anomaly Detection

Posted on:2018-07-19

Degree:Master

Type:Thesis

Country:China

Candidate:Q H Sun

Full Text:PDF

GTID:2334330533959278

Subject:Software engineering

Abstract/Summary:

The coverage and the insured population of medical insurance are growing rapidly; at the same time, however, medical fraud and other illegal behaviors are becoming increasingly common. Because medicine is a highly specialized field and there is information asymmetry among the three parties to a medical insurance transaction, abnormal behaviors can easily be concealed from medical insurance institutions. Methods for detecting abnormal clinical behaviors therefore have significant theoretical and practical value for regulating medical practice and preventing medical fraud. Drawing on domestic and international research, this thesis analyzes the sequential ordering and natural cohesion of clinical behaviors, uses the Bisecting K-Means algorithm to cluster a data set of normal clinical behaviors, takes the resulting clusters as profiles of normal clinical behavior, and studies distance-based anomaly detection techniques tailored to the characteristics of abnormal clinical behaviors. Finally, a prototype clinical anomaly detection system is built to detect potential abnormal behaviors in clinical data. The main contributions of this thesis are as follows:

(1) A Bisecting K-Means clustering algorithm based on overall similarity matching (PSClu) is proposed. PSClu modifies the way Bisecting K-Means computes distances, using edit distance as the function that measures the similarity between sequences during clustering. By combining upper and lower bounds on the edit distance, prefix-subsequence similarity, and an approximate solution for the cluster centroid, PSClu avoids a portion of the edit distance computations, reduces the time complexity of Bisecting K-Means, and quickly generates clusters of normal clinical sequences.

(2) A method for computing the similarity between a sequence and a cluster centroid is proposed. To find abnormal clinical behavior sequences more effectively, drug usage and price similarity are compared on the basis of drug-effect similarity. Because medical behaviors differ in significance, a weighted edit distance algorithm (WED) is used to compute the similarity between a cluster centroid and each sequence under test.

(3) An anomaly detection model is constructed. PSClu generates normal clusters, which serve as normal profiles; WED computes the similarity between a sequence under test and the cluster centroid, and the degree of difference serves as the reference for judging whether abnormal behavior is present. The resulting model comprises data preprocessing, clustering, and similarity judgment mechanisms.

(4) A prototype system is designed and implemented on the basis of the anomaly detection model, including its page (presentation) layer, service layer, and data persistence layer. The performance of the anomaly detection system is analyzed and evaluated using clinical behavior data from a medical institution.
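The abstract says PSClu uses upper and lower bounds on the edit distance, together with prefix similarity, to skip some edit distance computations, but it does not give the exact rules. The Python sketch below shows one standard way such pruning can work when a sequence is assigned to its nearest centroid. The bag-distance lower bound, the positional upper bound, the common-prefix stripping, and every function name here (`lower_bound`, `upper_bound`, `strip_common_prefix`, `nearest_centroid`) are illustrative assumptions, not the thesis's own definitions.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# Hypothetical helpers for illustration; the bounds are standard ones, not PSClu's own.
Seq = Sequence[Hashable]


def edit_distance(a: Seq, b: Seq) -> int:
    """Unit-cost Levenshtein distance via a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,             # delete x
                         cur[j - 1] + 1,          # insert y
                         prev[j - 1] + (x != y))  # match or substitute
        prev = cur
    return prev[-1]


def lower_bound(a: Seq, b: Seq) -> int:
    """Bag distance: never exceeds the edit distance."""
    ca, cb = Counter(a), Counter(b)
    return max(sum((ca - cb).values()), sum((cb - ca).values()))


def upper_bound(a: Seq, b: Seq) -> int:
    """Substitute mismatched aligned positions, then insert/delete the tail."""
    n = min(len(a), len(b))
    return sum(a[i] != b[i] for i in range(n)) + abs(len(a) - len(b))


def strip_common_prefix(a: Seq, b: Seq) -> Tuple[Seq, Seq]:
    """A shared prefix never changes the unit-cost edit distance."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return a[k:], b[k:]


def nearest_centroid(seq: Seq, centroids: List[Seq]) -> Tuple[int, int, int]:
    """Return (centroid index, edit distance, number of full DP runs)."""
    best_i, best_d, dp_runs = -1, float("inf"), 0
    # Visit centroids from the smallest lower bound upward.
    for lb, i in sorted((lower_bound(seq, c), i) for i, c in enumerate(centroids)):
        if lb >= best_d:
            break  # this and all later centroids cannot beat the current best
        a, b = strip_common_prefix(seq, centroids[i])
        lo, hi = lower_bound(a, b), upper_bound(a, b)
        if lo == hi:
            d = lo  # bounds coincide: the distance is known without the DP
        else:
            d = edit_distance(a, b)
            dp_runs += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, dp_runs
```

For instance, `nearest_centroid("ABCE", ["ABCD", "XYZW"])` returns `(0, 1, 0)`: after the shared prefix is removed the bounds meet, so the first distance is known exactly, and the second centroid's lower bound of 4 rules it out, so no dynamic program runs at all.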
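Bisecting K-Means needs a centroid for every cluster, but a set of sequences has no arithmetic mean; the abstract mentions only an "approximate solution" for the centroid. The sketch below assumes the medoid (the member with the smallest total edit distance to the others) as that approximation, seeds each split with the first member and the member farthest from it, and always splits the cluster with the largest total distance to its medoid. These choices and the names `medoid`, `cost`, `bisect`, and `bisecting_cluster` are hypothetical, and the sketch omits the bound-based pruning.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

# Hypothetical sketch: medoid centroids and farthest-member seeding are assumptions.
Seq = Sequence[Hashable]


def edit_distance(a: Seq, b: Seq) -> int:
    """Unit-cost Levenshtein distance via a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y))
        prev = cur
    return prev[-1]


def medoid(cluster: List[Seq]) -> Seq:
    """Stand-in for a centroid: the member with the smallest total distance to the rest."""
    return min(cluster, key=lambda s: sum(edit_distance(s, t) for t in cluster))


def cost(cluster: List[Seq]) -> int:
    """SSE analogue: total edit distance from the members to the medoid."""
    m = medoid(cluster)
    return sum(edit_distance(m, t) for t in cluster)


def bisect(cluster: List[Seq], iters: int = 10) -> Optional[Tuple[List[Seq], List[Seq]]]:
    """Split one cluster in two with a K=2, medoid-based K-Means loop."""
    c1 = cluster[0]
    c2 = max(cluster, key=lambda s: edit_distance(s, c1))  # farthest member as second seed
    if edit_distance(c1, c2) == 0:
        return None  # all members identical: nothing to split
    for _ in range(max(iters, 1)):
        left, right = [], []
        for s in cluster:
            (left if edit_distance(s, c1) <= edit_distance(s, c2) else right).append(s)
        new_seeds = (medoid(left), medoid(right))
        if new_seeds == (c1, c2):
            break  # seeds stable: assignment has converged
        c1, c2 = new_seeds
    return left, right


def bisecting_cluster(seqs: List[Seq], k: int) -> List[List[Seq]]:
    """Repeatedly split the highest-cost splittable cluster until k clusters exist."""
    if not seqs:
        return []
    clusters = [list(seqs)]
    while len(clusters) < k:
        clusters.sort(key=cost, reverse=True)
        for i, c in enumerate(clusters):
            parts = bisect(c) if len(c) >= 2 else None
            if parts:
                clusters[i:i + 1] = list(parts)
                break
        else:
            break  # no cluster can be split any further
    return clusters
```

For example, clustering `["ABCD", "ABCE", "ABCF", "XYZW", "XYZV", "XYZU"]` with `k=2` separates the `ABC*` sequences from the `XYZ*` sequences. Each `cost` call recomputes all pairwise distances in a cluster, which is exactly the kind of work the bounds in PSClu are meant to cut down.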
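The abstract does not specify how WED assigns weights or how drug-effect similarity enters the cost, so the following is only a sketch under stated assumptions. Each clinical item has a hypothetical significance weight, so inserting or deleting it costs that weight. Substituting one item for another costs the larger of the two weights, halved when both belong to the same hypothetical effect class. A sequence is flagged when its distance to the nearest normal-profile centroid, divided by its own total weight, exceeds a hypothetical threshold of 0.3. The item codes, the tables `WEIGHT` and `EFFECT`, the 0.5 factor, the normalization, and the threshold are all invented for illustration, and price similarity is not modeled.

```python
from typing import Dict, List, Sequence

# Hypothetical item tables for illustration only (not from the thesis):
# significance weight of each clinical item and its drug-effect / service class.
WEIGHT: Dict[str, float] = {"SALINE": 0.5, "AMOX": 1.0, "CEFU": 1.0, "CT": 2.0, "MRI": 2.0}
EFFECT: Dict[str, str] = {"SALINE": "fluid", "AMOX": "antibiotic", "CEFU": "antibiotic",
                          "CT": "imaging", "MRI": "imaging"}


def sub_cost(x: str, y: str) -> float:
    """Substitution cost: free for a match, halved within the same effect class (assumed)."""
    if x == y:
        return 0.0
    base = max(WEIGHT[x], WEIGHT[y])
    return 0.5 * base if EFFECT[x] == EFFECT[y] else base


def weighted_edit_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance in which inserting or deleting an item costs that item's weight."""
    prev = [0.0]
    for y in b:
        prev.append(prev[-1] + WEIGHT[y])  # build each prefix of b by insertions alone
    for x in a:
        cur = [prev[0] + WEIGHT[x]]  # delete the prefix of a up to and including x
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + WEIGHT[x],            # delete x
                           cur[j - 1] + WEIGHT[y],         # insert y
                           prev[j - 1] + sub_cost(x, y)))  # match or substitute
        prev = cur
    return prev[-1]


def anomaly_score(seq: Sequence[str], centroids: List[Sequence[str]]) -> float:
    """Distance to the nearest normal-profile centroid, normalized by the sequence's weight."""
    d = min(weighted_edit_distance(seq, c) for c in centroids)
    total = sum(WEIGHT[x] for x in seq) or 1.0  # avoid dividing by zero for an empty sequence
    return d / total


def is_anomalous(seq: Sequence[str], centroids: List[Sequence[str]],
                 threshold: float = 0.3) -> bool:
    """Flag a sequence whose normalized difference exceeds a (hypothetical) threshold."""
    return anomaly_score(seq, centroids) > threshold
```

With these hypothetical tables and the profile `["SALINE", "AMOX", "CT"]`, replacing `AMOX` by `CEFU` (same effect class) costs 0.5, giving a score of about 0.14, which is not flagged; appending two `MRI` items costs 4.0, giving a score of about 0.53, which is flagged. Normalizing by the tested sequence's weight keeps long but ordinary sequences from being flagged merely for their length.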

Keywords/Search Tags:

Clinical behavior sequence, Sequence clustering, Similarity measurement, Anomaly detection

Related items

1	Anomaly Detection Model Of Clinical Sequences
2	Study On Expression,Function And Molecular Mechanisms Of FAM96B In Hepatocellular Carcinoma
3	Research On Anomaly Detection For Medical Insurance Record Based On Improved Fuzzy Clustering Algorithm
4	Expression Of FAM96A And FAM96B In Gastric Cancer And The Clinical Significance
5	The Expression Of FAM96A,FAM96B And Caspase 3 In Gastrointestinal Stromal Tumor And Its Correlation With Clinicopathologic Characteristics
6	Fast Imaging Sequence Design And Research On Methods Of Eliminating Artifacts
7	Research On Profile Similarity Search And Disease Auxiliary Diagnosis Algorithms Based On Graph Sequence Model
8	Isolation And Sequence Analysis Of HIV-1 Strains Circulating In Some Regions Of China
9	Research On The Detection And Classification Method Of Cancer DNA Sequence Variation Based On Data Mining
10	From gene to brain and behavior: Dopaminergic effects on motor sequence learning in Parkinson's disease