Research On The Application Of Association Rules Between Serum Mass Spectrometry Data And Physical Examination Data

Posted on: 2024-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2530306920963329
Subject: Computer Science and Technology
Abstract/Summary:
Serum mass spectrometry can detect thousands of metabolites and proteins simultaneously, providing a new tool for the early diagnosis and treatment of diseases, and it has therefore attracted increasing attention from medical researchers. Health examination data are collected during health check-ups, and their indicators reflect the health status of the human body. Analyzing serum mass spectrometry data together with health examination data can therefore improve the accuracy and effectiveness of disease diagnosis and treatment, and it also supports in-depth study of disease pathogenesis and prognosis evaluation. However, few studies have examined the associations between attributes of serum mass spectrometry data and health examination data. Building on association algorithms, and taking into account that both data types are continuous and of indefinite dimension, this thesis proposes an association analysis method for serum mass spectrometry data and health examination data, with the following main improvements:

(1) A standardization strategy is proposed for raw serum mass spectrometry data, which are not standardized and have an indefinite number of dimensions. Because the dimension is indefinite, the missing values in the transposed serum mass spectrometry data are concentrated at the tail of the data. A missing-value filling strategy based on regression curves is proposed to address this. Experimental results show that filling based on regression curves fits the trend at the tail of the data better. To resolve the inconsistency between serum mass spectrometry data and physical examination data, a data merging method is proposed that merges the two into a single data set, referred to as serum mass spectrometry physical examination data.

(2) Association analysis requires discrete data, whereas the serum mass spectrometry physical examination data are continuous. This thesis proposes the KL-Means algorithm to discretize them. The algorithm handles the fact that each column of serum mass spectrometry data has a different number of discrete categories, and it removes the randomness in the selection of initial centroids. Experiments compare the results of different discretization methods using the coefficient of variation and show that the KL-Means algorithm gives the best discretization on the serum mass spectrometry physical examination data.

(3) Because the serum mass spectrometry data and physical examination data are unevenly distributed after discretization, a pruning strategy based on frequency terms is proposed. For the Apriori algorithm, the proportion of association rules at P0.1 is 90.27% after the improvement, an increase of 14.27%. For the FP-Growth algorithm, the proportion of association rules at P0.1 is 90.79% after the improvement, an increase of 16.34%. These experiments show that the pruning strategy based on frequency terms effectively improves the validity of the association rules between serum mass spectrometry data and physical examination data. When the association rules generated by the Apriori algorithm are combined with those generated by the FP-Growth algorithm, the proportion of rules at P0.1 reaches 98.6%, which is 8.33% higher than the improved Apriori algorithm alone and 7.81% higher than the improved FP-Growth algorithm alone. This shows that combining the Apriori and FP-Growth algorithms can effectively filter out invalid association rules.

This thesis focuses on the association analysis of serum proteomics data and physical examination data, aiming to mine the association relationships between different attributes of the two types of data. The generated association rules were validated, and their effectiveness reached 98.6%. The results demonstrate that the proposed method is effective in mining the association relationships between different attributes of the two types of data.
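
The abstract does not give the details of the frequency-based pruning step, so the following Python sketch is only one possible reading of the idea, not the thesis's algorithm. It assumes that "pruning" means dropping discretized items that appear in almost every record before rules are mined. The item names (`peak_A=high`, `glucose=normal`, `status=examined`), the thresholds, and the brute-force pairwise miner (a stand-in for Apriori or FP-Growth output) are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def prune_frequent_items(transactions, max_support=0.9):
    """Drop items present in more than max_support of the records.

    Assumed interpretation of frequency-based pruning, not the thesis's rule.
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    dominant = {item for item, c in counts.items() if c / n > max_support}
    return [frozenset(t) - dominant for t in transactions]


def support(itemset, transactions):
    """Fraction of records that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_rules(transactions, min_support=0.4, min_confidence=0.8):
    """Brute-force single-item rules lhs -> rhs (illustrative stand-in miner)."""
    items = sorted(set().union(*transactions))
    rules = set()
    for a, b in combinations(items, 2):
        pair_support = support(frozenset({a, b}), transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset({lhs}), transactions)
            if confidence >= min_confidence:
                rules.add((lhs, rhs))
    return rules


# Hypothetical discretized records: one spectral attribute, one exam attribute,
# and one attribute that takes the same value in every record.
records = [
    {"status=examined", "peak_A=high", "glucose=high"},
    {"status=examined", "peak_A=high", "glucose=high"},
    {"status=examined", "peak_A=low", "glucose=normal"},
    {"status=examined", "peak_A=low", "glucose=normal"},
    {"status=examined", "peak_A=high", "glucose=normal"},
]

rules_raw = mine_rules(records)
rules_pruned = mine_rules(prune_frequent_items(records))
print(len(rules_raw), len(rules_pruned))
for lhs, rhs in sorted(rules_pruned):
    print(f"{lhs} -> {rhs}")
```

In this toy data the item that occurs in every record only produces rules of the form "anything implies `status=examined`", which have high confidence but carry no information. Removing it before mining leaves only rules that relate the spectral attribute to the examination attribute.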
Keywords/Search Tags: Mass spectrometry data, Association analysis, KL-Means algorithm, Apriori algorithm, FP-Growth algorithm