Diffusion Utilization And Study Of Data Concentration

Posted on: 2021-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Liu
GTID: 2370330611973198
Subject: Software engineering
Abstract/Summary:
In practical settings, data features and labels are often missing to varying degrees; this thesis refers to such data as data with different concentrations. For example, text categorization data contains a large amount of unlabeled text, and clinical information prediction experiments involve many subjects with missing features and labels. In practice, unlabeled samples still carry hidden information about the data distribution, and low-concentration samples can be supplemented by exploiting their intrinsic links and strong correlations with the existing information. Mining the prior information contained in data of different concentrations can therefore effectively improve experimental results. The specific research work of this thesis is as follows:

(1) Semi-supervised learning uses a large number of unlabeled examples, together with labeled examples, for pattern recognition. In essence, existing graph-based semi-supervised learning methods are label propagation methods that simulate various propagation mechanisms. Unlike these existing mechanisms, this study exploits a novel elastic-force-based propagation method to realize semi-supervised learning. The basic idea is to imagine that each node in a graph receives elastic forces from its neighbors under one elastic coefficient and transmits elastic forces to its neighbors under another elastic coefficient. The difference between these two types of elastic force measures the propagation quantity of each node. Based on this idea, the thesis derives the update equations for all nodes in the graph; expressing these equations in matrix form yields an analytical solution. The proposed method thus has a foundation in physics. The thesis also justifies the method from the perspective of optimizing the corresponding objective function, which guarantees its convergence. Extensive experimental results verify the effectiveness of the proposed method in semi-supervised learning.

(2) Missing sample data is extremely common in medical research on chronic diseases, in particular Alzheimer's disease. To improve learning performance, the low-concentration data are first filled in by matrix decomposition. Then, to make full use of the features and labels in the low-concentration data, the thesis jointly models disease status prediction from multiple perspectives and at multiple time points. Existing machine learning methods for predicting chronic disease status at future time points are all based on a single task and a single perspective, and do not fully consider the double heterogeneity in the development of chronic diseases. In particular, for the prediction task at each time point, related feature information also exists across multiple time series. The performance of these tasks is constrained by multiple factors and is analyzed from multiple sources and time points. Accurate assessment and prediction of the current condition enables patients to seek medical treatment proactively. The thesis establishes a novel disease prediction model for low-concentration data that takes source consistency and temporal smoothness into account. It is proved theoretically that the proposed model is a linear model, the basic principle of the method is demonstrated, and its convergence is guaranteed. Extensive experiments demonstrate the effectiveness of the model in predicting clinical scores for Alzheimer's disease.
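As an illustration of the graph-based label propagation setting in item (1), the following is a minimal sketch of a classical normalized-graph propagation scheme with an iterative update and a matching closed-form solution. It is not the elastic-force method of the thesis, whose update equations are not given in this abstract. The function names, the toy graph, and the value of alpha are assumptions made for illustration only.

```python
import numpy as np


def normalized_affinity(W):
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric, nonnegative weight matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate_iterative(W, Y, alpha=0.9, n_iter=500):
    """Repeat the node-wise update F <- alpha * S F + (1 - alpha) * Y."""
    S = normalized_affinity(W)
    F = Y.astype(float)
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F


def propagate_closed_form(W, Y, alpha=0.9):
    """Fixed point of the update above: F = (1 - alpha) (I - alpha S)^{-1} Y."""
    S = normalized_affinity(W)
    n = W.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)


# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by a weak edge.
W = np.zeros((6, 6))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w

# One labeled node per class; all other rows are zero (unlabeled).
Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # node 0 is labeled as class 0
Y[5, 1] = 1.0  # node 5 is labeled as class 1

F_iter = propagate_iterative(W, Y)
F_closed = propagate_closed_form(W, Y)
pred = F_closed.argmax(axis=1)  # predicted class for every node
print(pred)
```

In this sketch the iterative update and the matrix-form solution agree, and each unlabeled node takes the label of the seed in its own triangle. This mirrors the general pattern described above, where node-wise update equations are collected into a matrix expression that admits an analytical solution.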
Keywords/Search Tags: data concentration, missing labels, label propagation, missing samples, clinical score prediction