| Cancer is the second leading cause of death worldwide, accounting for one sixth of all deaths. For years, researchers have explored the potential relationship between unhealthy lifestyle factors (smoking, alcohol abuse, obesity and lack of exercise) and cancer incidence and mortality, but the problem remains challenging. This paper has two aims: to offer guidance for healthy living through a correlation analysis of cancer and lifestyle, and to provide a numerical basis for the targeted allocation of medical financial resources by predicting cancer incidence and mortality over the next few years. The main contributions are twofold. (1) Data preprocessing and sample expansion. The paper uses raw data on lifestyle and cancer occurrence from 1990 to 2017, obtained from the Centers for Disease Control and Prevention and the American Cancer Society. To ensure data quality, the raw data are first preprocessed in Excel (format conversion and deduplication) and then filtered in R to obtain the statistics needed for analysis. After cleaning, comparative experiments are used to select cubic spline interpolation, which is smooth and well suited to the data, as the expansion method. It expands the data from the original 25 points to 300 points, converting annual data into monthly data, which resolves the shortage of samples for the subsequent correlation analysis and prediction. The influencing factors are likewise selected according to the available data sources, giving a reasonable multi-factor analysis. (2) Design of the Two Stage Attention model TSA-LSTM. The paper first uses the mainstream tool Tableau, whose powerful
visualization functions greatly facilitate the research. However, experiments show that Tableau's analysis and prediction results are not ideal for the time series data used in this paper. An optimized model, TSA-LSTM, is therefore designed on the basis of LSTM. The first stage of its attention mechanism is input feature attention, which tracks the importance of each input feature so that the encoder focuses on the relevant features of the input sequence when predicting a specific feature of the output sequence; this strengthens the model's natural learning tendency and improves prediction quality. The second stage is time performance attention, which selects network features according to the model's real-time performance at every time step, further improving prediction quality. The paper gives the detailed design and implementation of the TSA-LSTM model and uses it to carry out factor correlation analysis and trend prediction experiments on the data source. The visualized experimental results show that excessive drinking can lead to breast cancer, colorectal cancer and colon cancer; lack of exercise and obesity can lead to thyroid cancer, colon cancer and uterine cancer; most cancers have overlapping risk factors; and 84.1% of lung cancer, 71.8% of laryngeal cancer and 69.8% of oral cancer are directly related to poor lifestyle. Relative to 2017, cancer incidence is predicted to increase by 9% to 21% between 2020 and 2025. The long-term prediction accuracy reaches 97.76%, and excessive smoking and obesity may be the main causes of cancer incidence in the future. |
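
The sample expansion step described in the abstract (25 annual points interpolated to 300 points with a cubic spline) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a natural cubic spline (zero second derivative at both ends) and 300 evenly spaced evaluation points across the observed range; the abstract specifies neither the boundary conditions nor the sampling grid. The function names `natural_cubic_spline` and `expand_series` are hypothetical, and the demo values are synthetic placeholders, not real incidence data.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple


def natural_cubic_spline(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Build the natural cubic spline through the points (x[i], y[i]).

    x must be strictly increasing and contain at least two points.
    """
    x, y = list(x), list(y)
    n = len(x) - 1  # number of intervals
    h = [x[i + 1] - x[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; both ends stay 0 (natural)

    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [
            6.0 * ((y[k + 2] - y[k + 1]) / h[k + 1] - (y[k + 1] - y[k]) / h[k])
            for k in range(n - 1)
        ]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(t: float) -> float:
        # Locate the interval [x[i], x[i+1]] containing t (clamped to the ends).
        i = min(max(bisect.bisect_right(x, t) - 1, 0), n - 1)
        a, b = x[i + 1] - t, t - x[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (y[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (y[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


def expand_series(x: Sequence[float], y: Sequence[float],
                  n_out: int = 300) -> List[Tuple[float, float]]:
    """Resample (x, y) at n_out evenly spaced points (an assumed grid)."""
    spline = natural_cubic_spline(x, y)
    span = x[-1] - x[0]
    grid = (x[0] + span * k / (n_out - 1) for k in range(n_out))
    return [(t, spline(t)) for t in grid]


if __name__ == "__main__":
    # Synthetic placeholder series: 25 annual values, NOT real cancer data.
    years = [float(1990 + i) for i in range(25)]
    rates = [400.0 + 2.0 * i + 5.0 * math.sin(i / 3.0) for i in range(25)]
    expanded = expand_series(years, rates)
    print(len(expanded), expanded[0], expanded[-1])
```

The interpolated curve passes through every original annual value, so the expansion adds intermediate points without altering the observed ones. Because the added points are derived from the same 25 observations, they smooth the series rather than contribute independent information.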