| With the explosive growth of the Internet, the amount of information available online has increased dramatically, and people’s demand for accurate queries of information in a specific field is increasingly difficult to meet. The construction of search engines for specialized fields has gradually become a research hotspot in engineering. For a long time, general search engines have mainly performed simple character matching on user-entered data and returned only the pages whose content matches that data. As a result, the amount of information returned by a search is usually very large. In addition, since everyone can access the Internet, almost anyone can publish information on it. However, the Internet lacks criteria for assessing the accuracy of published information, so useful information is overwhelmed by massive amounts of poor-quality information. The content is so varied that ordinary users find it difficult to judge the query results in a given field. With the use of bidding ranking technology by commercial search engines, the results from general search engines are often confusing for ordinary users, and manual screening is still needed to select useful information. The main research contents are as follows. Firstly, the analysis and formalized expression of professional medical books, clinical outpatient medical record data, and expert experience are studied. Processing and analyzing these three kinds of medical data, which differ in form and structure, lays the foundation for the subsequent construction of the correlation model between query and disease. A search engine built on this model gives users better search results. When users enter a search query, Information Retrieval techniques can return some results, but they do not guarantee a good correlation between the search results and the query. To address this, two individual machine
learning models, the Naive Bayes Model and the Decision Tree Model, are applied to evaluate the correlation between query and disease. The Naive Bayes Model calculates the probability that a sample belongs to each disease category from the probabilities of the different medical characteristics under each disease category. It helps learn the distribution of medical characteristics and obtains finer results when the number of samples is very small. The Decision Tree Model, on the other hand, serves as a disease-related model that decides what to determine next by testing whether the query has a certain characteristic, until a diagnosis result is given. Therefore, given a query, the Naive Bayes and Decision Tree Models can give the probability that the query belongs to each disease category, and this probability is applied to the ranking of the final results, thereby improving the accuracy and recall rate of the search results. Next, an ensemble learning model based on heterogeneous models is proposed. By calculating the correlation of the different models for the different disease categories, the models are weighted according to their accuracy and integrated into a strong learning model. The results of the integrated model are then applied to the ranking, thereby improving the accuracy of the search results. Finally, based on the above research results, the framework and main functions of the disease information retrieval system are introduced. The experimental results show that, compared with the common information retrieval model and retrieval models based on machine learning methods, the proposed approach further improves the correlation between the query results and the disease information and improves the user experience. |
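
The query-to-disease scoring and accuracy-weighted integration described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation. The function names, the toy symptom data, the smoothing constant, the fixed decision-tree output, and the accuracy values are all hypothetical. The sketch also assumes one global accuracy weight per model, whereas the text describes weighting per disease category.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, alpha=1.0):
    """Count feature occurrences per disease label.

    samples: list of (set_of_features, disease_label) pairs (toy data here).
    alpha:   Laplace smoothing constant (an assumption, not from the source).
    """
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for features, label in samples:
        feature_counts[label].update(features)
        vocabulary.update(features)
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "vocabulary": vocabulary,
        "alpha": alpha,
        "total": len(samples),
    }


def naive_bayes_posterior(model, query_features):
    """Return P(disease | query features), normalised over all diseases.

    Only features present in the query contribute; absent features and
    features never seen in training are ignored for brevity.
    """
    log_scores = {}
    for label, n_c in model["class_counts"].items():
        score = math.log(n_c / model["total"])  # log prior
        for feature in query_features:
            if feature not in model["vocabulary"]:
                continue
            count = model["feature_counts"][label][feature]
            p = (count + model["alpha"]) / (n_c + 2 * model["alpha"])
            score += math.log(p)
        log_scores[label] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}


def weighted_ensemble(posteriors, accuracies):
    """Combine per-model posteriors, weighting each model by its accuracy."""
    total = sum(accuracies)
    labels = set().union(*posteriors)
    return {
        c: sum(a * p.get(c, 0.0) for p, a in zip(posteriors, accuracies)) / total
        for c in labels
    }


# Toy training data: symptom sets labelled with a disease (hypothetical).
samples = [
    ({"fever", "cough"}, "influenza"),
    ({"fever", "sore throat"}, "influenza"),
    ({"chest pain", "shortness of breath"}, "angina"),
    ({"chest pain", "sweating"}, "angina"),
]
nb_model = train_naive_bayes(samples)
nb_post = naive_bayes_posterior(nb_model, {"fever", "cough"})

# Stand-in for a decision tree's class probabilities (hypothetical values).
tree_post = {"influenza": 0.7, "angina": 0.3}

# Hypothetical validation accuracies used as ensemble weights.
combined = weighted_ensemble([nb_post, tree_post], accuracies=[0.8, 0.6])
ranking = sorted(combined, key=combined.get, reverse=True)
```

The resulting `ranking` orders disease categories by combined probability, which could then be used to re-rank retrieved documents. A per-category weighting scheme would replace the single accuracy per model with one weight per model and disease.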