Microbiomes and their surrounding environments form a variety of ecosystems with complex interrelationships. The human microbiome has co-evolved with humans and is known as the “second human genome”. Benefiting from the development of high-throughput sequencing technology, a massive amount of microbiome sequencing data has been generated. It has therefore become crucial to profile these data quickly and efficiently and to accurately decode the interactions between the taxonomy or function of the microbiome and its surrounding conditions, so as to explore the biological meaning hidden in the data. High-throughput sequencing data processing and diversity analysis are fundamental to microbiome research. However, current data preprocessing and analysis methods are inefficient, and most operations are command-line based, which makes multi-sample and multi-platform applications difficult. To tackle these problems, this work proposes a microbiome-oriented bioinformatics tool, Parallel-Meta Suite (PMS), which provides a reliable solution for profiling and analyzing microbiome sequence data. PMS not only implements an automatic and comprehensive analysis process, including sequence alignment, taxonomic and functional feature extraction, diversity statistics, visualization, etc., but also provides an easy-to-use, cross-platform graphical interface for parameter configuration and result display. In addition, PMS optimizes the entire workflow through parallel computing, which enables the rapid handling of thousands of samples. This bioinformatics toolkit therefore brings more opportunities for large-scale microbiome data mining. Based on the taxonomic and functional profiles produced by PMS, this study further explores the association between the microbiome and human health. With the improvement of living standards, people pay more attention to the nutritional value of trace elements and the problems caused by their deficiency. Selenium is an essential trace element for
the human body, but the same dose of selenium supplementation is absorbed with different efficiency by different people. This study analyzes and compares the gut microbiomes of two groups with different absorption efficiencies, screens for biomarkers that can distinguish the two groups, and uses the eXtreme Gradient Boosting (XGBoost) method to predict selenium absorption efficiency. Finally, the prediction model and prediction process are encapsulated into an easy-to-use prediction tool, and a real cohort will be evaluated to determine whether its members can absorb selenium efficiently, so as to lay a data foundation for precision nutrition. Although this study finds a link between the microbiome and nutrition, many diseases still lack well-defined biomarkers, or marker-based predictions for them may be insufficiently accurate. Therefore, this work proposes another health status detection strategy based on search. By comparing species-level profiles against large-scale metagenomes, outlier samples are screened out as unhealthy, and their detailed disease types can be identified from the top matches. In benchmarking on a multi-cohort dataset with over 3,000 metagenomes, the search-based approach achieves a promising overall accuracy that is superior to that of marker-based models built with Random Forest (RF), Support Vector Machine (SVM), and XGBoost. More importantly, the search-based method also shows well-balanced performance across different diseases. Hence, this case study further demonstrates the potential and capability of metagenome big data in human health, and moves the search-based approach one step forward in microbiome research and application. On the other hand, in previous studies of the human microbiome, each sample was marked with a single label for its health status in order to focus on the interaction of specific diseases with microbes. However, the same patient may have multiple complications or comorbidities (i.e., multiple labels), which jointly alter the taxonomy and
function of the microbial community, thereby interfering with the detection of health status. In addition, routine detection models may miss certain complications or comorbidities, limiting the practical application of microbiome technology. Therefore, this study also reviews conventional machine learning-based classification in microbiome research, and analyzes and summarizes the limitations of such methods in multi-label disease detection using real datasets. Finally, this study looks ahead to the further development of microbiome-based identification of human health status, including a series of promising strategies and key technical issues for multi-label classification.
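
As an illustration of the search-based detection strategy summarized above, the following is a minimal Python sketch, not the method actually used in this work. It assumes that each sample is a dictionary mapping species names to relative abundances, that similarity is measured as one minus the Bray-Curtis dissimilarity (swapped in here for illustration, since the abstract does not name the measure), that a query counts as an outlier when its mean similarity to its top healthy matches falls below a hypothetical threshold of 0.6, and that the disease type is chosen by majority vote among the top matches. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter


def bray_curtis_similarity(a, b):
    """Return 1 - Bray-Curtis dissimilarity for two species-abundance dicts."""
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


def top_matches(query, reference, k):
    """Rank (profile, label) reference samples by similarity; keep the best k."""
    scored = [(bray_curtis_similarity(query, profile), label)
              for profile, label in reference]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def detect_status(query, healthy_ref, disease_ref, k=3, outlier_threshold=0.6):
    """Label a query as 'healthy' or as the majority disease of its top matches.

    The threshold and voting rule are illustrative assumptions only.
    """
    if not healthy_ref or not disease_ref:
        raise ValueError("both reference sets must be non-empty")
    healthy_hits = top_matches(query, [(p, "healthy") for p in healthy_ref], k)
    mean_similarity = sum(s for s, _ in healthy_hits) / len(healthy_hits)
    if mean_similarity >= outlier_threshold:
        return "healthy"  # close enough to the healthy baseline
    # Outlier: identify the disease type from the top disease matches.
    votes = Counter(label for _, label in top_matches(query, disease_ref, k))
    return votes.most_common(1)[0][0]


# Toy reference data (hypothetical species and disease names).
healthy_ref = [{"A": 0.6, "B": 0.4}, {"A": 0.5, "B": 0.5}]
disease_ref = [
    ({"C": 0.7, "A": 0.3}, "disease_X"),
    ({"C": 0.8, "B": 0.2}, "disease_X"),
    ({"D": 0.9, "A": 0.1}, "disease_Y"),
]

print(detect_status({"A": 0.55, "B": 0.45}, healthy_ref, disease_ref))
print(detect_status({"C": 0.75, "A": 0.25}, healthy_ref, disease_ref))
```

Because this sketch ranks whole profiles rather than relying on a few pre-selected biomarkers, it can still return a label for diseases that lack well-defined markers; a real implementation would need an indexed search over thousands of metagenomes rather than this linear scan.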