As genomics has advanced, long non-coding RNA (lncRNA) has been shown to play an important regulatory role in a wide range of biological functions. Recent plant studies indicate that lncRNA is crucial in controlling key processes such as flowering time, root organ growth, and adaptation to environmental changes. Compared with the animal field, however, research on plant lncRNA lags behind, and finding reliable lncRNA data within large volumes of sequence data remains a key issue in plant lncRNA research. At the cellular level, lncRNA usually exerts its functions by binding to proteins, for example in regulating cell maturation, nuclear export, and cell stability. Many artificial intelligence methods are currently used to predict lncRNA-protein interactions, but they still have shortcomings: they may require extensive domain expertise, fail to make full use of sequence features, or be evaluated on only a single type of lncRNA-protein interaction dataset. This paper therefore investigates plant lncRNA identification and lncRNA-protein interaction prediction using machine learning and deep learning methods, respectively. The work has three main components:

(1) A plant lncRNA identification method based on ensemble learning is proposed. First, the method extracts sequence length, GC content, and k-mer frequencies as sequence features. Second, a feature subset is selected with the chi-square test, and the optimal subset size is determined by comparing multiple machine learning algorithms. Finally, the selected features are fed into a machine learning algorithm for training and for identifying plant lncRNA. Among the classification algorithms compared, GBDT performs best. The method also remains stable and effective when different types of mRNA are used as negative samples. The proposed sequence-feature-based method therefore performs well and is expected to support further plant genomics research.

(2) A deep network model for predicting lncRNA-protein interactions based on sequence feature encoding is proposed. First, RNA sequences are encoded by their tetranucleotide features. Then, protein sequences are encoded by their physicochemical properties and triplet features. Finally, the encoded RNA and protein sequences are fed into a deep network built from a convolutional neural network and a multilayer perceptron, which outputs the prediction. Experimental results show that the proposed model accurately predicts interactions across different types of lncRNA-protein datasets, demonstrating both wide applicability and high prediction accuracy.

(3) The Plnc-LPI system is designed and implemented. The system uses the MVVM framework to provide a visual interface for the plant lncRNA identification method and the lncRNA-protein interaction prediction method. It also provides file management functions that let users upload and export sequence files, which further reflects the practical value of this work.
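The sequence features named in component (1), namely sequence length, GC content, and k-mer frequencies, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`kmer_frequencies`, `gc_content`, `sequence_features`) and the choice of k values are assumptions made purely for illustration, and the chi-square selection and GBDT training steps are not shown. With `k=4`, the same k-mer routine also gives one possible reading of the tetranucleotide encoding mentioned in component (2).

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return the relative frequency of every possible k-mer over A, C, G, T.

    Windows containing other symbols (e.g. N) are skipped.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Normalise by the number of valid windows; all zeros if there are none.
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def gc_content(seq):
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sequence_features(seq, ks=(1, 2, 3)):
    """Build a flat feature vector: [length, GC content, k-mer frequencies...].

    The k values in `ks` are an illustrative assumption, not taken from the paper.
    """
    features = [float(len(seq)), gc_content(seq)]
    for k in ks:
        freqs = kmer_frequencies(seq, k)
        # Sort k-mers so every sequence yields features in the same order.
        features.extend(freqs[kmer] for kmer in sorted(freqs))
    return features


if __name__ == "__main__":
    vector = sequence_features("ACGUACGUGGCC", ks=(1, 2))
    print(len(vector), vector[:2])
```

Vectors built this way could then be passed to a chi-square feature selector and a gradient boosting classifier, for instance those provided by scikit-learn, although the abstract does not state which library was used.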