| With the rapid development of intelligent agriculture, information-based agricultural technology extension, built on information technologies such as the Internet, big data, and artificial intelligence, has changed the way we acquire agricultural knowledge. Agro-technical Q&A communities are now one of the most widespread channels for this kind of extension: users ask questions about agricultural knowledge, and agricultural experts and technology extension agents answer them promptly. These communities have accumulated a large amount of agricultural knowledge data covering the whole process and all fields of agricultural production. However, this practical and theoretical agricultural text data has not yet been used fully and effectively, for the following main reasons. First, agricultural texts are mostly stored in unstructured form and cannot be accurately converted into structured form. Second, agricultural texts combine general language carrying diverse semantic information with a wealth of domain-specific knowledge, which makes them difficult to analyze accurately. Third, agricultural text information cannot be separated from its specific production environment, so information from different agricultural fields is hard to share. Fourth, existing research on semantic analysis tasks such as text classification, text matching, and named entity recognition targets general-domain text, and its applicability to agricultural text remains limited.

To make fuller and deeper use of agricultural text data and to promote the adoption and high-quality development of information-based agricultural technology extension, it is urgent to explore effective text mining techniques that extract agricultural knowledge intelligently. To obtain agricultural knowledge from massive agricultural short-text data, text classification is first used to divide the texts accurately into categories, text matching is then used to screen the target texts automatically, and named entity recognition is finally used to extract the required agricultural knowledge from the target texts. Each of these tasks affects the final quality of knowledge extraction. Agricultural knowledge data is characterized by a huge volume, diverse structures, short texts, and non-standard expression, all of which make text mining more difficult. Building on agro-technical Q&A communities and related semantic analysis research in China and abroad, this thesis studies how to use semantically enhanced representation and semantic feature extraction for text mining. The main work consists of building datasets for deep learning models in the agricultural domain and applying convolutional neural networks, recurrent neural networks, and other deep learning models to text classification, semantic matching, and named entity recognition, which addresses the difficulty of mining massive, heterogeneous agricultural data. The main contents and results of the thesis are as follows.

(1) A short-text classification method based on a BiGRU_MulCNN model is proposed to overcome the limitations of existing classification approaches. Compared with traditional deep learning models for text classification, in the preprocessing stage the TF-IDF algorithm is used to select the feature words of each text, the feature words are expanded with similar words, and the text is represented by combining TF-IDF weights with Word2vec vectors. A bidirectional gated recurrent unit (BiGRU) then captures contextual feature information, and multiple convolutional neural networks finally extract local multidimensional features of the text. On 12 categories of agricultural short text, including cultivation, animal diseases, and pests and diseases, the model reaches a classification accuracy of 95.9%, and it shows clear advantages over the other methods compared.

(2) A short-text semantic matching model that integrates multiple semantic features is proposed to improve matching accuracy and to address the reliance on a single feature in text matching. The deep semantics, word co-occurrence, and maximum matching degree of agricultural short texts are extracted, and a Co_BiLSTM_CNN model, composed of a bidirectional long short-term memory network (BiLSTM), convolutional neural networks, dense networks, and a Siamese network with shared parameters, is proposed to extract these multi-semantic features. Compared with other text matching models, it converts the original text into a word co-occurrence representation and, by computing a weight for each group of co-occurring words, improves the accuracy of the feature representation. It also designs a maximum matching representation that extracts the maximum-matching features of the texts from both the original text and its word co-occurrence representation, further enriching the text representation. Experiments show that, on agricultural short text, the accuracy of semantic similarity judgment reaches 94.15%.

(3) A named entity recognition model for agricultural diseases and pests based on hybrid word embeddings is proposed. Character vectors, word vectors, and part-of-speech feature vectors are taken as model inputs, which further enriches the representation of entity features. A pre-trained BERT model extracts text features, captures deep bidirectional semantic features of the text, and integrates global semantic information into each character vector, further improving the representational ability of the vectors. Part-of-speech features of the text are added, and a BiLSTM extracts part-of-speech semantic features that contain contextual information. Finally, a conditional random field (CRF) is combined with these features to recognize 7 types of named entities, such as crops, diseases, pests, symptoms, chemical fertilizers, and pesticides. Experiments show that, when identifying agricultural pest and disease information, the recognition rate reaches 92.07%. |
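The classification preprocessing selects feature words by TF-IDF and represents a text by combining TF-IDF weights with Word2vec vectors. The sketch below illustrates one plausible reading of that step in plain Python; it is not the thesis code. It assumes a smoothed IDF of log(N / (1 + df)) + 1 and a TF-IDF-weighted average of word vectors, and the function names are hypothetical. The toy `embeddings` dictionary stands in for a trained Word2vec model.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_scores(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF score of every word in every tokenized document.

    TF is the word's relative frequency in its document. IDF uses the
    smoothed variant log(N / (1 + df)) + 1, an assumption: the thesis
    does not state which TF-IDF formula it uses.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            w: (c / len(doc)) * (math.log(n_docs / (1 + df[w])) + 1)
            for w, c in counts.items()
        })
    return scores


def top_k_features(doc_scores: Dict[str, float], k: int) -> List[str]:
    """The k highest-scoring words (ties broken alphabetically) as feature words."""
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def weighted_text_vector(doc_scores: Dict[str, float],
                         embeddings: Dict[str, List[float]],
                         dim: int) -> List[float]:
    """TF-IDF-weighted average of word vectors; words without a vector are skipped."""
    acc = [0.0] * dim
    total_weight = 0.0
    for word, weight in doc_scores.items():
        vec = embeddings.get(word)
        if vec is None:
            continue
        for i in range(dim):
            acc[i] += weight * vec[i]
        total_weight += weight
    return [x / total_weight for x in acc] if total_weight else acc
```

For example, with the tokenized corpus `[["wheat", "rust", "spray", "wheat"], ["corn", "borer", "spray"], ["wheat", "yield"]]`, the two top feature words of the first text are `wheat` and `rust`. Because `rust` occurs in fewer documents, its IDF raises it above `spray` even though both appear once.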
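In BiGRU_MulCNN, multiple convolutional networks extract local multidimensional features from the BiGRU output. A common way to do this is to run filters of several widths over the sequence and apply max-over-time pooling. The sketch below assumes that design and shows only the forward computation in plain Python. The filter weights are hand-set toy values rather than trained parameters, and the function names are hypothetical. A real model would use a framework such as PyTorch.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def conv1d_max(seq: Matrix, kernel: Matrix, bias: float) -> float:
    """Slide one filter of shape (width, dim) over seq, apply ReLU, then max-pool over time."""
    width = len(kernel)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the kernel width")
    best = float("-inf")
    for start in range(len(seq) - width + 1):
        total = bias
        # Dot product between the filter and the window of `width` consecutive vectors.
        for offset in range(width):
            total += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        best = max(best, max(0.0, total))  # ReLU, then keep the running maximum
    return best


def multi_width_features(seq: Matrix,
                         filters: Sequence[Tuple[Matrix, float]]) -> List[float]:
    """One pooled value per (kernel, bias) filter; kernels may have different widths."""
    return [conv1d_max(seq, kernel, bias) for kernel, bias in filters]
```

Each filter width acts like an n-gram detector: a width-1 filter scores single positions, and a width-2 filter scores adjacent pairs. Concatenating the pooled values gives a fixed-length vector no matter how long the short text is.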
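The matching model combines deep semantics with word co-occurrence and maximum matching degree features, but the abstract does not give formulas for the latter two. The sketch below uses two simple, hypothetical stand-ins. The co-occurrence feature is the weighted share of words found in both texts. The maximum matching degree is each word's best cosine match in the other text, averaged in both directions. Both definitions and all function names are assumptions for illustration, not the Co_BiLSTM_CNN implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def cooccurrence_features(a: List[str], b: List[str],
                          weights: Dict[str, float]) -> Tuple[List[str], float]:
    """Words shared by both texts and their weighted share of all distinct words.

    weights maps word -> importance (for example IDF); missing words count as 1.0.
    """
    def w(word: str) -> float:
        return weights.get(word, 1.0)

    shared = sorted(set(a) & set(b))
    total = sum(w(t) for t in set(a) | set(b))
    ratio = sum(w(t) for t in shared) / total if total else 0.0
    return shared, ratio


def max_matching_degree(a: List[str], b: List[str],
                        emb: Dict[str, List[float]]) -> float:
    """Average, in both directions, of each word's best cosine match in the other text."""
    def one_way(x: List[str], y: List[str]) -> float:
        best = [max((cosine(emb[t], emb[s]) for s in y if s in emb), default=0.0)
                for t in x if t in emb]
        return sum(best) / len(best) if best else 0.0

    return (one_way(a, b) + one_way(b, a)) / 2
```

The two features complement each other. Exact co-occurrence rewards literal overlap, while the embedding-based maximum matching still gives credit when two questions use different but related words.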
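The NER model feeds character vectors, word vectors, and part-of-speech vectors into the network together. Character-level tagging needs word-level information aligned to each character. One straightforward way is to let every character inherit the vector and POS tag of the word that contains it. The sketch below assumes that alignment and a one-hot POS encoding. A static character lookup stands in for the BERT character representations used in the thesis, zero vectors for unknown entries are also an assumption, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def hybrid_char_features(words: List[str], pos_tags: List[str],
                         char_emb: Dict[str, List[float]], char_dim: int,
                         word_emb: Dict[str, List[float]], word_dim: int,
                         pos_vocab: Sequence[str]) -> List[List[float]]:
    """One input vector per character:
    [character vector | vector of the containing word | one-hot POS of that word].
    Unknown characters or words fall back to zero vectors (an assumption).
    """
    features = []
    for word, pos in zip(words, pos_tags):
        w_vec = word_emb.get(word, [0.0] * word_dim)
        pos_vec = [1.0 if p == pos else 0.0 for p in pos_vocab]
        for ch in word:
            c_vec = char_emb.get(ch, [0.0] * char_dim)
            features.append(list(c_vec) + list(w_vec) + pos_vec)
    return features


# Example: a segmented phrase "小麦 锈病" (wheat / rust disease), both nouns.
feats = hybrid_char_features(["小麦", "锈病"], ["n", "n"],
                             {"小": [1.0], "麦": [2.0]}, 1,
                             {"小麦": [0.5, 0.5]}, 2,
                             ["n", "v"])
```

This yields one row per character (four here). In this design the word boundaries and POS information reach a character-level tagger without changing its output granularity.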
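The final CRF layer turns per-token tag scores (for example, from the BiLSTM) into a consistent tag sequence. At inference time, a linear-chain CRF does this with Viterbi decoding over emission and transition scores. The sketch below shows only decoding, not training. The BIO tag set with a hypothetical `CROP` entity type and the score values are illustrative assumptions, and transitions missing from the table are treated as forbidden.

```python
from typing import Dict, List, Tuple


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float],
                   tags: List[str]) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[i][tag]: score of `tag` at token i.
    transitions[(prev, cur)]: score of moving from `prev` to `cur`;
    pairs absent from the dict are forbidden (score -inf).
    Assumes at least one allowed path exists.
    """
    neg_inf = float("-inf")
    score = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for cur in tags:
            # Best previous tag for reaching `cur` at this position.
            best_prev, best_val = tags[0], neg_inf
            for prev in tags:
                val = score[prev] + transitions.get((prev, cur), neg_inf)
                if val > best_val:
                    best_prev, best_val = prev, val
            new_score[cur] = best_val + emit[cur]
            pointer[cur] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return list(reversed(path))


tags = ["O", "B-CROP", "I-CROP"]
# Every transition scores 0.0 except O -> I-CROP, which is left out (forbidden).
transitions = {(p, c): 0.0 for p in tags for c in tags if not (p == "O" and c == "I-CROP")}
emissions = [
    {"O": 0.5, "B-CROP": 1.0, "I-CROP": 0.8},  # "winter"
    {"O": 0.2, "B-CROP": 0.6, "I-CROP": 1.2},  # "wheat"
    {"O": 1.5, "B-CROP": 0.1, "I-CROP": 0.3},  # "yellowing"
]
best = viterbi_decode(emissions, transitions, tags)
```

With these toy scores the decoder labels "winter wheat" as one crop entity, so `best` is `["B-CROP", "I-CROP", "O"]`. Forbidding `O -> I-CROP` is how a CRF rules out label sequences that independent per-token classification could produce.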