
Text Data Statistical Analysis Of Agriculture Internet Of Things

Posted on: 2016-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Liang
GTID: 2308330464452475
Subject: Statistics
Abstract/Summary:
The Internet, the Internet of Things and other information technologies have developed rapidly. As a result, large amounts of semi-structured and unstructured text data have accumulated on the network. Statistical analysts now face a major task: extracting the information they need from these massive collections of electronic documents. In recent years, people's material needs have grown, and the quality and output of agricultural products have drawn increasing attention. Production modes centered on agricultural Internet of Things information and software are now widely used. Real-time monitoring, remote control and querying are important to the development of modern agriculture. Mining the text produced by the agricultural Internet of Things is therefore of considerable value.

Text mining has been studied extensively in China and abroad. Its methods are maturing and its content is becoming increasingly rich. It is mainly applied to text similarity detection, text categorization, information retrieval and related fields. Efficient visual information graphics, such as word clouds, have also become a new way to display text. This thesis uses text data from the agricultural Internet of Things to study two topics: text similarity and word clouds.

The text similarity research uses two methods. The first method combines keyword clustering based on micro variation with the LD algorithm. Clustering removes low-frequency words from the text. The LD algorithm then computes the similarity between characteristic words, and these similarities form a similarity matrix. Finally, the similarity between texts is computed from this characteristic-word similarity matrix and weighted vector space representations of the texts. The second method combines threshold optimization with the eEP pattern.
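The following minimal Python sketch illustrates the first similarity method. It rests on three assumptions, none of which the abstract confirms:

- "LD algorithm" is taken to mean Levenshtein (edit) distance. It is normalized as 1 - LD / (length of the longer word), which gives a word similarity in [0, 1].
- The clustering step is replaced by a simple frequency cutoff, controlled by the hypothetical `min_count` parameter.
- The word similarity matrix and the term-frequency vectors are combined with the soft cosine measure.

These are illustrative choices, not necessarily the thesis's exact formulas.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def word_similarity(a, b):
    """Assumed normalization: 1 - LD / length of the longer word, in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def text_similarity(doc_a, doc_b, min_count=1):
    """Soft-cosine similarity between two tokenized documents (lists of words).

    min_count is a hypothetical stand-in for the clustering-based removal
    of low-frequency words; words rarer than this are dropped.
    """
    counts = Counter(doc_a) + Counter(doc_b)
    vocab = sorted(w for w, c in counts.items() if c >= min_count)
    # Characteristic-word similarity matrix built from the edit distance.
    sim = [[word_similarity(u, v) for v in vocab] for u in vocab]
    # Term-frequency weight vectors over the retained vocabulary.
    x = [doc_a.count(w) for w in vocab]
    y = [doc_b.count(w) for w in vocab]

    def bilinear(p, q):
        # p^T * sim * q
        n = len(vocab)
        return sum(p[i] * sim[i][j] * q[j] for i in range(n) for j in range(n))

    denom = math.sqrt(bilinear(x, x)) * math.sqrt(bilinear(y, y))
    return bilinear(x, y) / denom if denom else 0.0
```

For example, `text_similarity(["cat"], ["car"])` is about 0.667, even though the two documents share no word. The edit-distance matrix links "cat" and "car", which is the point of adding a word similarity matrix to a plain vector space model. For Chinese text, the distance is computed over the characters of each segmented word.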
The second method starts from the frequency distribution table of document feature items. It first calculates the minimum threshold from the joint decision distribution density matrix of the rough set. It then obtains high-frequency words from the semantic intra-class document frequency, by combining semantic analysis with the inverse document frequency method. The simplest model is derived with the eEP pattern classification method. Finally, the text similarity score is calculated from the similarity formula and the semantic relevancy provided by HowNet. The threshold is then optimized using three-way decision theory.

The text word cloud research proposes a text mining method based on statistical analysis with word clouds and a topic model. The method has three stages:

1. Preprocess the text by removing numbers and stop words.
2. Perform Chinese word segmentation, build a corpus and construct a document-term matrix.
3. Present the mining results with word clouds and the topic model.
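The preprocessing and document-term matrix steps of the word cloud method can be sketched in plain Python. The sketch makes these assumptions:

- The text has already been segmented into space-separated words. Real Chinese segmentation needs a dedicated segmenter, which is not shown.
- The stop-word list is small and hypothetical.
- The sketch stops at the word frequencies a word cloud would scale by. Rendering the cloud and fitting the topic model are left out.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would load a full Chinese list.
STOP_WORDS = {"的", "了", "和", "the", "and", "of"}


def preprocess(text):
    """Remove numbers and stop words from pre-segmented, space-separated text."""
    text = re.sub(r"\d+", " ", text)   # strip numbers
    tokens = text.split()              # assumes segmentation was already done
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in document i."""
    corpus = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in corpus for t in tokens})
    rows = []
    for tokens in corpus:
        counts = Counter(tokens)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def word_cloud_weights(vocab, rows, top_n=10):
    """Total corpus frequency of each term: the sizes a word cloud would use."""
    totals = {t: sum(r[j] for r in rows) for j, t in enumerate(vocab)}
    return Counter(totals).most_common(top_n)
```

Consider two hypothetical sensor log lines, "温室 温度 25 传感器 的 温度" and "土壤 湿度 传感器 和 温度". The number and the stop words are dropped, and "温度" (temperature) becomes the largest word in the cloud. The same document-term matrix is the usual input to a topic model.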
Keywords/Search Tags: clustering, LD algorithm, text similarity matrix, vector space model, rough set decision model, agriculture, Internet of Things