Text Data Statistical Analysis Of Agriculture Internet Of Things

Posted on:2016-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:H Liang

GTID:2308330464452475

Subject:Statistics

Abstract/Summary:

With the rapid development of the Internet, the Internet of Things, and other information technologies, networks have accumulated large volumes of semi-structured and unstructured text data. Extracting the needed information from these massive electronic documents has become a major task for statistical analysts. In recent years, as people's material needs have grown, the quality and output of agricultural products have received increasing attention. The agricultural Internet of Things, a production mode centered on information and software, has been widely adopted, and real-time monitoring, remote control, and querying are important to the development of modern agriculture. Mining the text generated by the agricultural Internet of Things is therefore valuable. Text mining has been studied extensively in China and abroad; its methods are maturing and its scope keeps broadening. It is mainly applied to text similarity detection, text categorization, information retrieval, and related fields. In addition, efficient visual representations such as the word cloud have become a new way of displaying text. This paper studies text data from the agricultural Internet of Things from two angles: text similarity and word clouds.

For text similarity, this paper uses two methods. The first combines a minor variation of keyword clustering with the LD algorithm: low-frequency words in the text are reduced by clustering, the similarity between feature words is calculated with the LD algorithm and a text similarity matrix is built, and finally the similarity between texts is calculated from the feature-word similarity matrix and a vector space model built from term weights. The second combines threshold optimization with the eEP pattern. First, after the document feature-item frequency distribution table is obtained, the minimum threshold is calculated from the joint decision distribution density matrix of rough sets. Next, high-frequency words are obtained from the semantic intra-class document frequency, which combines semantic analysis with the inverse document frequency method, and the simplest model is obtained with the eEP pattern classification method. Finally, the text similarity score is calculated from the similarity formula and the semantic relevance provided by HowNet, and the threshold is optimized using three-way decision theory.

For word clouds, this paper proposes a text mining method based on statistical analysis of word clouds and a topic model. First, the text is preprocessed by removing numbers and stop words; then Chinese word segmentation is performed, a corpus is built, and a document-term matrix is constructed; finally, the mining results are presented with word clouds and a topic model.
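The preprocessing steps described for the word cloud study (removing numbers and stop words, tokenizing, and building a document-term matrix) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the stop-word list, the sample documents, and all function names are hypothetical, and a simple regular-expression tokenizer on English text stands in for Chinese word segmentation, which would normally require a dedicated segmenter.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real study would load a full list.
STOP_WORDS = {"the", "of", "and", "in", "is", "a", "to"}


def preprocess(text):
    """Remove digits, lowercase, split into letter runs, drop stop words."""
    text = re.sub(r"\d+", " ", text)
    # Stand-in for Chinese word segmentation (assumption): letter runs only.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in tokenized]
    for i, toks in enumerate(tokenized):
        for t in toks:
            matrix[i][index[t]] += 1
    return vocab, matrix


def term_frequencies(vocab, matrix):
    """Column sums of the matrix: the weights a word cloud would scale by."""
    totals = Counter()
    for row in matrix:
        for term, count in zip(vocab, row):
            totals[term] += count
    return totals


# Hypothetical sample documents, for illustration only.
docs = [
    "Soil moisture sensor 12 reports the moisture of the field",
    "Remote control of irrigation and soil monitoring in 2015",
]
vocab, dtm = document_term_matrix(docs)
freqs = term_frequencies(vocab, dtm)
print(freqs.most_common(3))
```

The resulting term totals are the kind of input a word cloud renderer scales font sizes by, and the same matrix could be passed to a topic model.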

Keywords/Search Tags:

clustering, LD algorithm, text similarity matrix, vector space model, decision model of rough set, agriculture, internet of things
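The "LD algorithm" and "text similarity matrix" named in the abstract and keywords can be illustrated with a small sketch. It assumes that LD refers to Levenshtein (edit) distance, which the abstract does not state explicitly. The normalization of distance into a similarity score and the soft cosine measure used to combine the term similarity matrix with weighted document vectors are illustrative choices, not formulas confirmed by the thesis; all names and sample values are hypothetical.

```python
import math


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def term_similarity(a, b):
    """Map edit distance to [0, 1]; 1.0 means identical (assumed scheme)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similarity_matrix(terms):
    """Pairwise similarity between feature words."""
    return [[term_similarity(s, t) for t in terms] for s in terms]


def soft_cosine(u, v, sim):
    """Soft cosine: cosine similarity weighted by term-to-term similarity."""
    def bilinear(x, y):
        return sum(x[i] * sim[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    denom = math.sqrt(bilinear(u, u) * bilinear(v, v))
    return bilinear(u, v) / denom if denom else 0.0


# Hypothetical feature words and document weight vectors.
terms = ["sensor", "sensors", "soil"]
sim = similarity_matrix(terms)
doc_a = [1.0, 0.0, 1.0]
doc_b = [0.0, 1.0, 1.0]
score = soft_cosine(doc_a, doc_b, sim)
print(round(score, 3))
```

With plain cosine similarity these two vectors would score 0.5, since "sensor" and "sensors" count as unrelated terms; the term similarity matrix lets near-identical words contribute, which is the motivation for combining the two in this sketch.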

Related items

1	The Application Of Rough-Set-Model Based Text Clustering Algorithm In The Text Filtering
2	Study On Similarity-based Text Clustering Algorithm And Its Application
3	Research On English Text Clustering Method Based On Vector Space
4	Text Similarity Computing Theory And Applied Research
5	Application Of Rough Set Theory In Chinese Text Categorization
6	Research And Implementation Of Chinese Text Clustering Algorithms
7	Research And Implementation Of Text Similarity Algorithm Based On Semantic Fusion
8	Research On Key Techniques In Text Mining
9	Study On Chinese Text Classification Algorithm Based On Rough Set And Its Application
10	Research On Text Clustering Based On Hownet