Research On The Clustering Technology Of JSON Semi-structured Document

Posted on:2018-11-15

Degree:Master

Type:Thesis

Country:China

Candidate:D W Liu

Full Text:PDF

GTID:2348330542969351

Subject:Management Science and Engineering

Abstract/Summary:

Semi-structured documents account for the vast majority of data on the Internet, and how to process them has become a focus of both industry and academia. JSON is a typical semi-structured document format widely used on the Internet, yet the clustering of JSON documents has rarely been studied. In this thesis, we study clustering technology for JSON semi-structured documents, propose an advanced hybrid clustering algorithm based on K-Means, apply the clustering model to government open data, and finally implement a clustering system.

We first introduce the characteristics of semi-structured documents and compare JSON and XML documents both qualitatively and quantitatively. From the model perspective, a document vector representation for JSON semi-structured documents is given. Taking into account feature reduction techniques together with both the hybrid factor and the path level factor, an advanced hybrid clustering algorithm based on K-Means is proposed. From the application perspective, the background of government open data and the relevant information about the data set are provided. We discuss clustering quality evaluation indices and design experiments to evaluate clustering validity and to determine the number of clusters k. From the system perspective, a clustering system for JSON semi-structured documents is implemented, including the design of its flow chart and modules. The concepts of frequent weight and specific weight are proposed for visualizing the system's results.

The conclusions of this thesis are as follows:
(1) Two factors that influence the ability to differentiate documents are proposed, path level and the hybrid factor, and both are verified experimentally.
(2) Experiments show that the effect of the two factors on clustering must be examined jointly: considering either the hybrid factor or the path level factor alone is not enough.
(3) In JSON semi-structured document clustering, the SC index is verified to be better than the CHI index.
(4) A prototype system for JSON semi-structured document clustering is developed and implemented.
(5) Frequent weight and specific weight are put forward to present the two parts of a JSON semi-structured document, its content and its structure, from the topic and model perspectives. Tag cloud technology is used for display, and the presentation effect is pronounced.
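
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of clustering JSON documents on both structure and content with K-Means. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the parameter name `alpha` (standing in for a hybrid factor that balances structure against content), the `1 / level` weighting (standing in for a path level factor), whitespace tokenization of string values, and the deterministic farthest-point initialization.

```python
import json
import math

def path_features(node, prefix="", level=1):
    """Yield (path, level) for every key path in a parsed JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            yield path, level
            yield from path_features(value, path, level + 1)
    elif isinstance(node, list):
        for item in node:
            # Array elements share the path and level of the enclosing key.
            yield from path_features(item, prefix, level)

def content_terms(node):
    """Yield lowercase word tokens from every string leaf."""
    if isinstance(node, dict):
        for value in node.values():
            yield from content_terms(value)
    elif isinstance(node, list):
        for item in node:
            yield from content_terms(item)
    elif isinstance(node, str):
        yield from node.lower().split()

def vectorize(doc, alpha=0.5):
    """Sparse unit vector: structure weighted by alpha, content by 1 - alpha."""
    vec = {}
    for path, level in path_features(doc):
        key = "path:" + path
        # Assumption: shallower paths count more (weight alpha / level).
        vec[key] = vec.get(key, 0.0) + alpha / level
    for term in content_terms(doc):
        key = "term:" + term
        vec[key] = vec.get(key, 0.0) + (1.0 - alpha)
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {k: v / norm for k, v in vec.items()}

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())

def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, v in vec.items():
            total[k] = total.get(k, 0.0) + v
    return {k: v / len(vectors) for k, v in total.items()}

def kmeans(vectors, k, iterations=20):
    """Plain K-Means with farthest-point initialization (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels

# Hypothetical toy documents, not data from the thesis.
raw = [
    '{"title": "budget report", "agency": {"name": "finance"}}',
    '{"title": "budget plan", "agency": {"name": "finance"}}',
    '{"station": {"id": "a1", "readings": [{"pm25": "low"}]}}',
    '{"station": {"id": "b2", "readings": [{"pm25": "high"}]}}',
]
docs = [json.loads(text) for text in raw]
labels = kmeans([vectorize(d, alpha=0.5) for d in docs], k=2)
print(labels)
```

On these four toy documents the two budget records land in one cluster and the two station records in the other. Raising `alpha` toward 1 makes the grouping depend almost entirely on key paths, while lowering it toward 0 makes it depend on the words in the values.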

Keywords/Search Tags:

JSON, XML, K-Means, Hybrid Factor, Path Level

Related items

1	SAR Image Target Recognition Based On Random Measurements And Mixture Factor Analyzers
2	Research On Image Segmentation Based On CV Level Set Method
3	Research On Change Detection Of Remote Sensing Images Based On Mixture Of Factor Analyzers And Markov Random Field
4	Research On Motion Planning and Path Optimization For Manipulator Based On Gaussian Mixture Models
5	Research On Temporal Data Modeling And Query Processing Based On JSON
6	High Resolution SAR Image Classification Based On The Mixture Model And Level Set
7	The Research On The Multi-path Fading In Wireless Communication & Navigation Channel
8	Research On Local Region Level Set Method Based On K-means++ Algorithm
9	Study Research On Web Security By JSON Web Signature
10	Research And Implementation Of Communication System Based On JSON