
Research On Semantics-based Interaction Model And Knowledge Discovery Methods For Internet Data

Posted on: 2016-09-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P G Ren
GTID: 1318330536967142
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technologies, advanced technologies and novel applications such as cloud computing, the Internet of Things, Industry 4.0, Internet+, electronic commerce and the mobile Internet are constantly emerging. The types and amount of data objects in cyberspace are growing at an unprecedented speed, and we have entered the era of internet big data. Internet data has the 4V characteristics of big data: Volume, Variety, Velocity and Value, which can be described as follows.
1) Volume: data objects in cyberspace are growing rapidly and the volume of internet data keeps expanding, so internet data is massive.
2) Variety: the types of internet data are becoming increasingly diverse, with more and more structured, semi-structured and unstructured data emerging, so internet data is varied and heterogeneous.
3) Velocity: data objects in cyberspace change and are updated frequently, so internet data has strong real-time characteristics.
4) Value: data objects in cyberspace are of great value, but owing to the variety and sparsity of internet data, its value density is low.
Because of these 4V characteristics, it is difficult to understand the semantic information of internet data, and hard to effectively discover the data resources users require with traditional exact-matching-based data mining methods. Meanwhile, internet data has become combined with many emerging industries in today's society. Realizing semantics-based intelligent discovery and interaction for internet data therefore has great practical significance and broad application prospects. Building on a systematic review of existing related work, this thesis makes an in-depth study of two aspects: an intelligent semantics-based interactive coordination model for internet data, and effective semantics-based organization and intelligent discovery methods for internet data. The major contributions and innovations of this thesis are summarized as follows.

(1) An intelligent semantics-based data interactive coordination model for internet data. With the continuous development of information technology, especially network technology, and the advent of the internet big data era, how to discover, transfer, organize and process internet data efficiently and intelligently has become a great challenge. To realize efficient and intelligent data discovery and interaction for internet data, we propose an intelligent semantics-based data interactive coordination model. By defining coordination channels, coordination atoms and coordination units, the model supports diverse data interaction modes and can understand the semantics of internet data; it also enables complex data control functions inside a network system, so it can realize intelligent discovery and flexible interaction for internet data. Through a graphical representation of data interaction behaviors, the model supports the explicit design of complex data interactive systems in the form of flow graphs; by defining the behavioral semantics of coordination channels and coordination atoms, the model can strictly verify the consistency between the system model design and the system implementation. Different internet data interactive systems can be designed flexibly with the proposed model according to actual requirements. In this thesis, we concretely design a semantics-based asynchronous data interactive demo system, a semantics-based synchronous data interactive demo system and a software-defined semantics-based internet data interactive demo system, demonstrating that the model supports intelligent, active and flexible internet data interaction. The semantic coordination atom can intelligently recognize the semantic information of internet data in different forms. It extracts the features of internet data and represents data objects as points in a high-dimensional feature space, so the data objects semantically similar to a given user query can be discovered by calculating the distances between these points, realizing semantics-based intelligent discovery and interaction for internet data. The semantic coordination atom is therefore of great significance in semantics-based internet data interactive systems.

(2) iHash, a semantics-based data organization and discovery method using hashing. It is difficult to discover data resources effectively and intelligently with traditional exact-matching-based data mining methods. To realize the function of the semantic coordination atom, namely discovering the data users require from diverse internet data effectively and intelligently, we present iHash, a semantics-based data organization and discovery method using hashing. First, iHash normalizes internet data objects into a high-dimensional feature space using hashing, solving the “feature explosion” problem of the high-dimensional feature space. Then, iHash partitions the data space into subspaces using a clustering algorithm and transforms each subspace into a hypercube, so that a Pyramid-like technique can be applied to map the high-dimensional data objects to one-dimensional values. Next, iHash builds a high-dimensional index over the data objects using a B+-tree. Finally, semantics-based range and kNN queries are realized on this index. We evaluate the performance of iHash and find that it performs efficiently for semantics-based similarity search.

(3) iTree, a semantics-based data organization and discovery method using principal component analysis. The 4V characteristics of internet data make the dimensionality of the feature space very high, which can easily cause the “curse of dimensionality”. To address this problem and improve the efficiency of semantics-based intelligent discovery for internet data, we propose iTree, a semantics-based data organization and discovery method using principal component analysis (PCA). iTree first uses PCA to reduce the dimensionality of the feature space while eliminating the interference of data redundancy and noise. Then, iTree improves the iDistance method to build the high-dimensional data index, which filters out semantically irrelevant data objects more effectively. The improvement works as follows: we first apply the k-means algorithm to the data objects, obtaining a series of data clusters; next, we further divide each data cluster into a certain number of subspaces according to the spatial relationships between the data objects and the reference point; then, we map the data objects in different subspaces to different intervals of a one-dimensional space; finally, we organize the data objects and build the high-dimensional index using a B+-tree. iTree can greatly narrow the search scope of semantics-based similarity queries, enabling effective semantics-based intelligent data discovery. Experimental results show that iTree achieves much better efficiency.

(4) iPyramid, a semantics-based data organization and discovery method using random projection. Data objects in the high-dimensional feature space are sparsely distributed. Inspired by the theories of sparse optimization and compressed sensing, we propose iPyramid, a semantics-based data organization and discovery method using random projection. iPyramid combines dimensionality reduction with multidimensional indexing techniques to realize semantics-based intelligent data discovery. First, iPyramid uses random projection to reduce the dimensionality of the feature space; random projection can preserve the semantic information of data objects after dimensionality reduction while effectively reducing computational and storage costs. Second, iPyramid clusters the data objects using k-means so that semantically similar data objects fall into the same cluster, and transforms the clusters into regular unit hypercubes so that the data objects in them can be indexed with the Pyramid-Technique. Finally, semantics-based similarity queries are realized, and experimental results show that iPyramid achieves effective semantics-based intelligent discovery for internet data.
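To make the semantic coordination atom and the first step of iHash more concrete, here is a minimal Python sketch that maps text into a fixed-size hashed feature space and finds the objects nearest to a query by Euclidean distance. It assumes the standard signed "hashing trick" over lower-cased word tokens, MD5 as the hash function, and L2-normalised sparse vectors; the abstract does not specify iHash's actual hash function or features, and the names `hashed_features`, `distance` and `knn` are hypothetical.

```python
import hashlib
import math
import re

def hashed_features(text, dim=2**20):
    """Sparse, L2-normalised feature vector built with the signed hashing trick.

    Each token is hashed straight to one of `dim` coordinates, so the vector
    length stays fixed however large the vocabulary grows (assumed scheme).
    """
    vec = {}
    for token in re.findall(r"\w+", text.lower()):
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "little")
        index = h % dim
        sign = 1.0 if h >> 63 == 0 else -1.0  # top bit of the 64-bit hash picks the sign
        vec[index] = vec.get(index, 0.0) + sign
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {i: v / norm for i, v in vec.items()} if norm else vec

def distance(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    return math.sqrt(sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in a.keys() | b.keys()))

def knn(query, corpus, k):
    """Indices of the k corpus vectors nearest to the query, by linear scan."""
    return sorted(range(len(corpus)), key=lambda i: distance(query, corpus[i]))[:k]
```

Because every token lands in one of `dim` coordinates, the feature space has a fixed size no matter how many distinct terms appear, which is the sense in which hashing avoids “feature explosion”. The linear-scan `knn` is the baseline that a multidimensional index is meant to accelerate.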
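iHash and iPyramid both turn each cluster into a hypercube and then map its points to one-dimensional keys with a Pyramid-style technique before indexing them in a B+-tree. The sketch below shows the standard Pyramid-Technique key (pyramid number plus height above the cube centre) on min-max normalised clusters, shifts each cluster into its own key interval, and uses a sorted list scanned with Python's `bisect` as a stand-in for the B+-tree. The min-max normalisation, the `c * 2d` cluster offset, and the names `pyramid_value`, `to_unit_cube`, `build_index` and `scan` are illustrative assumptions, not the thesis's exact construction.

```python
from bisect import bisect_left, bisect_right

def pyramid_value(v):
    """Pyramid-Technique key of a point v in the unit hypercube [0, 1]^d."""
    d = len(v)
    # Dimension along which v deviates most from the cube centre (0.5, ..., 0.5).
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie on the low side of the centre, d..2d-1 on the high side.
    pyramid = j_max if v[j_max] < 0.5 else j_max + d
    return pyramid + abs(0.5 - v[j_max])  # height is in [0, 0.5]

def to_unit_cube(p, lo, hi):
    """Min-max normalise p into [0, 1]^d using a cluster's bounding box (assumption)."""
    return [(x - l) / (h - l) if h > l else 0.5 for x, l, h in zip(p, lo, hi)]

def build_index(clusters):
    """clusters: list of lists of d-dimensional points. Returns sorted keys and ids."""
    pairs = []
    for c, points in enumerate(clusters):
        d = len(points[0])
        lo = [min(col) for col in zip(*points)]
        hi = [max(col) for col in zip(*points)]
        for pos, p in enumerate(points):
            # Keys of one cluster fall in [c*2d, (c+1)*2d), so clusters never overlap.
            key = c * 2 * d + pyramid_value(to_unit_cube(p, lo, hi))
            pairs.append((key, (c, pos)))
    pairs.sort()
    return [key for key, _ in pairs], [oid for _, oid in pairs]

def scan(keys, ids, lo, hi):
    """One-dimensional range scan, the operation a B+-tree leaf chain provides."""
    return ids[bisect_left(keys, lo):bisect_right(keys, hi)]
```

A complete Pyramid-Technique range query would also translate the query hyper-rectangle into the key intervals of the pyramids it intersects before calling `scan`; that step is omitted here for brevity.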
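The first step of iTree reduces dimensionality with PCA. The sketch below is a dependency-free illustration that computes the top principal components by power iteration with deflation on the sample covariance matrix; this is an assumed implementation choice (a production system would more likely call an SVD routine such as `numpy.linalg.svd`), and `pca` and `transform` are hypothetical names.

```python
import math
import random

def pca(data, k, iters=200, seed=0):
    """Top-k principal components of `data` (a list of equal-length rows).

    Power iteration finds the dominant eigenvector of the covariance matrix;
    deflation then removes it so the next pass converges to the next one.
    """
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    X = [[x - m for x, m in zip(row, mean)] for row in data]
    C = [[sum(row[i] * row[j] for row in X) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))  # Rayleigh quotient
        components.append(v)
        # Deflation: subtract lam * v v^T from the covariance matrix.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components

def transform(x, mean, components):
    """Coordinates of x in the reduced space spanned by the components."""
    centred = [a - m for a, m in zip(x, mean)]
    return [sum(c * a for c, a in zip(comp, centred)) for comp in components]
```

In an index built this way, `transform` would be applied to every data object and every query before the reduced vectors are indexed or compared.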
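iTree refines iDistance by splitting each k-means cluster into subspaces around its reference point and giving every subspace its own one-dimensional interval. The abstract does not state the splitting rule, so this sketch assumes a hypothetical one: the sign pattern of `p - ref` on the first `m` dimensions, which yields a simple, valid lower bound for pruning whole subspaces. Reference points are taken as given (for example, k-means centroids), keys follow the iDistance form `slot * c + dist(p, ref)`, and a sorted list searched with `bisect` stands in for the B+-tree; `SubspaceIDistance` and its methods are illustrative names.

```python
import math
from bisect import bisect_left, bisect_right

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class SubspaceIDistance:
    """iDistance-style index; each cluster is split into 2**m sign-pattern subspaces.

    The sign-pattern split is a hypothetical stand-in for iTree's subspace rule.
    """

    def __init__(self, points, refs, m=2):
        self.points, self.refs, self.m = points, refs, m
        # Stretch constant: larger than any point-to-reference distance.
        self.c = 1.0 + max(dist(p, ref) for p in points for ref in refs)
        self.max_d = {}  # slot -> largest distance from a member to its reference
        pairs = []
        for pid, p in enumerate(points):
            ci = min(range(len(refs)), key=lambda i: dist(p, refs[i]))
            d = dist(p, refs[ci])
            slot = ci * 2 ** m + self._subspace(p, refs[ci])
            self.max_d[slot] = max(self.max_d.get(slot, 0.0), d)
            pairs.append((slot * self.c + d, pid))
        pairs.sort()
        self.keys = [key for key, _ in pairs]
        self.ids = [pid for _, pid in pairs]

    def _subspace(self, p, ref):
        # Bit j is set when p lies on the >= side of ref along dimension j.
        return sum(1 << j for j in range(self.m) if p[j] >= ref[j])

    def range_query(self, q, r):
        hits = []
        for ci, ref in enumerate(self.refs):
            dq = dist(q, ref)
            q_bits = self._subspace(q, ref)
            for s in range(2 ** self.m):
                slot = ci * 2 ** self.m + s
                if slot not in self.max_d or dq - r > self.max_d[slot]:
                    continue  # empty slot, or the query ball misses its distance ring
                # Points of subspace s lie across ref on every dimension where
                # q's sign bit differs, so this is a lower bound on their distance.
                lb = math.sqrt(sum((q[j] - ref[j]) ** 2 for j in range(self.m)
                                   if (q_bits ^ s) >> j & 1))
                if lb > r:
                    continue
                lo = slot * self.c + max(0.0, dq - r)
                hi = slot * self.c + min(self.max_d[slot], dq + r)
                for pid in self.ids[bisect_left(self.keys, lo):bisect_right(self.keys, hi)]:
                    if dist(q, self.points[pid]) <= r:  # refine with the true distance
                        hits.append(pid)
        return sorted(hits)
```

A subspace is skipped either when the query ball misses its distance ring or when the query's offset from the reference point, measured only along the split dimensions where it lies on the opposite side, already exceeds the radius. Surviving candidates are always checked against the true distance, so the result matches a brute-force scan.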
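The first step of iPyramid is random projection. The sketch below uses a dense Gaussian matrix scaled by 1/sqrt(k), one common Johnson-Lindenstrauss-style construction; it is assumed here because the abstract does not say which random matrix iPyramid uses, and `random_projection_matrix`, `project` and `euclidean` are hypothetical names.

```python
import math
import random

def random_projection_matrix(d, k, seed=0):
    """k x d matrix with i.i.d. N(0, 1) entries scaled by 1/sqrt(k) (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map a d-dimensional vector x to k dimensions: R @ x."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With this scaling the expected squared length of a projected vector equals that of the original, so pairwise Euclidean distances, and with them the similarity structure used for semantics-based search, are approximately preserved while each vector shrinks from `d` to `k` coordinates.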
Keywords/Search Tags: Internet data, interactive coordination model, multidimensional index, semantics-based similarity search, hash, principal component analysis, random projection