Research On Related Issues Of Unstructured Data

Posted on: 2018-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J C Zheng
GTID: 1318330542952109
Subject: Software engineering
Abstract/Summary:
With the rapid progress of technologies such as the Internet, artificial intelligence, and machine learning, we have entered an era of big data, marked by the spread of innovative applications such as "Smarter Planet" and "Smarter Cities" and by exponential growth in the volume of data. Beyond traditional structured data, today's data is large, heterogeneous, and high-dimensional. If traditional data integration methods are applied directly in this setting, three problems that lower integration quality are unavoidable: high redundancy in the integrated data, inconsistent data descriptions, and inaccurate data representation. These problems in turn undermine subsequent big data analytics. An effective data fusion method that eliminates redundancy and unifies the descriptions and annotations of data entities is therefore key to high-quality data integration, and it also supports later data analysis and data mining. In the "Internet plus" era, massive heterogeneous data is the main carrier of data description, yet its descriptions are not unified and its features are presented inaccurately and incompletely. Targeted techniques are thus needed to ensure the accuracy, consistency, and completeness of integrated data. This dissertation studies key problems in the integration of unstructured data. Its main contributions are summarized as follows.

First, this dissertation proposes an image annotation method based on two-layer SimRank. Most image data from social networks loses its labels during data integration. Named entity recognition is therefore used to extract candidate label keywords from users' comments on the images, and the co-occurrence relations between images and labels are used to build a bipartite graph. The graph-based SimRank algorithm is then applied to this bipartite graph to annotate the images. Because SimRank is an iterative method, an optimized strategy, an image annotation algorithm based on two-layer SimRank, is proposed to meet the requirements of large-scale data computing.

Second, this dissertation proposes an integrated method for computing entity similarity, which combines several similarity measures, such as "attribute characteristic", "context", and "connection", to measure the similarity among representations. Different representations of the same data entity can thus be identified effectively, and redundant information can be simplified and unified, which together achieve entity resolution across all representations.

Third, this dissertation characterizes unstructured data and proposes UDM (United Data Model), a model of that data together with its corresponding extended attributes. UDM captures not only the characteristics of the data itself but also other characteristics such as data subjects, data interchange, and data association, so that data services can be modeled together with the data. This lays a foundation for unified services over unstructured data.

Fourth, this dissertation proposes a personalized recommendation index based on K-nearest neighbor (PRI-KNN), which proactively offers users the newest data that matches their preferences. Both the users' preferences and the corresponding data are represented as high-dimensional data. To limit the impact of the "curse of dimensionality" on personalized recommendation, reverse K-nearest-neighbor (anti-K-nearest-neighbor) search and related strategies are adopted: PRI-KNN is used to quickly find the target users for newly updated data, and the data is then pushed to those users.
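
To make the first contribution concrete, here is a minimal sketch of plain iterative SimRank on an image-label bipartite graph. It is not the two-layer optimized algorithm of the dissertation, whose details the abstract does not give. The function name `bipartite_simrank`, the decay factor of 0.8, the iteration count, and the sample edges are all assumptions made for illustration.

```python
from itertools import product


def bipartite_simrank(edges, decay=0.8, iterations=5):
    """Plain bipartite SimRank (illustrative sketch, not the two-layer variant).

    edges: iterable of (image, label) co-occurrence pairs (hypothetical input).
    Returns two dicts keyed by node pairs: image similarity and label similarity.
    """
    img_nbrs, lab_nbrs = {}, {}
    for img, lab in edges:
        img_nbrs.setdefault(img, set()).add(lab)
        lab_nbrs.setdefault(lab, set()).add(img)

    def identity(nodes):
        # Start from s(a, a) = 1 and s(a, b) = 0 for a != b.
        return {(a, b): 1.0 if a == b else 0.0
                for a, b in product(nodes, repeat=2)}

    def update(nbrs, other_sim):
        # Similarity of two nodes is the decayed average similarity
        # of their neighbours on the other side of the graph.
        new = {}
        for a, b in product(nbrs, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            total = sum(other_sim[(x, y)] for x in nbrs[a] for y in nbrs[b])
            new[(a, b)] = decay * total / (len(nbrs[a]) * len(nbrs[b]))
        return new

    img_sim, lab_sim = identity(img_nbrs), identity(lab_nbrs)
    for _ in range(iterations):
        # Both sides are updated from the previous iteration's scores.
        img_sim, lab_sim = update(img_nbrs, lab_sim), update(lab_nbrs, img_sim)
    return img_sim, lab_sim


# Hypothetical co-occurrence data: img1 and img2 share labels, img3 does not.
sample_edges = [("img1", "cat"), ("img1", "pet"),
                ("img2", "cat"), ("img2", "pet"),
                ("img3", "car")]
img_sim, lab_sim = bipartite_simrank(sample_edges)
```

In this sketch, an unlabeled image could then be annotated with the labels of its most similar images. The pairwise tables grow quadratically with the number of nodes, which is the kind of cost that motivates an optimized strategy for large-scale data.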
Keywords/Search Tags: Data integration, Image data annotation, Unified data entity, Data services model, Personalized data services