Research On Some Key Technologies Of Big Data-as-a-Service

Posted on: 2014-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Han
Full Text: PDF
GTID: 1268330401963109
Subject: Computer Science and Technology
Abstract/Summary:
Big data has become an important direction in the development of modern information technology. Sharing and analyzing big data not only brings considerable economic value but also plays a significant role in promoting the development of society. Big Data-as-a-Service (BDaaS) is a new pattern of data resource usage and a new form of service economy: by encapsulating heterogeneous data, it provides ubiquitous service consumers with standardized, on-demand services such as search, analysis, and visualization.

Because research on BDaaS is still at the stage of conceptual discussion, it faces four challenges:
1) There is no standardized, user experience-based BDaaS architecture that shields the complexity of data sources and operations.
2) The lack of a generic unstructured data model that reflects user behavior characteristics makes BDaaS for unstructured data difficult to build.
3) Existing data models follow the Web services model, and so far no holistic BDaaS service model with the characteristics of big data has appeared.
4) There is no appropriate solution for providing data retrieval, analysis, and visualization services and for optimizing service capacity.

To address these problems, this thesis studies four key technologies in depth: the BDaaS architecture, the data model, the BDaaS service model, and BDaaS applications. First, a User Experience-oriented BDaaS Architecture is designed to provide high-level standardization guidance for building a platform. Second, for the data model, a user behavior-based unstructured data model is designed to describe unstructured data uniformly. Third, for the service model, an algebraic model is established using process algebra, and an extended OWL-S ontology-based BDaaS model and a service composition approach are then designed. Finally, the service processes of retrieval, analysis, and visualization are described in detail, and two measures, improving retrieval accuracy and improving service efficiency, are used to optimize BDaaS capacity.

The main innovations of this thesis are as follows:

(1) Because existing unstructured data models can hardly meet the demands of BDaaS, the Galaxy Data Model (GDM), a user behavior-based unstructured data model, is proposed. By monitoring the behavior of the people who generate data, a generic model with full attributes such as user behavior and semantic background is created, which is the basis for realizing BDaaS for unstructured data. The case study shows that GDM has good versatility and comprehensiveness, as well as a lightweight, easy-to-use description language and operation language. In addition to traditional file systems, GDM supports modeling and retrieval of unstructured data in HDFS. GDM has also been applied in the National Pre-pregnancy Check Management Information System (NPCMIS) to verify its feasibility and practicality. (Chapter three)

(2) Because a holistic BDaaS service model with the characteristics of big data has not yet appeared, the Extended OWL-S based Big Data-as-a-Service model (EO-BDaaS) is proposed. It adds properties for data sources, data types, and service operations to OWL-S in order to build many types of BDaaS, such as search, analysis, and visualization, and to compose services dynamically. The case study shows that, compared with existing data services, EO-BDaaS describes attributes and operations more comprehensively. It also offers strong semantic comprehension and automatic service composition, and it seamlessly integrates the combination operations unique to BDaaS into the implementation of data services. (Chapter four)

(3) To solve the problem of low accuracy in retrieval services, this thesis presents HotRank, a heat-sensitive ranking algorithm for unstructured data retrieval. First, a heat score is calculated for each search result as the degree of match between the task attributes of the result and the task attributes of the service consumer. The search results are then sorted by heat score, so that they better match the user's preferences. Simulation results show that the precision-recall performance of HotRank is better than that of the Windows Search ranking algorithm. By improving retrieval accuracy, HotRank optimizes both the user experience and the service capacity. (Chapter five)

(4) A Hybrid Prefetch Algorithm (HPA) based on data heat recognition is proposed to meet the fast-response requirements of BDaaS. First, logs of user data operations are analyzed and rules for determining data heat are developed; then candidate data are obtained according to dynamic and static prefetch rules; finally, the prefetched data are placed in the cache. Simulation results show that the average hit rate of HPA is 55% and its average accuracy rate is 43%, which indicates that the algorithm can predict user data operations well and can optimize BDaaS capacity. In addition, an HPA-based Hybrid Prefetch based Persistent Caching architecture has been applied in NPCMIS to verify its effectiveness. (Chapter five)

The research in this thesis is an academic achievement of the National Key Project of Scientific and Technical Supporting Programs "Research on a safe, reliable, carrier-class operation support system of reproductive health services" (No. 2008BAH24B04) and the Science Foundation of Ministry of Education of China-China Mobile Program "Research on key technologies and solutions of internet-oriented business support system" (No. MCM20123031). It has been applied in NPCMIS, helping it evolve from data acquisition to BDaaS. In addition, it has provided "The National Cloud Computing E-Government BDaaS platform" of the National Engineering Lab of Cloud Computing E-Government with an effective solution and project practice guidance.
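
The ranking step described for HotRank (score each search result by how well its task attributes match those of the service consumer, then sort by that score) can be illustrated with a minimal sketch. The abstract does not give the actual match function, so the Jaccard similarity used here is an assumption, and the names `SearchResult`, `heat_score`, and `rank_by_heat` are hypothetical; this is not the thesis's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchResult:
    name: str
    # Task attributes attached to the data item (hypothetical representation).
    task_attributes: set[str] = field(default_factory=set)


def heat_score(result_attrs: set[str], consumer_attrs: set[str]) -> float:
    """Assumed match degree: Jaccard similarity of the two attribute sets."""
    union = result_attrs | consumer_attrs
    if not union:
        return 0.0
    return len(result_attrs & consumer_attrs) / len(union)


def rank_by_heat(results: list[SearchResult], consumer_attrs: set[str]):
    """Assign a heat score to every result and sort in descending score order."""
    scored = [(heat_score(r.task_attributes, consumer_attrs), r) for r in results]
    # list.sort is stable, so results with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


results = [
    SearchResult("budget.xlsx", {"finance", "q3"}),
    SearchResult("notes.txt", {"meeting"}),
    SearchResult("report.docx", {"finance", "q3", "report"}),
]
consumer = {"finance", "q3", "report"}
ranked_names = [r.name for _, r in rank_by_heat(results, consumer)]
```

With these illustrative inputs, the result whose attributes fully match the consumer's ranks first and the unrelated one ranks last. Any other similarity measure could be substituted for `heat_score` without changing the sort step.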
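
The three HPA steps (derive data heat from the operation log, select candidates with dynamic and static rules, load them into the cache) can likewise be sketched in a few lines. The abstract does not state the heat rules or how hit rate and accuracy are defined, so everything below is an assumption: heat is taken to be the access count, the dynamic rule is a count threshold, the static rule is a fixed list, hit rate is the share of requests found in the cache, and accuracy is the share of prefetched items that were later requested. All function names are hypothetical.

```python
from collections import Counter


def select_prefetch(access_log, static_hot, heat_threshold=2, cache_size=3):
    """Pick items to prefetch from an access log plus a static hot list."""
    heat = Counter(access_log)  # assumed heat measure: number of accesses
    # Dynamic rule (assumed): items accessed at least heat_threshold times,
    # hottest first.
    dynamic = [item for item, n in heat.most_common() if n >= heat_threshold]
    # Static rule (assumed): a preconfigured list of items considered hot.
    candidates = []
    for item in dynamic + list(static_hot):
        if item not in candidates:  # keep first occurrence, drop duplicates
            candidates.append(item)
    return candidates[:cache_size]


def hit_rate(cache, requests):
    """Assumed definition: fraction of requests served from the cache."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r in cache) / len(requests)


def prefetch_accuracy(cache, requests):
    """Assumed definition: fraction of prefetched items that were requested."""
    if not cache:
        return 0.0
    requested = set(requests)
    return sum(1 for item in cache if item in requested) / len(cache)


log = ["a", "b", "a", "c", "a", "b", "d"]
cache = select_prefetch(log, static_hot=["e", "a"])
later_requests = ["a", "c", "b", "a"]
rate = hit_rate(cache, later_requests)
accuracy = prefetch_accuracy(cache, later_requests)
```

Separating the two metrics shows why they can differ, as the reported 55% and 43% figures do: one is measured over requests, the other over prefetched items.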
Keywords/Search Tags: Big Data-as-a-Service, unstructured data, data model, service model, search ranking algorithm