Research On Unified Access Platform For Unstructured Data And Index Technology

Posted on: 2011-11-02  Degree: Master  Type: Thesis
Country: China  Candidate: Y Yang
GTID: 2178330332978471  Subject: Computer software and theory
Abstract/Summary:
Many enterprise information systems hold large amounts of unstructured data in various forms, and different departments of an enterprise process, store and use this data in different ways. To extract valuable information from it, an enterprise must integrate these dispersed, heterogeneous data efficiently, which requires a unified method of access. How to integrate unstructured data and how to access it in a unified way are therefore key issues in enterprise information systems.

This thesis addresses the problems of unstructured data integration and unified data access. Building on the Hadoop distributed computing framework, it uses indexing technology to build a unified view of unstructured data. The main contributions are as follows.

First, the Hadoop scheduling algorithm is improved for heterogeneous environments, which increases Hadoop's efficiency in such environments. Second, for unstructured data integration, the document weighting method is improved by taking into account both the amount of information a document carries and the timeliness of the data. Third, a classification algorithm that needs no training set is built on cosine similarity, and a category index is constructed from its classification results. Fourth, based on an analysis of the need for horizontal (cross-item) analysis of information in business information processing, a supplementary query scheme is proposed.

Finally, a prototype system is implemented and tested. The tests show that the prototype integrates unstructured data effectively and supports both unified access and supplementary queries.
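The abstract only names the training-free classification step. The sketch below illustrates one common way to classify documents by cosine similarity without a labeled training set and to turn the result into a category index. It is not the thesis's algorithm: it assumes each category is described by a hypothetical list of seed keywords, it uses ordinary smoothed TF-IDF weights instead of the thesis's improved weighting method, and the names `build_category_index`, `categories` and `threshold` are illustrative.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens; a simplistic stand-in for a real word segmenter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps every weight positive, even for terms in all documents.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_category_index(docs, categories, threshold=0.1):
    """Assign each document to its most similar category (no training set) and
    return a category index: category -> [(doc_id, score), ...], best match first."""
    doc_vecs = tfidf_vectors(docs)
    # Hypothetical category profiles: every seed keyword gets weight 1.0.
    cat_vecs = {c: {t: 1.0 for kw in kws for t in tokenize(kw)}
                for c, kws in categories.items()}
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_vecs):
        scores = {c: cosine(vec, cv) for c, cv in cat_vecs.items()}
        best = max(scores, key=scores.get)
        # Documents too dissimilar to every category are kept apart rather than forced in.
        label = best if scores[best] >= threshold else "uncategorized"
        index[label].append((doc_id, scores[best]))
    for postings in index.values():
        postings.sort(key=lambda p: p[1], reverse=True)
    return dict(index)


docs = [
    "quarterly sales report revenue growth",
    "server cluster hadoop node failure log",
    "customer contract signed sales agreement",
]
categories = {
    "sales": ["sales", "revenue", "contract", "customer"],
    "it": ["server", "hadoop", "cluster", "log"],
}
index = build_category_index(docs, categories)
print({c: [doc_id for doc_id, _ in p] for c, p in index.items()})  # {'sales': [2, 0], 'it': [1]}
```

Keeping each category's postings sorted by similarity lets a category lookup return the closest matches first. This single-process sketch does not attempt to distribute the work; on a Hadoop cluster the per-document scoring would naturally fit a map phase and the per-category grouping a reduce phase.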
Keywords/Search Tags:Unstructured data, Hadoop, Data integration, Unified access, Category index