The Research On The Key Technology Of Optimizing Massive Geological Data's Search

Posted on:2009-03-16

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M Chen

Full Text:PDF

GTID:1118360245963224

Subject:Earth Exploration and Information Technology

Abstract/Summary:

With the growth of informatization and the deepening of geological exploration work, geological databases often need to manage hundreds of gigabytes or even terabytes of data. To get the most out of such large and complex data, appropriate technologies are needed to simplify data storage, organization, and querying. Searching massive data has become a bottleneck that restricts further informatization. Today, every field that has reached a certain level of informatization has its own databases, and searches are carried out through them. When the data volume is huge, the query conditions are numerous, and many users search at the same time, finding the required data in the database takes a long time, which is costly. Speeding up searches by configuring large numbers of indexes on the database and extensively upgrading the hardware, on the other hand, incurs a high equipment cost. Practical technologies are therefore needed to query massive data quickly.

Because geographical areas are extensive and geological data are diverse, the volume of geological data exceeds that of a typical business information system. With the widespread application of satellite and remote sensing technology, large amounts of spatial and non-spatial data are collected and stored in databases. To some degree, massive geological data have already exceeded people's ability to process them. Over the past fifty years, the government has invested heavily in building more than 100 nationwide geological databases, which together hold more than 100 TB of data. In geological applications, data query and analysis account for the largest share of the work.
Therefore, optimizing queries over massive geological data has become a key issue for geological applications.

This paper summarizes the current status of the basic geological databases and points out problems such as the lack of a uniform standard for database creation and the huge volume of data. It analyzes the solutions currently in use, such as upgrading hardware, adding servers, and writing efficient and reasonable SQL statements, and indicates the advantages and disadvantages of each. It then studies the key technologies of table partitioning, decision-tree optimization, and XML optimization, applies the results in a test system for optimizing massive geological data queries, and verifies the optimization results.

For a massive geological data table that reaches hundreds of gigabytes or even several terabytes, a reasonable partition design for the table structure can substantially improve query performance. A partitioned table divides the table structure according to a given method and logic and spreads the data across many smaller sub-partitions. The sub-partitions can be stored separately at the physical level; that is, the data of different sub-partitions are stored in different data files. With data stored by class but managed uniformly, a search over a given logical range only needs to scan the corresponding physical sub-partitions rather than the whole table.
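The idea of scanning only the relevant sub-partitions can be illustrated with a small in-memory sketch. This is not the dissertation's implementation: the class name `RangePartitionedTable`, the year-based partition key, and the sample rows are all assumptions made for illustration, and a real database would keep each sub-partition in its own data file rather than in a Python list.

```python
from bisect import bisect_right


class RangePartitionedTable:
    """Toy range-partitioned table (hypothetical, for illustration only)."""

    def __init__(self, upper_bounds):
        # upper_bounds are sorted, exclusive upper limits, e.g. [1990, 2000, 2010].
        # One extra overflow partition holds keys >= the last bound.
        self.upper_bounds = list(upper_bounds)
        self.partitions = [[] for _ in range(len(self.upper_bounds) + 1)]

    def _partition_index(self, key):
        # Binary search for the sub-partition whose range contains the key.
        return bisect_right(self.upper_bounds, key)

    def insert(self, key, row):
        self.partitions[self._partition_index(key)].append((key, row))

    def range_query(self, low, high):
        """Return (rows with low <= key <= high, number of partitions scanned)."""
        first = self._partition_index(low)
        last = self._partition_index(high)
        hits = []
        # Only the sub-partitions overlapping [low, high] are read.
        for part in self.partitions[first:last + 1]:
            hits.extend(row for key, row in part if low <= key <= high)
        return hits, last - first + 1


# Assumed sample data: one survey record per year, partitioned by decade.
table = RangePartitionedTable([1990, 2000, 2010])
for year in range(1980, 2020):
    table.insert(year, f"survey-{year}")

rows, scanned = table.range_query(1995, 1998)
print(rows, "partitions scanned:", scanned, "of", len(table.partitions))
```

In this sketch a query confined to one decade touches a single sub-partition, while a full scan would touch all four; the saving grows with the number of sub-partitions the query can skip.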
Restricting the scan in this way reduces disk reads and writes and improves performance. After analyzing the working principle of table partitioning, this paper presents the principles and algorithms of four partition types: range, list, hash, and composite partitioning. It focuses on how range, list, and range-list partitioning optimize searches over massive geological data, and provides query comparison charts that illustrate the speed advantage of this technique.

Data mining strengthens the analytical capabilities of geographic information systems, but it requires extracting useful information from very large amounts of data, which demands faster queries over massive geological data. The decision tree is one data mining technique, commonly used in geological mineral prediction. Because geological data are so large and complex, prediction with decision trees takes a great deal of time and effort. After analyzing the working principle of decision trees, this paper reviews how the pruning algorithms in current use optimize decision trees. The ID3 algorithm is then studied in depth and suitably improved, and an example is given, improving the efficiency of building decision trees from massive geological data. The optimization is comprehensive, covering data processing, algorithm optimization, and program implementation.

Exchanging data through XML enables data sharing and standardization, but the speed of querying massive geological data in XML format becomes a bottleneck. The XML documents extracted from massive geological attribute databases are very large, and searching them is often slower than desired.
Optimizing search performance has therefore become a difficult problem that must be solved. When searching an XML document, we use not only the information contained in each node's label but also the path information between node elements, that is, the relationships among nodes. In the tree structure derived from an XML document, all of the information is stored in the leaf nodes, and accessing it means traversing from the root node to a leaf node along a particular path. A path index can therefore be used to speed up searches over XML documents. Drawing on the rich semantic information and structural characteristics of XML documents, together with the idea of the node path index, this paper proposes building an index on the XML document with a binary tree structure to improve search performance. An example based on a geological attribute data table illustrates how the search is optimized.

Finally, a test system for optimizing massive geological data searches is developed in C# on the .NET Framework platform. The system applies the key technologies above and is tested on a very large geological data table by comparing search times on non-optimized and optimized data. The tests show that each technology achieves the goal of optimizing searches over massive geological data.
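The path-index idea in the abstract can be sketched as follows. The abstract does not say how the binary tree is keyed or balanced, so this sketch makes its own assumptions: it uses a plain (unbalanced) binary search tree keyed by the root-to-leaf path string, and the names `PathIndex`, `build_index`, and the sample document are hypothetical. It is written in Python for brevity rather than the C# used for the dissertation's test system.

```python
import xml.etree.ElementTree as ET


class PathNode:
    """One binary-search-tree node: a root-to-leaf path and its leaf values."""

    def __init__(self, path, value):
        self.path = path
        self.values = [value]
        self.left = None
        self.right = None


class PathIndex:
    """Binary search tree keyed by path string (an assumed design)."""

    def __init__(self):
        self.root = None

    def add(self, path, value):
        if self.root is None:
            self.root = PathNode(path, value)
            return
        node = self.root
        while True:
            if path == node.path:
                node.values.append(value)  # same path seen again
                return
            branch = "left" if path < node.path else "right"
            child = getattr(node, branch)
            if child is None:
                setattr(node, branch, PathNode(path, value))
                return
            node = child

    def lookup(self, path):
        node = self.root
        while node is not None:
            if path == node.path:
                return node.values
            node = node.left if path < node.path else node.right
        return []


def build_index(xml_text):
    """Walk the XML tree once and index every leaf by its path from the root."""
    index = PathIndex()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children:  # leaf node: this is where the data lives
            index.add(path, (elem.text or "").strip())
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return index


# Assumed sample document standing in for an exported attribute table.
SAMPLE = """<survey>
  <sample><rock>granite</rock><depth>120</depth></sample>
  <sample><rock>basalt</rock><depth>85</depth></sample>
</survey>"""

index = build_index(SAMPLE)
print(index.lookup("/survey/sample/rock"))
```

Once the index is built, a lookup compares path strings along one branch of the tree instead of re-traversing the whole document; a production index would likely use a balanced tree so that lookups stay fast regardless of insertion order.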

Keywords/Search Tags:

massive geological data, table partition, decision-tree, ID3 algorithm, XML

Related items

1	Research On Parallel Decision Tree Algorithm Based On Spark
2	Attribute Reduction Based On Rough Set Theory And Research On Classification Algorithm Of Decision Tree
3	Research On The Decision-Tree-Based Prediction System Of Massive Time-Serial Unbalanced Data
4	The Fuzzy Decision Tree Algorithm Based On Dynamic Partition Of The Feature Space
5	Study Basing On The Disc Partition Data Restores Technology
6	Research On Visualization Technology Of Multi-attribute Massive Geological Exploration Data
7	Data Analysis Methods For Inconsistent Decision Tables
8	Research On Sort Algorithm On Massive Data Of Two-Dimension Table
9	A Comparative Study On Five Decision Tree Algorithms
10	A Nonlinear Integral Defined On Partition Of Set And Its Application To Decision Tree Algorithm