Application And Implementation Of Column-oriented DBMS In Discovery System

Posted on: 2011-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: D F Yin
GTID: 2178360305454399
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the network, social platforms of every kind have accumulated statistics on thousands of Internet users. It is therefore important both to run real-time queries over this mass data and to accurately discover the potential business information hidden in it. Knowledge discovery systems of many kinds are already applied effectively to data statistics, analysis, and summarization, and they have improved the quality of service that providers offer. However, as the volume of knowledge grows, achieving batch processing and real-time queries over mass data storage becomes a real challenge for knowledge discovery research. In traditional application domains, a knowledge discovery system generally stores its knowledge in a relational database, which limits query efficiency. As the Internet grows and the information held in databases increases, the traditional row-oriented database faces severe challenges in mass data analysis. For a long time, the traditional relational database has been widely used in both transaction-processing systems and analysis systems. Now, however, it is challenged by two kinds of applications. The first is high-volume transaction processing, where it regularly hits a bottleneck because data is stored on slow-access devices such as disks; in this case an in-memory database is more suitable. The second is mass data analysis, a harder task for which the relational database cannot meet the needs of commercial business queries. Mass data is now the direction in which information storage is heading for all kinds of service providers.
It is therefore both important and urgent to analyze mass data and mine useful information from it. To speed up relational database queries, industry research has focused on improving parallel full-table scans, smarter search techniques, and querying compressed data directly, on adding functions to the database to reduce processor cycles, and on using materialized views to precompute results. However, maintaining indexes and materialized views has a high cost in time (processing) or space (storage). Although these methods are very effective when access patterns are predictable, they cannot handle the random access of real applications, where a full table scan remains the only solution. To solve these problems, research institutes, open-source organizations, and vendors have begun researching column-oriented databases. A column-oriented database uses a column-based storage architecture and is mainly suited to batch data processing and ad hoc queries. A row-oriented database focuses on writes and concurrency, whereas a column-oriented database focuses on large-scale analysis and reads and can handle frequent query operations. Because a column-oriented database stores data by column, the values in one column share the same data type and have similar characteristics, so they can be compressed efficiently. This compression costs no performance; on the contrary, because it is applied so extensively, the amount of I/O is reduced. Through columnar storage and strong compression, performance improves greatly. Column-oriented databases hold a unique, irreplaceable position in data analysis, mass storage, and business intelligence, and they can solve problems that row-oriented databases cannot. P2P is an Internet sharing technology developed in recent years, and file-sharing services based on P2P technology have become one of the main Internet applications.
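To make the storage argument above concrete, here is a minimal Python sketch under stated assumptions: a hypothetical table of P2P access records `(user_id, file_type, bytes_downloaded)` with made-up values, stored once as rows and once as columns. Summing one attribute touches every field in the row layout but only one column in the columnar layout. Run-length encoding (RLE) is used as one common columnar compression scheme, chosen here purely for illustration; the abstract does not say which scheme MonetDB or the author's system uses.

```python
from itertools import groupby

# Hypothetical P2P access records: (user_id, file_type, bytes_downloaded).
# Field names and values are illustrative assumptions, not data from the thesis.
ROWS = [
    (1, "video", 700),
    (2, "video", 350),
    (3, "audio", 40),
    (4, "audio", 55),
    (5, "audio", 60),
    (6, "doc", 2),
]

def to_columns(rows):
    """Pivot a row-oriented list of tuples into one list per column."""
    return [list(col) for col in zip(*rows)]

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]

def total_bytes_row_store(rows):
    # A row store reads whole rows, so every field is touched
    # even though the query only needs the third attribute.
    fields_read = 0
    total = 0
    for row in rows:
        fields_read += len(row)
        total += row[2]
    return total, fields_read

def total_bytes_column_store(columns):
    # A column store reads only the column the query needs.
    col = columns[2]
    return sum(col), len(col)

if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(total_bytes_row_store(ROWS))        # (1207, 18)
    print(total_bytes_column_store(columns))  # (1207, 6)
    runs = rle_encode(columns[1])
    print(runs)  # [('video', 2), ('audio', 3), ('doc', 1)]
    assert rle_decode(runs) == columns[1]
```

RLE pays off only when equal values sit next to each other, which is why column stores often sort or cluster data before compressing; for high-cardinality columns, dictionary or delta encoding is usually a better fit.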
The rapid growth of P2P on the Internet has also created the problem of gathering statistics on complex, massive system information. Traditional databases built on row-oriented storage have proved powerless in the face of rapidly growing P2P statistical information. Yet this mass of statistics hides much valuable information and great business opportunity, and applying a column-oriented database in a P2P knowledge discovery system is one efficient way to exploit it. This paper centers on the goal of realizing rapid knowledge discovery over mass data. Building on traditional knowledge discovery technology, it studies the design and application of a knowledge discovery system based on column-oriented storage and applies that database to a practical knowledge discovery system. The resulting system analyzes and queries P2P users' access records, improves system performance, and solves mass data analysis problems that a knowledge discovery system built on a traditional row-oriented database cannot solve.
This paper makes the following contributions:
1) It introduces the basic theory and applications of knowledge discovery in detail, describes how databases are applied in knowledge discovery systems, surveys current research in the data storage field, analyzes the differences between row-oriented and column-oriented storage, and points out that column-oriented storage has become one of the development trends in data storage.
2) To address the mass data analysis problems that a knowledge discovery system with traditional row-oriented storage cannot solve, it designs a module framework for a knowledge discovery system based on column-oriented storage and defines the function modules under this framework: data visualization, databases, database interface, data preprocessing, data mining, knowledge, knowledge editing, results display, and user interfaces.
3) It gives a detailed interface design for data storage in the column-oriented knowledge discovery system. To improve data indexing efficiency, the database interface adopts connection pool technology, which enables concurrent access to the database. It also proposes a feasible distributed deployment scheme for the system, which can further improve performance and better support large-scale data indexing and knowledge discovery.
4) It successfully applies the column-oriented knowledge discovery system model to a practical system, a P2P user access record analysis system. The paper describes the system's UML model in detail, including sequence diagrams of the main system functions and class diagrams of the key classes, illustrates the system working in an actual application, and verifies the feasibility of the column-oriented knowledge discovery system design.
5) Finally, in the same application scenario, it compares the data query times of MonetDB, a column-oriented database, and MySQL, a row-oriented database, further demonstrating the system's advantage in data query and analysis; the system can handle queries and analysis over the access records of massive numbers of P2P users.
The research in this paper provides technical support for real-time queries over mass data in practical application environments and makes an initial attempt from the data storage perspective, while the P2P user access record analysis system provides important technical support for related research in the P2P field.
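Contribution 3 states that the database interface uses connection pooling to support concurrent access. The sketch below shows the general technique in Python under loud assumptions: `sqlite3` stands in for the column store's driver (the thesis used MonetDB), and the `ConnectionPool` class, its methods, and the pool size are hypothetical, not the thesis's actual interface. Connections are opened once and recycled through a thread-safe queue, so concurrent requests share a bounded set of connections instead of paying connection setup on every query.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager

class ConnectionPool:
    """Fixed-size pool that hands out pre-opened connections to threads.

    Hypothetical sketch: sqlite3 stands in for the real column-store driver.
    """

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open all connections up front

    @contextmanager
    def connection(self, timeout=5.0):
        # Block until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it

    def close_all(self):
        # Only call once no thread is still using the pool.
        while not self._idle.empty():
            self._idle.get_nowait().close()

def make_conn():
    # check_same_thread=False lets a connection move between worker threads;
    # the pool guarantees only one thread uses a given connection at a time.
    return sqlite3.connect(":memory:", check_same_thread=False)

def run_queries(pool, n_threads=8):
    """Run one trivial query per thread, all sharing the pool."""
    results = []
    lock = threading.Lock()

    def worker(i):
        with pool.connection() as conn:
            (value,) = conn.execute("SELECT ?", (i,)).fetchone()
        with lock:
            results.append(value)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)

if __name__ == "__main__":
    pool = ConnectionPool(make_conn, size=3)
    print(run_queries(pool))  # [0, 1, 2, 3, 4, 5, 6, 7]
    pool.close_all()
```

Blocking with a timeout, rather than opening extra connections on demand, keeps the load on the database bounded; a production pool would also validate or replace broken connections, which this sketch omits.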
Keywords/Search Tags: knowledge discovery, column-oriented DBMS, P2P system, system design