Font Size: a A A

Improvement On Aging Algorithm And Research On Its Application In LBCIS

Posted on:2011-03-06Degree:MasterType:Thesis
Country:ChinaCandidate:P XuFull Text:PDF
GTID:2178360305955191Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
Along With the development of internet applications, scientific computing, telecommunications, finance, and so on, the data is becoming increasingly large. Meanwhile, the local file systems'storage mode is very unitary and their performance is very limited. So, they have been insufficient when facing massive data access issues. In this case, the distributed file system came into being. In the distributed file system, most resources managed by the file system are not stored on the local node, but on the other nodes connected by computer network.The works done in this paper are part of a project belongs to the State Ministry of Science, Linux clusters for data retrieval system, and mainly on how to build the cluster system's internal file index system and its cache efficiency. In order to decrease company's costs, the free open-sourced software standards should be given priority to build the system. Being widely used and the industry standard, NFS (Network File System) from Sun Microsystems is a good option.In the designing, Linux clusters system for data retrieval have four layers in structure, namely, load balancing layer, service processing layer, cluster management layer and data layer. In load balancing layer, the load balancer is used to balance the load on the cluster system. Servers in service processing layer mainly deal with customers'requirements. Cluster management layer is mainly responsible for cluster management, such as switching the failure node. The major works in the paper are in the data layer. This layer could be divided into database servers and shared storage. Database server is used to provide data query interface, while the shared storage take charge of the data. They together provide to the server pool a unified query interface and a shared storage area, so that regardless of which one from the server pool to do data access, the same content and service could be provided.Data queries is executed by the database, meanwhile the database commit file accessing requests, so what the file index system needs to focus on is these file accessing requests. So functionally, the goal the file index system needs to achieve is a system like the distributed file system. Facing the higher layer, the system need to undertake the database's file accessing requests, while facing the lower layer, it need to do read and write operations to file storage server. There're mainly 3 parts in file systems under Linux, in which the most important one is VFS, the virtual file system switch. Situated in the middle layer of stand-alone file systems, VFS is a kernel software layer, and its function is to process all system calls related with the UNIX standard file system and provide a multipurpose interface for different file systems. VFS uses a multipurpose file model, and this model has enough capacity to represent all the file systems that VFS could support. Regard to the specific file system, its logic structure will be converted to the document model of VFS, so that these non-Unix-style file systems can be compatible with rules in UNIX file system and can be used by VFS. By using the idea from VFS for reference, a common file model was used in the cluster environment to unify the operations on different platforms. On the layer higher than VFS, LBCIS (Linux-Based Cluster Index System) was built to hide the details about how the files were stored against applications as database on the higher layer and all LBCIS needs is just a normal file accessing request. Facing the lower layer, LBCIS reorganize files on different machines and make them appear as a global storage space, so that information about the location of files was hidden.Because the amount of data couldn't be afforded by just one server, the database server and storage server need to be separated from each other. In order to solve the decline of accessing performance caused by the separation, which resulting from the speed's changing from bus and I/O to network, the cache mechanism has been introduced to improve the performance. In this system, the most noticeable feature is that it has much more query requirements than update requirements. Therefore, the most cost-effective approach is to put the greatest effort to pay attention to the cache server architecture and caching replacement rules. In this paper, load balancer managed multi-machine cache structure was chosen, and in this structure caching node could be added easily to enhance the caching performance. In the structure, caching request will be assigned by the load balancer to the cache server, and so only the load balancer need to be modified when adding or removing servers. By designing this way, caching system has been given extendibility and fault freedom. Load balancer could be given the task to deal with situations such as adding servers and server's crash down. Meanwhile, just the single instance caching mode need to be considered inside the system, so that the system's development complexity has also been reduced.Aging algorithm is a page replacement algorithm, and it is an excellent software approximation of LRU algorithm. Its main content is as following: collect reading bits of each page on each clock cycle and put the aged information to lower weight position through right shifting the counter, and then record the reading bits in the highest weighted binary bit of counter. Finally, choose the page that needs to be replaced by comparing the value in the counter. In this system, file accessing happens mainly between the database and the file system. When reading, database regard a file as a unit, so we should also following this to do caching. Aging algorithm is good at page replacement, but couldn't solve this situation without any changing, so it must be improved. First, to improve the aging algorithm, a file information table is needed to identify the files. In the file information table, there must be the file's unique identification, R-bit, and the aging information. As we considered the file size information, how to replace the file should be reconsidered. So we add the file's value H into the file information table, and then decide which one to be replaced by comparing the file's value H. The system need to build a unique global name space, so in the file information table, the full path of the file in the global name space is suitable to identify the file uniquely. Using the full path of the file as the key in hash searching method, the mapping address of the file could be found easily in the file information table. Also, the complexity of hash finding is at constant level, so the efficiency has also been ensured. In order to add the factor, size of the file, to aging algorithm, several kinds of methods about how to calculate between file size and aging information was tried, and finally a reasonable calculation method was found. Given a threshold value SIZE s, and make the file size does not count if the following file size is under SIZE s, the value H is calculated as follows:The formula added factor of file size into aging algorithm and inherited the thought of it. When calculate the influence caused by file size, shifting is still the method. Additionally, there's not too much information lost and the gap of value when processing files in different sizes is acceptable. In the end, the most important thing is that it is still using bit computing and integer comparing, the essence of the aging algorithm.Tested by Jilin branch of China Software Testing Center, the product of this project was proved to have stable operation and its performance index reached the design requirement. Meanwhile, the performance and the hit rate were guaranteed by improved aging algorithm.
Keywords/Search Tags:Distributed File System, Caching Algorithm, Aging Algorithm, LRU, Replacement Algorithm
PDF Full Text Request
Related items