Design And Performance Improvement Of High-performance Divider Based On SRT Algorithm

Posted on: 2016-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: J H Yuan
GTID: 2348330536467375
Subject: Software engineering
Abstract/Summary:
With current process technology, in-depth research on division algorithms and divider structures is needed to raise division performance and, with it, the overall performance of the microprocessor. With this goal, this thesis studies integer and floating-point division separately, proposes optimized design structures, and analyzes their performance, as detailed below.

First, we implemented an integer divider based on the SRT-16 algorithm that performs signed 64-bit and 32-bit integer division. From the relation n = log2(r) between the number of quotient bits per iteration n and the radix r, a radix of r = 16 yields four quotient bits per iteration, which greatly accelerates division. We synthesized the integer divider in a 28 nm process at a voltage and temperature corner of 0.9 V and 25 °C; the synthesized divider has an area of 39079 μm² and a delay of 490 ps. Compared with a golden-model integer divider implemented with the SRT-4 algorithm, performance improved by 27% when area is ignored and only delay and power are considered.

Second, we designed and implemented two floating-point dividers, based on the SRT-8 and SRT-16 algorithms respectively. Compared with other SRT-based floating-point dividers, the mantissa processing is substantially improved in two ways. At the algorithm level, we combined the SRT algorithm with the restoring idea from digit-recurrence algorithms and restricted the quotient digits to the non-negative range, which avoids the normalization step needed when quotient digits are negative. At the structural level, the partial remainders for the full range of quotient digit values are computed in parallel, carry-save adders are used to speed up the computation, and the quotient digit of each iteration is selected from the signs of the partial remainders. Using a 28 nm standard cell library at 0.9 V and 25 °C, we synthesized both floating-point dividers. The SRT-8-based divider has an area of 13379 μm² and a delay of 471 ps; the SRT-16-based divider has an area of 23951 μm² and a delay of 517 ps. Compared with a floating-point divider of the same structure implemented with the SRT-4 algorithm, the radix-8 and radix-16 dividers improve delay by 19% and 29% respectively.

Finally, we designed and implemented a shared-structure floating-point divider based on the SRT-16 algorithm, which can perform division or root-extraction iterations on four groups of single-precision floating-point data simultaneously. Priority among data groups is set by arrival time: the earlier a data group enters the divider, the higher its priority. We then designed the quotient-range look-up table and the shared mantissa-processing structure. Because the iteration logic is shared, data groups whose quotient value ranges coincide compete for it; such conflicts are resolved by priority, with the higher-priority data group computed first. We then measured the time taken by the shared-structure divider and by a traditional SRT-16 floating-point divider to process the same number of random data items of the same type. The shared-structure divider averaged 4.12 clock cycles per instruction for a single group of data, against 10.23 for the traditional divider, which meets the goal of improving divider utilization and throughput at a reasonable increase in hardware cost.
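
The digit-selection idea described in the abstract (non-negative quotient digits, candidate partial remainders computed for every digit value, selection by remainder sign) can be sketched in software. The following Python is only an illustrative model under stated assumptions, not the thesis's hardware design: it works on unsigned integers rather than floating-point mantissas, it does not model carry-save adders or the quotient-range look-up table, and the names `bits_per_iteration` and `divide_digit_recurrence` are hypothetical.

```python
def bits_per_iteration(radix: int) -> int:
    """Quotient bits retired per iteration, n = log2(r), for a power-of-two radix."""
    n = radix.bit_length() - 1
    if radix < 2 or (1 << n) != radix:
        raise ValueError("radix must be a power of two >= 2")
    return n


def divide_digit_recurrence(dividend: int, divisor: int,
                            radix: int = 16, width: int = 32):
    """Unsigned radix-r digit-recurrence division (illustrative model only).

    Returns (quotient, remainder, iterations).
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if dividend < 0 or dividend >> width:
        raise ValueError("dividend must fit in `width` unsigned bits")

    n = bits_per_iteration(radix)
    iterations = -(-width // n)      # ceil(width / n)
    total_bits = iterations * n      # dividend is zero-extended to this width

    quotient = 0
    remainder = 0
    for i in range(iterations):
        # Shift in the next n dividend bits (one radix-r digit).
        shift = total_bits - n * (i + 1)
        next_digit_in = (dividend >> shift) & (radix - 1)
        shifted = (remainder << n) | next_digit_in

        # Candidate partial remainders for every digit 0..r-1.
        # In hardware these subtractions would run in parallel.
        candidates = [shifted - d * divisor for d in range(radix)]

        # Sign-based selection: the largest digit whose candidate
        # remainder is still non-negative. Digit 0 always qualifies.
        digit = max(d for d, c in enumerate(candidates) if c >= 0)

        remainder = candidates[digit]
        quotient = (quotient << n) | digit

    return quotient, remainder, iterations
```

In this model, a 32-bit operand needs 8 iterations at radix 16 and 16 iterations at radix 4, which reflects the n = log2(r) relation: a higher radix retires more quotient bits per iteration at the cost of more candidate remainders to evaluate in each step.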
Keywords/Search Tags:SRT Algorithm, Integer Divider, Floating-point Divider, Shared Structure Floating-point Divider