As semiconductor process technology evolves, the scale of chip design keeps growing. Especially since the advent of deep-submicron processes, an entire electronic system can be integrated on a single chip, known as an SoC (System on Chip). SoC technology has become the development trend of VLSI design and the mainstream of the 21st century. Although an SoC reduces design cost and shortens the design cycle, integrating many IP cores with various functions into a single chip also poses great challenges for IC design, one of which is memory access performance. As the number of IP cores integrated on the same chip increases, shared memory has to be adopted to reduce cost, and memory access performance then becomes one of the major bottlenecks limiting overall SoC performance. This dissertation presents several approaches to improving the memory access performance of the Juxin SoC, based on the memory behavior of several MiBench benchmark programs representative of current applications.

First, this dissertation analyzes the architecture of the high-speed on-chip bus of the Juxin SoC and the memory behavior of the masters on the bus. It then presents a novel DDR SDRAM memory controller architecture tailored to the characteristics of modern DRAM devices. The new architecture separates the read and write interfaces of the high-speed bus in the Juxin SoC so that read and write transfers can overlap. It also introduces a built-in operation queue that records DDR SDRAM operations in order to support address pipelining on the bus, as well as a parallel shared memory buffer that improves operation response speed and data throughput. In addition, a scheduling algorithm is designed around the architecture of the operation queue.

Experimental results show that the new memory control system performs far better than the old one: with a single Godson-1 processor, it reduces the average memory access latency by 63.10%. To fully exercise the performance of the L*Bus, three additional virtual masters are added to the bus; in this configuration the new system reduces memory access latency by 88.31% and increases bandwidth by 14.86% on average. To explore the trade-offs among latency, bandwidth, and cost, a number of experiments are conducted under various conditions, such as different numbers of accesses, different read and write thresholds, and different buffer and queue sizes.
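To make the role of the operation queue and the read/write thresholds concrete, the following Python sketch models a threshold-based scheduler with separate read and write paths. It is only an illustrative behavioral model, not the controller's actual algorithm or implementation; the class, method, and threshold names are assumptions introduced for this example.

from collections import deque

class OperationQueueScheduler:
    # Behavioral sketch (assumption, not the dissertation's design):
    # pending DDR SDRAM operations are recorded in separate read and
    # write queues, and the scheduler switches between serving reads
    # and draining writes based on high/low watermark thresholds.
    def __init__(self, write_high=12, write_low=4):
        self.read_queue = deque()    # pending read operations
        self.write_queue = deque()   # pending write operations
        self.write_high = write_high # start draining writes above this depth
        self.write_low = write_low   # stop draining writes below this depth
        self.draining_writes = False

    def enqueue_read(self, op):
        self.read_queue.append(op)

    def enqueue_write(self, op):
        self.write_queue.append(op)

    def next_operation(self):
        # Prefer reads (they usually stall the masters), but batch up
        # writes and drain them once the write queue grows too deep.
        if len(self.write_queue) >= self.write_high:
            self.draining_writes = True
        if self.draining_writes and len(self.write_queue) <= self.write_low:
            self.draining_writes = False

        if self.draining_writes and self.write_queue:
            return ("WRITE", self.write_queue.popleft())
        if self.read_queue:
            return ("READ", self.read_queue.popleft())
        if self.write_queue:
            return ("WRITE", self.write_queue.popleft())
        return None  # no pending operations

Varying write_high, write_low, and the queue depths in such a model is one simple way to reason about the latency/bandwidth/cost trade-offs examined in the experiments above.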