As processing speeds have outstripped memory transfer speeds, memory management has become increasingly critical to the performance of matrix algorithms such as matrix multiplication. When an algorithm cannot reuse array elements already held in cache, each resulting cache miss forces the required data to be fetched from much slower levels of memory, at significant computational expense. For a given algorithm and set of cache parameters, counting the cache misses therefore yields an estimate of this transfer cost. Using combinatorial techniques and insights, we will develop fast algorithms that accurately estimate the number of cache misses in matrix multiplication for a wide range of data layouts.
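To make the quantity being estimated concrete, the following is a minimal brute-force sketch that counts cache misses by simulating every memory access of a naive matrix multiplication. It is not the combinatorial method described above, only a slow baseline of the kind such fast estimators would aim to replace. The function name `count_misses`, the fully associative LRU cache model, the `ijk` loop order, and the back-to-back placement of the three arrays are all assumptions made for illustration.

```python
from collections import OrderedDict


def count_misses(n, cache_lines, line_size, layout="row"):
    """Simulate C = A * B for n x n matrices and count cache misses.

    Assumptions (not taken from the source): ijk loop order, a fully
    associative LRU cache with `cache_lines` lines of `line_size`
    elements each, and arrays A, B, C stored back to back in memory.
    """
    def addr(base, i, j):
        # Map element (i, j) to a linear address under the chosen layout.
        offset = i * n + j if layout == "row" else j * n + i
        return base + offset

    a_base, b_base, c_base = 0, n * n, 2 * n * n
    cache = OrderedDict()  # keys are cache-line ids, ordered by recency
    misses = 0

    def touch(address):
        nonlocal misses
        line = address // line_size
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: load the line
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line

    for i in range(n):
        for j in range(n):
            for k in range(n):
                touch(addr(a_base, i, k))
                touch(addr(b_base, k, j))
                touch(addr(c_base, i, j))
    return misses


if __name__ == "__main__":
    # Compare the two layouts under the same (hypothetical) cache parameters.
    for layout in ("row", "col"):
        print(layout, count_misses(n=16, cache_lines=8, line_size=4, layout=layout))
```

The simulation performs on the order of n^3 cache lookups, which is what makes a direct count expensive for large matrices. When the cache is large enough to hold all three arrays, only the compulsory misses remain, one per distinct cache line touched.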