This thesis presents techniques for deducing valuable array shape information in typeless array-based languages like MATLAB, even when the shapes may be statically unknown. The approach is based on an algebraical modelling of the language's array shape semantics.;The dissertation makes five primary contributions: (1) it shows how to model shape semantics so as to enable precise characterizations that are independent of an array's dimensionality, (2) it presents a rewriting methodology that uses the characterizations to symbolically determine an array expression's shape in O(R||s||3 log||s||) time, where ||s|| is the number of nodes in a rooted ordered tree representation of a shape term s, and R is an upper bound on an array's canonical rank, (3) it presents a shape-tracking optimization that detects equivalent shapes, permitting a translator to eliminate redundant shape checks, and to generate specialized code even when the shapes may be compile-time unknowns, (4) it shows a preallocation optimization that determines the set of all possible shapes that an expression may assume during program execution; and (5) it presents an interference graph-based algorithm that uses type and control-flow information to coalesce array storage. The dissertation also substantiates the techniques by presenting performance numbers of an implementation on a benchmark suite. |