Model-based Input-adaptive Vectorization

Abstract

Not all the bits of a variable are always used during a program's execution. Identifying the minimum number of bits needed to represent a variable can therefore expose optimization opportunities. A compilation and execution framework that knows these bitwidths can use them to optimize the program's execution, for instance by selecting instructions for SIMD vectorization. This paper introduces a framework that exploits vectorization opportunities hidden in a program that are not exposed at static compilation time. The framework unlocks instruction-level data parallelism by using the bitwidths of array-like variables, which depend on runtime input. On our micro-benchmark suite, the framework shows a maximum achievable performance gain of 37% and a mean achievable performance gain of 11% over the ICC compiler.
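To make the idea concrete, the following is a minimal sketch in C of how a runtime bitwidth check might guide the choice of SIMD lane width for an element-wise addition. It is an illustration under stated assumptions, not the framework's actual implementation: the function names `required_bits`, `lane_width_for_add`, and `lanes_per_128bit_register` are hypothetical, and the 128-bit register size is assumed only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: minimum number of bits needed to represent
 * every element of a[0..n). Runs once over the runtime input. */
static unsigned required_bits(const uint32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    uint32_t seen = 0;
    for (size_t i = 0; i < n; i++)
        seen |= a[i];             /* OR preserves the highest set bit of any element */
    unsigned bits = 0;
    while (seen != 0) {
        bits++;
        seen >>= 1;
    }
    return bits == 0 ? 1 : bits;  /* treat an all-zero array as 1 bit wide */
}

/* Hypothetical helper: smallest lane width (8, 16, or 32 bits) that can
 * hold the sum of a bits_a-wide and a bits_b-wide value without overflow.
 * 32 is the fallback, matching the original 32-bit element type. */
static unsigned lane_width_for_add(unsigned bits_a, unsigned bits_b) {
    unsigned need = (bits_a > bits_b ? bits_a : bits_b) + 1; /* one carry bit */
    if (need <= 8)
        return 8;
    if (need <= 16)
        return 16;
    return 32;
}

/* Number of elements processed per instruction, assuming 128-bit registers. */
static unsigned lanes_per_128bit_register(unsigned lane_width) {
    return 128u / lane_width;
}
```

In this sketch, two input arrays whose elements all fit in 7 bits could be added in 8-bit lanes, giving 16 elements per 128-bit instruction instead of the 4 that a statically chosen 32-bit element type allows. A static compiler cannot make this choice because the value ranges are known only once the input arrives.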