This is a set of 2 scripts for tuning the performance of OpenCL GEMM kernels (currently limited to Convolution layer functions). Specifically, we tune 3 GEMM kernels, each of which implements the GEMM operation with a different strategy: native, reshaped, and reshaped only RHS. The details of these strategies can be found in the documentation of the corresponding kernels: CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel and CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
The output of the tuning process is 1 optimal configuration (called GEMM Configuration or GEMMConfig) for each of the 3 strategies.
This section gives a brief description and rationale of the approach adopted by the current version of GEMM Tuner.
As explained in the Introduction section, the output of the tuner is 1 optimal GEMMConfig for each strategy. This is because we can only integrate 1 GEMMConfig for each strategy in ACL at compile time. In theory, however, the optimal GEMMConfig also depends on the parameters of the GEMM operation (called GEMM Parameter or GEMMParam, e.g. the shape of the operation); ideally, then, the result for each strategy would be a mapping from GEMMParam to GEMMConfig rather than a single GEMMConfig.
To address this issue, we need the single optimal GEMMConfig to generalise well to all potential GEMMParams (or at least the ones that we care about). The approach we adopt involves a preliminary stage in which a collection of common GEMMParams (GEMM shapes from popular networks) is compiled. Then, to reduce the final tuning time, and somewhat counter-intuitively, we first spend a lot of time searching for near-optimal GEMMConfigs for each GEMMParam, and then discard redundant GEMMParams that share similar optimal GEMMConfigs with others. The resulting list of GEMMParams is called a GEMMParam archetype list, because these GEMMParams are typical enough to capture the space of GEMMParams that we care about.
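The reduction to an archetype list described above can be illustrated with a minimal Python sketch. It is not the procedure used by the developers: the tuple layouts for GEMMParam and GEMMConfig, the sample values, and the function name `reduce_to_archetypes` are all hypothetical, and "similar" optimal GEMMConfigs is simplified here to "identical".

```python
from typing import Dict, List, Tuple

# Hypothetical representations, chosen only for this illustration.
GEMMParam = Tuple[int, int, int, int]   # assumed layout: (M, N, K, batch)
GEMMConfig = Tuple[int, int, int]       # assumed layout: (m0, n0, k0)


def reduce_to_archetypes(best: Dict[GEMMParam, GEMMConfig]) -> List[GEMMParam]:
    """Keep the first GEMMParam seen for each distinct near-optimal GEMMConfig.

    Simplification: two GEMMParams count as redundant only when their
    near-optimal GEMMConfigs are exactly equal.
    """
    seen = set()
    archetypes = []
    for param, config in best.items():
        if config not in seen:
            seen.add(config)
            archetypes.append(param)
    return archetypes


# Made-up search results: near-optimal GEMMConfig found for each GEMMParam.
best_config_per_param = {
    (1024, 64, 576, 1): (4, 4, 4),
    (512, 128, 1152, 1): (4, 4, 4),   # same config as the first entry: redundant
    (49, 512, 4608, 1): (2, 8, 8),
}

archetypes = reduce_to_archetypes(best_config_per_param)
print(archetypes)
```

With the made-up data above, the second GEMMParam is dropped because it shares its near-optimal GEMMConfig with the first, leaving two archetypes.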
During this preliminary stage we also produce a list of good GEMMConfigs that can be used to search for the optimal one in the actual tuning stage. This, again, is to reduce the tuning time, and the resultant list is called a GEMMConfig search list.
The GEMMParam archetype list and the GEMMConfig search list are investigated and prepared by the developers; users of GEMM Tuner do not need to produce them, but they do need to obtain them before running the tuner.
Once these two lists (2 for each strategy, so 6 in total) are obtained, they can be fed to the tuner to produce the optimal GEMMConfig(s).
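To make the idea of one GEMMConfig that "generalises well" concrete, the sketch below picks, from a search list, the config whose worst-case slowdown across all archetype GEMMParams is smallest. This selection rule is an assumption made for illustration and is not necessarily the criterion implemented by GemmTuner.py; the timings, tuple layouts, and the name `pick_general_config` are hypothetical.

```python
from typing import Dict, Tuple

GEMMParam = Tuple[int, int, int, int]   # assumed layout: (M, N, K, batch)
GEMMConfig = Tuple[int, int, int]       # assumed layout: (m0, n0, k0)
Timings = Dict[GEMMParam, Dict[GEMMConfig, float]]


def pick_general_config(timings: Timings) -> GEMMConfig:
    """Return the config with the smallest worst-case slowdown over all params.

    Slowdown of a config on a param = its time / best time measured on that param.
    """
    best_time = {p: min(per_cfg.values()) for p, per_cfg in timings.items()}
    # Only consider configs that were measured on every param.
    common = set.intersection(*(set(per_cfg) for per_cfg in timings.values()))

    def worst_slowdown(cfg: GEMMConfig) -> float:
        return max(timings[p][cfg] / best_time[p] for p in timings)

    # sorted() makes tie-breaking deterministic.
    return min(sorted(common), key=worst_slowdown)


# Made-up measurements in milliseconds for two archetype GEMMParams.
timings = {
    (1024, 64, 576, 1): {(4, 4, 4): 1.0, (2, 8, 8): 1.5, (8, 4, 4): 1.1},
    (49, 512, 4608, 1): {(4, 4, 4): 2.4, (2, 8, 8): 2.0, (8, 4, 4): 2.1},
}

chosen = pick_general_config(timings)
print(chosen)
```

In this made-up data the chosen config is the best for neither GEMMParam individually, yet it is never more than about 10% slower than the per-param best, which is the trade-off a single compile-time GEMMConfig has to make.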
The tuning stage consists of 2 steps:

1. Run the benchmarks with the shell script:
   [$SHELL] ./benchmark_gemm_examples.sh -s \<strategy\> -e \<example_binary_dir\> -g \<gemmparam_archetype_list\> -c \<gemmconfig_search_list\> [-o \<out_dir\>]
2. Analyse the benchmark results with the Python script:
   python GemmTuner.py -b \<benchmark_results_dir\> [-t \<tolerance\>] [-o \<out_dir\>]
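For scripting the two commands together, a small Python helper can assemble the argument lists. The sketch below only builds and prints the commands, using the flags shown in the usage lines above; every path, the strategy spelling `native`, and the tolerance value are placeholders and assumptions, and the helper names `benchmark_cmd` and `analyser_cmd` are hypothetical.

```python
import shlex
from typing import List, Optional


def benchmark_cmd(strategy: str, example_binary_dir: str, archetype_list: str,
                  search_list: str, out_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for the benchmark step."""
    cmd = ["./benchmark_gemm_examples.sh",
           "-s", strategy,
           "-e", example_binary_dir,
           "-g", archetype_list,
           "-c", search_list]
    if out_dir is not None:          # -o is optional
        cmd += ["-o", out_dir]
    return cmd


def analyser_cmd(benchmark_results_dir: str, tolerance: Optional[float] = None,
                 out_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for the analysis step."""
    cmd = ["python", "GemmTuner.py", "-b", benchmark_results_dir]
    if tolerance is not None:        # -t is optional
        cmd += ["-t", str(tolerance)]
    if out_dir is not None:          # -o is optional
        cmd += ["-o", out_dir]
    return cmd


# Placeholder values; replace them with the real files and directories.
bench = benchmark_cmd("native", "./examples", "gemmparam_native.csv",
                      "gemmconfig_native.csv", out_dir="./benchmark_results")
analyse = analyser_cmd("./benchmark_results", tolerance=0.05)

print(shlex.join(bench))    # shlex.join requires Python 3.8+
print(shlex.join(analyse))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)`, with the first command executed on the target device and the second wherever the benchmark results are available.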