Improve start-up time for GeMM (floating-point):

 - Pass M,N,K at runtime as kernel parameters
 - Add a guard macro to compile only the kernel of interest
 - Move reshaping kernels to gemm_utils.cl
 - Remove the fallback reshaping kernel with Y-Padding support

Resolves: COMPMID-4888
Signed-off-by: Ramy Elgammal <ramy.elgammal@arm.com>
Change-Id: Ida3851326f0b77e410633271de9ecca106e37931
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6662
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/Android.bp b/Android.bp
index 32d5805..98f5237 100644
--- a/Android.bp
+++ b/Android.bp
@@ -40,6 +40,7 @@
         "src/core/CL/cl_kernels/common/floor.cl",
         "src/core/CL/cl_kernels/common/gather.cl",
         "src/core/CL/cl_kernels/common/gemm.cl",
+        "src/core/CL/cl_kernels/common/gemm_utils.cl",
         "src/core/CL/cl_kernels/common/gemmlowp.cl",
         "src/core/CL/cl_kernels/common/gemv.cl",
         "src/core/CL/cl_kernels/common/generate_proposals.cl",