COMPMID-3732: Remove OpenCL padding from CLPoolingLayer

- Refactor pooling layer kernels on OpenCL (F32/F16/QASYMM8) to avoid
  padding and improve performance
- Add a test that checks the zero-padding requirement
- Fix index extraction, which was wrong because the padding was passed
  at compile time
- Auto-initialize (auto_init) the indices tensor in CLPoolingLayerKernel
- Fix V_OFFS3 in helpers.h, which expanded to (0, 1, 3) instead of
  (0, 1, 2)

Change-Id: I1ae5a2ef8c4ce787c80dcd73e35c17bb34623cb5
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4188
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/core/CL/cl_kernels/helpers.h b/src/core/CL/cl_kernels/helpers.h
index 0b36a55..0bdf16d 100644
--- a/src/core/CL/cl_kernels/helpers.h
+++ b/src/core/CL/cl_kernels/helpers.h
@@ -174,7 +174,7 @@
  */
 #define V_OFFS1(dt) (dt)(0)
 #define V_OFFS2(dt) (dt)(0, 1)
-#define V_OFFS3(dt) (dt)(0, 1, 3)
+#define V_OFFS3(dt) (dt)(0, 1, 2)
 #define V_OFFS4(dt) (dt)(0, 1, 2, 3)
 #define V_OFFS8(dt) (dt)(0, 1, 2, 3, 4, 5, 6, 7)
 #define V_OFFS16(dt) (dt)(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
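
The V_OFFS3 change in the hunk above can be illustrated with a minimal scalar sketch in plain C. It assumes the V_OFFSn vectors are added to a base x coordinate to obtain the element index each lane touches; that usage is an assumption, since the hunk shows only the definitions. The array names `V_OFFS3_OLD` and `V_OFFS3_NEW` and the helpers `offsets_are_contiguous`, `lane_indices` and `check_v_offs3` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-ins for the 3-lane offset vector: plain arrays
 * replace the OpenCL vector literal (dt)(0, 1, 2). */
static const int V_OFFS3_OLD[3] = { 0, 1, 3 }; /* before this patch */
static const int V_OFFS3_NEW[3] = { 0, 1, 2 }; /* after this patch  */

/* Returns 1 when offs[i] == i for every lane, i.e. the lanes address
 * consecutive elements starting at the base position; returns 0 otherwise. */
static int offsets_are_contiguous(const int *offs, size_t n)
{
    for (size_t i = 0; i < n; ++i)
    {
        if (offs[i] != (int)i)
        {
            return 0;
        }
    }
    return 1;
}

/* Assumed usage: add the per-lane offsets to a base x coordinate to get the
 * element index that each lane touches. */
static void lane_indices(int base_x, const int *offs, size_t n, int *out)
{
    for (size_t i = 0; i < n; ++i)
    {
        out[i] = base_x + offs[i];
    }
}

/* Small self-check of the behaviour described above. */
static void check_v_offs3(void)
{
    int idx[3];

    lane_indices(8, V_OFFS3_NEW, 3, idx);
    assert(idx[0] == 8 && idx[1] == 9 && idx[2] == 10);

    assert(offsets_are_contiguous(V_OFFS3_NEW, 3));
    assert(!offsets_are_contiguous(V_OFFS3_OLD, 3));
}
```

Under that assumption, the old definition makes the third lane address element base + 3 and skip element base + 2, while the patched definition addresses three consecutive elements.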