COMPMID-3337: Remove write paddings in both axes from CLGEMMMatrixMultiplyReshapedKernel

- Change the interface of STORE_BLOCK_BOUNDARY_AWARE to pass the
  conditions on Y and X rather than the X/Y coordinates. This allows the
  macro to be used with both GEMM reshaped and GEMM reshaped RHS only
- Remove padding from the output tensor of
  CLGEMMMatrixMultiplyReshapedKernel
- Add tests for validating the zero padding requirement
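
The interface change in the first item above can be illustrated with a
rough host-side C++ sketch. This is not the OpenCL macro itself: the
function name, its parameters and the row-major layout are all
hypothetical, chosen only to show a store that receives boolean
conditions on Y and X instead of coordinates, so the caller decides how
"is this the leftover block" is computed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical analogue of a boundary-aware block store (illustration only).
// block      : m0 x n0 values, row-major, computed by one work-item.
// partial_m0 : rows to store when cond_y is true (leftover block in Y).
// partial_n0 : columns to store when cond_x is true (leftover block in X).
// cond_y/x   : conditions supplied by the caller; the store never looks
//              at block coordinates to decide whether it is at a boundary.
// dst        : rows x dst_cols destination, row-major, with no padding.
void store_block_boundary_aware(const std::vector<float> &block,
                                std::size_t m0, std::size_t n0,
                                std::size_t partial_m0, std::size_t partial_n0,
                                bool cond_y, bool cond_x,
                                std::vector<float> &dst, std::size_t dst_cols,
                                std::size_t row0, std::size_t col0)
{
    // Pick the full or the partial extent on each axis independently.
    const std::size_t rows = cond_y ? partial_m0 : m0;
    const std::size_t cols = cond_x ? partial_n0 : n0;

    for(std::size_t r = 0; r < rows; ++r)
    {
        for(std::size_t c = 0; c < cols; ++c)
        {
            // Only in-bounds elements are written, so dst needs no padding.
            dst[(row0 + r) * dst_cols + (col0 + c)] = block[r * n0 + c];
        }
    }
}
```

Because the conditions are evaluated by the caller, two kernels that
derive their block positions differently could share the same store
logic, which is the motivation stated in the first item.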

Change-Id: I13263cc71ce065c5be34ed198def320dd5823495
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/3712
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp b/src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp
index e65726b..cf77c70 100644
--- a/src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp
+++ b/src/core/CL/kernels/CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.cpp
@@ -162,7 +162,7 @@
                                      input0->dimension(0),
                                      input0->dimension(1));
     AccessWindowStatic input1_access(input1, 0, 0,
-                                     ceil_to_multiple(input1->dimension(0), num_elems_processed_per_iteration_x),
+                                     input1->dimension(0),
                                      input1->dimension(1));
     AccessWindowStatic output_access(output, 0, 0,
                                      output->dimension(0),