Fix oclgrind error on CLGEMMLowp reshaped only RHS quantized per channel

Fix a corner case in which quantization is per channel and OFM == 1.
The function can safely set the per_channel quantization flag to false, since there is only one output channel.
This way, the kernel avoids adding unnecessary padding to the output multipliers and shifts.
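
For illustration only, the reasoning above can be sketched as follows. This is a minimal sketch, not the library code: `OutputStageSketch`, `needs_per_channel` and `should_pad_multipliers` are hypothetical names introduced here, and only the shape of the final condition mirrors the change in this patch.

```cpp
#include <cstddef>

// Hypothetical stand-in for the output stage metadata (not the library's real type).
struct OutputStageSketch
{
    bool is_quantized_per_channel;
};

// Hypothetical helper: per-channel handling is only meaningful when there is
// more than one output channel. With a single channel, one multiplier and one
// shift describe the whole output, so the flag can be cleared.
inline bool needs_per_channel(bool weights_are_per_channel, std::size_t num_output_channels)
{
    return weights_are_per_channel && num_output_channels > 1;
}

// Hypothetical helper: builds the sketch output stage from the two inputs above.
inline OutputStageSketch make_output_stage(bool weights_are_per_channel, std::size_t num_output_channels)
{
    OutputStageSketch stage{};
    stage.is_quantized_per_channel = needs_per_channel(weights_are_per_channel, num_output_channels);
    return stage;
}

// Mirrors the shape of the new condition: the padding decision follows the
// flag rather than the size of the multipliers tensor.
inline bool should_pad_multipliers(const OutputStageSketch *stage)
{
    return stage != nullptr && stage->is_quantized_per_channel;
}
```

With OFM == 1 the sketch reports that no padding is needed, even when the weights were nominally quantized per channel.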

Resolve COMPMID-4384

Change-Id: Ic03452bfaf52d1be536cd371721adedd2e580a08
Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5648
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
diff --git a/src/core/CL/kernels/CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp b/src/core/CL/kernels/CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp
index d39900a..37c1100 100644
--- a/src/core/CL/kernels/CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp
+++ b/src/core/CL/kernels/CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel.cpp
@@ -256,7 +256,7 @@
             window_changed = window_changed || update_window_and_padding(win_out, bias_access);
         }
 
-        if(output_multipliers != nullptr && output_multipliers->dimension(0) > 1)
+        if(output_multipliers != nullptr && output_stage.is_quantized_per_channel)
         {
             AccessWindowHorizontal output_multipliers_access(output_multipliers, 0, num_elems_processed_per_iteration_x);
             AccessWindowHorizontal output_shifts_access(output_shifts, 0, num_elems_processed_per_iteration_x);