Fix heuristic in Int8 CLDirectConvolutionKernel

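- Set k0 to 16 (previously 8) for quantized data types; non-quantized
  data types keep k0 = 8. In both cases the value is still passed
  through adjust_vec_size() together with the source channel dimension.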
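
A minimal sketch of the k0 selection after this change, for illustration
only. The names adjust_vec_size_sketch and select_k0_sketch are
hypothetical, and the stand-in assumes that adjusting the vector size
means halving it until it no longer exceeds the channel count; the
library's own adjust_vec_size() may handle further cases.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for adjust_vec_size(): halve the requested vector
// size until it no longer exceeds the given dimension (assumed behaviour).
inline unsigned int adjust_vec_size_sketch(unsigned int vec_size, std::size_t dim0)
{
    assert(vec_size > 0 && vec_size <= 16);
    while(vec_size > dim0)
    {
        vec_size >>= 1;
    }
    return vec_size;
}

// k0 selection as changed by this patch: start from 16 for quantized data
// types and from 8 otherwise, then adjust to the number of input channels.
inline unsigned int select_k0_sketch(bool is_quantized, std::size_t num_channels)
{
    return adjust_vec_size_sketch(is_quantized ? 16u : 8u, num_channels);
}
```

Under these assumptions, 64 input channels give k0 = 16 for quantized
data and k0 = 8 otherwise, while 12 input channels give k0 = 8 in both
cases.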

Resolves COMPMID-4497

Change-Id: I729a8e2b7cd45762df4fef8b4c8606fe6367adb5
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5654
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/core/gpu/cl/kernels/ClDirectConvolutionKernel.cpp b/src/core/gpu/cl/kernels/ClDirectConvolutionKernel.cpp
index bf26477..18d648d 100644
--- a/src/core/gpu/cl/kernels/ClDirectConvolutionKernel.cpp
+++ b/src/core/gpu/cl/kernels/ClDirectConvolutionKernel.cpp
@@ -416,7 +416,7 @@
 
         const unsigned int n0                 = win_config.second.x().step();
         const unsigned int m0                 = win_config.second.y().step();
-        const unsigned int k0                 = adjust_vec_size(8u, src->dimension(channel_idx));
+        const unsigned int k0                 = adjust_vec_size(is_data_type_quantized(data_type)? 16u : 8u, src->dimension(channel_idx));
         const unsigned int partial_store_n0   = dst->dimension(channel_idx) % n0;
         const unsigned int pad_left           = conv_info.pad_left();
         const unsigned int pad_top            = conv_info.pad_top();