Fix OpenCL direct convolution

- The ARM_DOT macro was using the wrong variables to compute the dot
  product
- K0 could be a non-power-of-2 value when IFM was not a multiple of 16
- Refactor the NHWC direct convolution test
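A minimal C++ sketch of the K0 constraint. It assumes K0 is chosen as the
largest power of 2 not exceeding min(IFM, 16), and that the remaining
IFM % K0 channels are reduced one at a time. `select_k0` and `blocked_dot`
are hypothetical illustrations, not the ComputeLibrary implementation or the
exact change in this patch. OpenCL C vector types only exist with widths 2,
3, 4, 8 and 16, so a channel count such as 6 cannot be used directly as K0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (assumption, not the library API): largest power of 2
// that does not exceed min(ifm, 16). Always returns at least 1.
unsigned int select_k0(unsigned int ifm)
{
    unsigned int k0 = 16;
    while(k0 > 1 && k0 > ifm)
    {
        k0 >>= 1; // halve until the block fits inside the channel count
    }
    return k0;
}

// Hypothetical scalar reference of the per-pixel channel reduction.
// Full blocks of k0 channels are consumed as a vectorised kernel would,
// then the leftover ifm % k0 channels are handled one by one.
// Each product pairs a[c + i] with b[c + i]: lanes from the same input
// and weight position, which is what a dot-product macro must get right.
float blocked_dot(const std::vector<float> &a, const std::vector<float> &b, unsigned int k0)
{
    assert(k0 >= 1 && "k0 must be at least 1 for the loop to make progress");
    const std::size_t ifm = std::min(a.size(), b.size());
    float             acc = 0.f;
    std::size_t       c   = 0;
    for(; c + k0 <= ifm; c += k0)
    {
        for(unsigned int i = 0; i < k0; ++i)
        {
            acc += a[c + i] * b[c + i];
        }
    }
    for(; c < ifm; ++c) // leftover channels
    {
        acc += a[c] * b[c];
    }
    return acc;
}
```

With IFM = 6 (not a multiple of 16), `select_k0(6)` yields 4, so one block
of 4 channels is followed by 2 leftover channels instead of an invalid
vector width of 6.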

Resolves COMPMID-4135, COMPMID-4155

Change-Id: I3a2dc89ef613ae20245cfc28e76ea36c55eaf81d
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4962
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: TeresaARM <teresa.charlinreyes@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
diff --git a/tests/datasets/ShapeDatasets.h b/tests/datasets/ShapeDatasets.h
index f8c0309..a7f1a44 100644
--- a/tests/datasets/ShapeDatasets.h
+++ b/tests/datasets/ShapeDatasets.h
@@ -683,7 +683,7 @@
         // Batch size 1
         TensorShape{ 32U, 37U, 3U },
                      // Batch size 4
-                     TensorShape{ 32U, 37U, 3U, 4U },
+                     TensorShape{ 6U, 9U, 5U, 4U },
     })
     {
     }