Implement CLDirectConv3D f32/f16

Resolve COMPMID-4660

Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ibd66ec1eb6faa60086981b1e3a9c12561df3445f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6420
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
diff --git a/src/gpu/cl/kernels/ClDirectConv2dKernel.h b/src/gpu/cl/kernels/ClDirectConv2dKernel.h
index 4041c7b..5624f3a 100644
--- a/src/gpu/cl/kernels/ClDirectConv2dKernel.h
+++ b/src/gpu/cl/kernels/ClDirectConv2dKernel.h
@@ -42,9 +42,9 @@
     ARM_COMPUTE_DISALLOW_COPY_ALLOW_MOVE(ClDirectConv2dKernel);
     /** Set the src, weights, biases and dst tensors info.
      *
-     * @note: Due to set_valid_region(), thus src/weights/biases cannot be const. Need to change this once the set_valid_region() is removed.
+     * @note: Due to set_valid_region() in NCHW, src/weights/biases cannot be const. Need to change this once the set_valid_region() is removed.
      *
-     * @note: DirectConvolution only works in the following configurations:
+     * @note: DirectConvolution only works in the following configurations for the NCHW data layout:
      *        1x1 convolution with stride_x = 1/2/3, stride_y = 1/2/3
      *        3x3 convolution with stride_x = 1/2, stride_y = 1/2
      *        5x5 convolution with stride_x = 1/2, stride_y = 1/2