Implement CLDirectConv3D f32/f16

Resolve COMPMID-4660

Signed-off-by: Giorgio Arena <giorgio.arena@arm.com>
Change-Id: Ibd66ec1eb6faa60086981b1e3a9c12561df3445f
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6420
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
diff --git a/SConscript b/SConscript
index c88a867..6672cae 100644
--- a/SConscript
+++ b/SConscript
@@ -356,6 +356,7 @@
                     'src/core/CL/cl_kernels/nhwc/batchnormalization_layer.cl',
                     'src/core/CL/cl_kernels/nhwc/channel_shuffle.cl',
                     'src/core/CL/cl_kernels/nhwc/direct_convolution.cl',
+                    'src/core/CL/cl_kernels/nhwc/direct_convolution3d.cl',
                     'src/core/CL/cl_kernels/nhwc/depth_to_space.cl',
                     'src/core/CL/cl_kernels/nhwc/dequantization_layer.cl',
                     'src/core/CL/cl_kernels/nhwc/dwc_native_fp_nhwc.cl',