Rework OpenCL Depthwise Convolution

- Remove dedicated kernels for NCHW. Now we only use NHWC with permute
- Remove specialized kernels for 3x3 NHWC
- Simplify CLDepthwiseConvolutionLayer.cpp to call just the native
  implementation for both floating-point and quantized data types
- Develop two parametric opencl kernels for depthwise convolution layer NHWC
  (floating-point and quantized)
- Add support to export the weights to cl_image
- Extend test for depthwise convolution on opencl

Resolves COMPMID-4417

Change-Id: Ibe533f79c2860f9cac8e921895d5a8f947753a5c
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5893
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/arm_compute/core/CL/CLHelpers.h b/arm_compute/core/CL/CLHelpers.h
index a9ac6a5..180211c 100644
--- a/arm_compute/core/CL/CLHelpers.h
+++ b/arm_compute/core/CL/CLHelpers.h
@@ -236,5 +236,13 @@
  */
 void set_wbsm(cl::Kernel &kernel, cl_int wbsm_hint);
 
+/* Helper function to check if we can export the weights to cl_image
+ *
+ * @param[in] tensor Weights tensor
+ *
+ * @return true if we can export the weights to cl_image
+ */
+bool export_weights_to_cl_image(const ITensorInfo *tensor);
+
 } // namespace arm_compute
 #endif /* ARM_COMPUTE_CLHELPERS_H */