Port DirectConv2d to CKW backend

Ports the direct convolution 2D kernel from the experimental Dynamic Fusion interface to use the new Compute Kernel Writer backend for OpenCL code generation.

Support is for FP16/FP32 only.

Resolves: COMPMID-6259

Change-Id: Ia8d7b9cb789737b22b1d877cd798a73eda0ce4ab
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10059
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/compute_kernel_writer/prototype/include/ckw/KernelWriter.h b/compute_kernel_writer/prototype/include/ckw/KernelWriter.h
index c116e62..72f85c7 100644
--- a/compute_kernel_writer/prototype/include/ckw/KernelWriter.h
+++ b/compute_kernel_writer/prototype/include/ckw/KernelWriter.h
@@ -129,11 +129,38 @@
 
     /** Load the data from the tensor memory to the tile using the sampling information.
      *
+     * @param[out] tile       The tile to be loaded.
+     * @param[in]  tensor     The tensor to be read.
+     * @param[in]  sampler    The tensor sampling information.
+     * @param[in]  dilation_y Dilation in the Y dimension.
+     */
+    void op_load(TileOperand &tile, const TensorOperand &tensor, const TensorTileSampler &sampler, const TileOperand &dilation_y = TileOperand("dil_y", 1));
+
+    /** Load the data from the tensor memory to the tile using the indirect buffer approach and respective of the sampling information.
+     *
      * @param[out] tile    The tile to be loaded.
      * @param[in]  tensor  The tensor to be read.
      * @param[in]  sampler The tensor sampling information.
      */
-    void op_load(TileOperand &tile, TensorOperand &tensor, const TensorTileSampler &sampler);
+    void op_load_indirect(TileOperand &tile, const TensorOperand &tensor, const TensorTileSampler &sampler);
+
+    /** Construct an indirection buffer in @p tile containing the precalculated addresses of elements in the source tensor.
+     *
+     * @param[out] tile    The tile to be loaded.
+     * @param[in]  tensor  The tensor the be read.
+     * @param[in]  sampler The tensor sampling information.
+     * @param[in]  x       The X coordinate.
+     * @param[in]  y       The Y coordinate.
+     * @param[in]  x_off   Offset in the X dimension.
+     * @param[in]  y_off   Offset in the Y dimension.
+     */
+    void util_get_indirect_buffer(TileOperand             &tile,
+                                  const TensorOperand     &tensor,
+                                  const TensorTileSampler &sampler,
+                                  const TileOperand       &x,
+                                  const TileOperand       &y,
+                                  const TileOperand       &x_off,
+                                  const TileOperand       &y_off);
 
     /** Store the tile to the tensor using the specified sampling information.
      *