Improve start-up time for ClScale

- Add macro guards around the different kernels in scale.cl
- Rework TENSOR4D kernel arguments to the new NHWC 4D format
- Pass scale_x and scale_y at runtime
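
As a rough illustration of the last two points, the host-side argument
indexing might look like the sketch below. It is not the code in this patch:
MockKernel, configure_scale_args and the slot names are hypothetical, and the
split of the 9 per-tensor arguments (buffer, 3 strides, 4 dimensions, offset)
as well as the position of scale_x/scale_y after the two tensors are
assumptions. Only the count of 9 is taken from
num_arguments_per_4d_tensor_nhwc().

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a CL kernel: records a label per argument slot
// so the index arithmetic can be inspected on the host.
struct MockKernel
{
    std::vector<std::string> slots;

    void set_arg(unsigned int idx, const std::string &what)
    {
        if(idx >= slots.size())
        {
            slots.resize(idx + 1);
        }
        slots[idx] = what;
    }
};

// Matches the constant introduced in ICLKernel.h by this change.
constexpr unsigned int num_arguments_per_4d_tensor_nhwc()
{
    return 9u;
}

// Assumed layout of the 9 arguments: buffer, 3 strides, 4 dimensions, offset.
// idx is advanced by the number of arguments set, as the doc comment states.
void add_4d_tensor_nhwc_argument(MockKernel &kernel, unsigned int &idx, const std::string &name)
{
    kernel.set_arg(idx++, name + ".buffer");
    kernel.set_arg(idx++, name + ".stride_y");
    kernel.set_arg(idx++, name + ".stride_z");
    kernel.set_arg(idx++, name + ".stride_w");
    kernel.set_arg(idx++, name + ".dim_c");
    kernel.set_arg(idx++, name + ".dim_w");
    kernel.set_arg(idx++, name + ".dim_h");
    kernel.set_arg(idx++, name + ".dim_n");
    kernel.set_arg(idx++, name + ".offset_first_element_in_bytes");
}

// Hypothetical configure step: two NHWC tensors, then the scale factors as
// ordinary runtime arguments (assumed order) instead of build-time defines.
MockKernel configure_scale_args()
{
    MockKernel   kernel;
    unsigned int idx = 0;
    add_4d_tensor_nhwc_argument(kernel, idx, "src");
    add_4d_tensor_nhwc_argument(kernel, idx, "dst");
    kernel.set_arg(idx++, "scale_x");
    kernel.set_arg(idx++, "scale_y");
    return kernel;
}
```

Under these assumptions the scale factors no longer take part in the kernel
build options, so one compiled kernel can plausibly be reused across
different scale ratios.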

Resolves COMPMID-4886

Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ib904a703d511fb8260618057ac92e5ea9efeee2b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6619
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/core/CL/ICLKernel.h b/src/core/CL/ICLKernel.h
index 3b3217d..a7c979e 100644
--- a/src/core/CL/ICLKernel.h
+++ b/src/core/CL/ICLKernel.h
@@ -225,6 +225,24 @@
     {
         add_tensor_argument<4>(idx, tensor, window);
     }
+
+    /** Add the passed NHWC 4D tensor's parameters to the object's kernel's arguments by passing its strides, dimensions and the byte offset to the first valid element.
+     *
+     * @param[in,out] idx    Index at which to start adding the tensor's arguments. Will be incremented by the number of kernel arguments set.
+     * @param[in]     tensor Tensor to set as an argument of the object's kernel.
+     */
+    void add_4d_tensor_nhwc_argument(unsigned int &idx, const ICLTensor *tensor);
+
+    /** Returns the number of arguments enqueued per NHWC 4D Tensor object.
+     *
+     * @return The number of arguments enqueued per NHWC 4D Tensor object.
+     */
+    constexpr static unsigned int num_arguments_per_4d_tensor_nhwc()
+    {
+        constexpr unsigned int no_args_per_4d_tensor_nhwc = 9u;
+        return no_args_per_4d_tensor_nhwc;
+    }
+
     /** Returns the number of arguments enqueued per 1D array object.
      *
      * @return The number of arguments enqueues per 1D array object.