Improve start-up time for ClScale

- Add macro guards for the different kernels in scale.cl
- Rework TENSOR4D to the new format
- Pass scale_x and scale_y at runtime

Resolves COMPMID-4886

Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: Ib904a703d511fb8260618057ac92e5ea9efeee2b
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/6619
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/core/CL/ICLKernel.h b/src/core/CL/ICLKernel.h
index 3b3217d..a7c979e 100644
--- a/src/core/CL/ICLKernel.h
+++ b/src/core/CL/ICLKernel.h
@@ -225,6 +225,24 @@
{
add_tensor_argument<4>(idx, tensor, window);
}
+
+ /** Add the passed NHWC 4D tensor's parameters to the object's kernel's arguments by passing strides, dimensions and the offset to the first valid element in bytes.
+ *
+ * @param[in,out] idx Index at which to start adding the tensor's arguments. Will be incremented by the number of kernel arguments set.
+ * @param[in] tensor Tensor to set as an argument of the object's kernel.
+ */
+ void add_4d_tensor_nhwc_argument(unsigned int &idx, const ICLTensor *tensor);
+
+ /** Returns the number of arguments enqueued per NHWC 4D Tensor object.
+ *
+ * @return The number of arguments enqueued per NHWC 4D Tensor object.
+ */
+ constexpr static unsigned int num_arguments_per_4d_tensor_nhwc()
+ {
+ constexpr unsigned int no_args_per_4d_tensor_nhwc = 9u;
+ return no_args_per_4d_tensor_nhwc;
+ }
+
/** Returns the number of arguments enqueued per 1D array object.
*
* @return The number of arguments enqueues per 1D array object.