Add SVE support and decouple data type for NEScaleKernel

- Decouple data type for NEON NHWC implementation. Supported data types are: fp32, fp16, u8, s16, qasymm8, qasymm8_signed.

- Add SVE support for NHWC and all six data types listed above.

Resolves: COMPMID-3873

Change-Id: I097de119f4667b28b025a78cadf7185afa5f15f0
Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4766
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/core/helpers/ScaleHelpers.h b/src/core/helpers/ScaleHelpers.h
index 827bbef..f19a8b8 100644
--- a/src/core/helpers/ScaleHelpers.h
+++ b/src/core/helpers/ScaleHelpers.h
@@ -1,5 +1,5 @@
 /*
-* Copyright (c) 2020 Arm Limited.
+* Copyright (c) 2020-2021 Arm Limited.
  *
  * SPDX-License-Identifier: MIT
  *
@@ -325,6 +325,32 @@
     // Return average
     return sum / (x_elements * y_elements);
 }
+
+/** Computes bilinear interpolation from the top-left, top-right, bottom-left and bottom-right pixels and the fractional
+ * offsets of the real coordinates from the top-left pixel's integer coordinates.
+ *
+ * @param[in] a00    The top-left pixel value.
+ * @param[in] a01    The top-right pixel value.
+ * @param[in] a10    The bottom-left pixel value.
+ * @param[in] a11    The bottom-right pixel value.
+ * @param[in] dx_val Distance along X between the real coordinate and the top-left pixel's integer X coordinate.
+ * @param[in] dy_val Distance along Y between the real coordinate and the top-left pixel's integer Y coordinate.
+ *
+ * @note dx_val and dy_val must be in the range [0, 1.0]
+ *
+ * @return The bilinear interpolated pixel value
+ */
+inline float delta_bilinear(float a00, float a01, float a10, float a11, float dx_val, float dy_val)
+{
+    const float dx1_val = 1.0f - dx_val;
+    const float dy1_val = 1.0f - dy_val;
+
+    const float w1 = dx1_val * dy1_val;
+    const float w2 = dx_val * dy1_val;
+    const float w3 = dx1_val * dy_val;
+    const float w4 = dx_val * dy_val;
+    return a00 * w1 + a01 * w2 + a10 * w3 + a11 * w4;
+}
 } // namespace scale_helpers
 } // namespace arm_compute
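
Illustrative note on the weighting used by delta_bilinear: the sketch below reproduces the function body from this patch outside the arm_compute namespace so it compiles on its own, and adds a self-check. The name check_delta_bilinear and the sample pixel values are assumptions made for illustration and are not part of the change. The checks suggest that an offset of (0, 0) returns the top-left pixel, (1, 1) returns the bottom-right pixel, and (0.5, 0.5) returns the mean of all four.

```cpp
#include <cassert>

// Body copied from the patch; placed at global scope so the sketch is self-contained.
inline float delta_bilinear(float a00, float a01, float a10, float a11, float dx_val, float dy_val)
{
    const float dx1_val = 1.0f - dx_val;
    const float dy1_val = 1.0f - dy_val;

    const float w1 = dx1_val * dy1_val; // weight of top-left
    const float w2 = dx_val * dy1_val;  // weight of top-right
    const float w3 = dx1_val * dy_val;  // weight of bottom-left
    const float w4 = dx_val * dy_val;   // weight of bottom-right
    return a00 * w1 + a01 * w2 + a10 * w3 + a11 * w4;
}

// Hypothetical self-check (not in the patch): the four weights sum to 1,
// so the corners are reproduced exactly and the centre is the plain average.
inline void check_delta_bilinear()
{
    // Sample 2x2 neighbourhood: top row 10, 20; bottom row 30, 40.
    assert(delta_bilinear(10.0f, 20.0f, 30.0f, 40.0f, 0.0f, 0.0f) == 10.0f); // top-left
    assert(delta_bilinear(10.0f, 20.0f, 30.0f, 40.0f, 1.0f, 0.0f) == 20.0f); // top-right
    assert(delta_bilinear(10.0f, 20.0f, 30.0f, 40.0f, 0.0f, 1.0f) == 30.0f); // bottom-left
    assert(delta_bilinear(10.0f, 20.0f, 30.0f, 40.0f, 1.0f, 1.0f) == 40.0f); // bottom-right
    assert(delta_bilinear(10.0f, 20.0f, 30.0f, 40.0f, 0.5f, 0.5f) == 25.0f); // centre
}
```

The exact equality comparisons hold here only because the chosen offsets and pixel values are exactly representable in float; general inputs would need a tolerance.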