Optimize Quantized/Integer Bilinear Scale for Neon™
This patch introduces several performance optimizations for the Bilinear Scale operator with the REPLICATE border mode. The changes apply only to the NHWC data layout.
This patch:
- Reduces the memory footprint by skipping the precomputation of indices and weights when the kernels do not use them
- Rewrites the kernels for QASYMM8/QASYMM8_SIGNED/U8 (Uint8)
- Adds S8 (Int8) Bilinear Scale for the REPLICATE border mode
- Removes the Bilinear Scale SVE kernels for Quantized and Integer types and adjusts the heuristics to choose the Neon™ implementation
- Adds new test cases where the input and output of the Bilinear Scale operator have different quantization scale and offset
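For reference, the per-element work that a QASYMM8 NHWC bilinear kernel with REPLICATE borders performs can be sketched as a scalar loop. This is a minimal sketch under stated assumptions, not the library's implementation: it assumes batch size 1, CENTER-style sampling (src = (dst + 0.5) * ratio - 0.5), float arithmetic, and a hypothetical QuantInfo struct; the function name is illustrative. It dequantizes with the input's scale/offset, interpolates, and requantizes with the output's, which is the situation the new tests cover when the two differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: quantization parameters of an asymmetric uint8 tensor.
struct QuantInfo
{
    float        scale;
    std::int32_t offset;
};

// Hypothetical scalar reference: bilinear resize of an NHWC uint8 tensor (batch 1).
// Out-of-range source coordinates are clamped to the nearest edge (REPLICATE border).
// Assumes CENTER sampling: src = (dst + 0.5) * ratio - 0.5. Requires C++17 (std::clamp).
std::vector<std::uint8_t> bilinear_scale_nhwc_replicate(const std::vector<std::uint8_t> &src,
                                                        int in_w, int in_h, int channels,
                                                        int out_w, int out_h,
                                                        QuantInfo qin, QuantInfo qout)
{
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(out_w) * out_h * channels);
    const float ratio_x = static_cast<float>(in_w) / out_w;
    const float ratio_y = static_cast<float>(in_h) / out_h;

    // NHWC indexing: the channel is the innermost dimension.
    auto at = [&](int x, int y, int c) {
        x = std::clamp(x, 0, in_w - 1); // REPLICATE: reuse the nearest edge pixel
        y = std::clamp(y, 0, in_h - 1);
        const std::uint8_t q = src[(static_cast<std::size_t>(y) * in_w + x) * channels + c];
        return (static_cast<std::int32_t>(q) - qin.offset) * qin.scale; // dequantize
    };

    for(int oy = 0; oy < out_h; ++oy)
    {
        const float fy = (oy + 0.5f) * ratio_y - 0.5f;
        const int   y0 = static_cast<int>(std::floor(fy));
        const float dy = fy - y0;
        for(int ox = 0; ox < out_w; ++ox)
        {
            const float fx = (ox + 0.5f) * ratio_x - 0.5f;
            const int   x0 = static_cast<int>(std::floor(fx));
            const float dx = fx - x0;
            for(int c = 0; c < channels; ++c)
            {
                const float top = at(x0, y0, c) * (1.f - dx) + at(x0 + 1, y0, c) * dx;
                const float bot = at(x0, y0 + 1, c) * (1.f - dx) + at(x0 + 1, y0 + 1, c) * dx;
                const float val = top * (1.f - dy) + bot * dy;
                // Requantize with the *output* scale/offset, which may differ from the input's.
                const long q = std::lround(val / qout.scale) + qout.offset;
                dst[(static_cast<std::size_t>(oy) * out_w + ox) * channels + c] =
                    static_cast<std::uint8_t>(std::clamp<long>(q, 0, 255));
            }
        }
    }
    return dst;
}
```

For example, upscaling a 1x2 single-channel input {0, 100} to 1x4 with identical quantization (scale 1, offset 0) gives {0, 25, 75, 100}; the clamped reads at x = -1 and x = 2 are where REPLICATE takes effect.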
Resolves: COMPMID-5453, COMPMID-5454
Change-Id: I3d251e76e0c6978fd5a0a1795ec62ab536bec93c
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8250
Reviewed-by: SiCong Li <sicong.li@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/core/utils/ScaleUtils.cpp b/src/core/utils/ScaleUtils.cpp
index 82c6405..ee57a8e 100644
--- a/src/core/utils/ScaleUtils.cpp
+++ b/src/core/utils/ScaleUtils.cpp
@@ -40,12 +40,26 @@
return static_cast<float>(in) / static_cast<float>(out);
}
-bool arm_compute::scale_utils::is_precomputation_required(DataLayout data_layout, DataType data_type, InterpolationPolicy policy)
+bool arm_compute::scale_utils::is_precomputation_required(DataLayout data_layout, DataType data_type,
+ InterpolationPolicy policy, BorderMode border_mode)
{
- // whether to precompute indices & weights
- // The Neon™ kernels (which are preferred over SVE when policy is BILINEAR) do not use
- // precomputed index and weights when data type is FP32/16.
- // If policy is nearest_neighbor for SVE, then precompute because it's being used
- // To be revised in COMPMID-5453/5454
- return data_layout != DataLayout::NHWC || (data_type != DataType::F32 && data_type != DataType::F16) || (CPUInfo::get().get_isa().sve == true && policy == InterpolationPolicy::NEAREST_NEIGHBOR);
+ // Do not calculate precomputed weights and indices if kernel code doesn't use them
+ if(data_layout == DataLayout::NHWC)
+ {
+ switch(data_type)
+ {
+ case DataType::F32:
+ case DataType::F16:
+ return (CPUInfo::get().get_isa().sve == true && policy == InterpolationPolicy::NEAREST_NEIGHBOR);
+ case DataType::U8:
+ case DataType::S8:
+ case DataType::QASYMM8:
+ case DataType::QASYMM8_SIGNED:
+ return (border_mode != BorderMode::REPLICATE) || (policy == InterpolationPolicy::NEAREST_NEIGHBOR);
+ default:
+ return true;
+ }
+ }
+
+ return true;
}
\ No newline at end of file
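The decision made by is_precomputation_required() in the diff above can be exercised in isolation with a small mirror of its switch. This is a sketch under stated assumptions: the enums are simplified stand-ins rather than the library's headers, the function name needs_precomputation is hypothetical, and the runtime CPUInfo::get().get_isa().sve query is replaced by a has_sve parameter.

```cpp
#include <cassert>

// Simplified stand-ins for the library enums (illustration only; not the library's headers).
enum class DataLayout { NCHW, NHWC };
enum class DataType { U8, S8, QASYMM8, QASYMM8_SIGNED, F16, F32, S16 };
enum class InterpolationPolicy { NEAREST_NEIGHBOR, BILINEAR, AREA };
enum class BorderMode { UNDEFINED, CONSTANT, REPLICATE };

// Hypothetical mirror of the patched decision. `has_sve` replaces the runtime
// CPUInfo::get().get_isa().sve query so the table can be checked without hardware.
bool needs_precomputation(DataLayout layout, DataType type, InterpolationPolicy policy,
                          BorderMode border, bool has_sve)
{
    if(layout != DataLayout::NHWC)
    {
        return true; // Non-NHWC layouts keep precomputing, as before the patch
    }
    switch(type)
    {
        case DataType::F32:
        case DataType::F16:
            // Only the SVE nearest-neighbour path reads the precomputed tables
            return has_sve && policy == InterpolationPolicy::NEAREST_NEIGHBOR;
        case DataType::U8:
        case DataType::S8:
        case DataType::QASYMM8:
        case DataType::QASYMM8_SIGNED:
            // Bilinear with REPLICATE borders does not read the precomputed tables
            return border != BorderMode::REPLICATE || policy == InterpolationPolicy::NEAREST_NEIGHBOR;
        default:
            return true;
    }
}
```

Tabulated this way, the intent of the patch is visible: for the quantized and 8-bit integer types in NHWC, bilinear scaling with REPLICATE borders is the only combination that no longer allocates the index and weight tables.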