Optimize CPU add layer on quantized data

* Use fixed-point arithmetic where possible.
* Various optimizations for the FP32-based implementation.
  This implementation is kept as the fallback for
  unrealistic quantization parameters that exceed
  the range of the fixed-point solution.
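
For illustration only, the fixed-point path with an FP32 fallback could
look roughly like the sketch below. It is not the code in this patch:
QParams, FixedAdd, kFracBits, make_fixed_add, add_q8_fixed and
add_q8_float are hypothetical names, and the 16 fractional bits and the
int32 accumulator are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical affine quantization parameters: real = scale * (q - offset).
struct QParams { float scale; int32_t offset; };

// Hypothetical precomputed fixed-point coefficients for one add operation.
struct FixedAdd { bool ok; int32_t a_mul; int32_t b_mul; int32_t bias; };

constexpr int kFracBits = 16; // assumed number of fractional bits

// FP32 reference path, used as the fallback in this sketch.
inline uint8_t add_q8_float(uint8_t a, uint8_t b, QParams qa, QParams qb, QParams qo)
{
    const float real = qa.scale * (int32_t(a) - qa.offset) + qb.scale * (int32_t(b) - qb.offset);
    const long  q    = std::lround(real / qo.scale) + qo.offset;
    return uint8_t(std::clamp<long>(q, 0, 255));
}

// Fold the three scales and offsets into out = a_mul * a + b_mul * b + bias,
// and report ok = false when the worst-case accumulator would not fit in int32.
inline FixedAdd make_fixed_add(QParams qa, QParams qb, QParams qo)
{
    assert(qa.scale > 0.0f && qb.scale > 0.0f && qo.scale > 0.0f);
    const double sa   = double(qa.scale) / qo.scale;
    const double sb   = double(qb.scale) / qo.scale;
    const double bias = qo.offset - sa * qa.offset - sb * qb.offset;
    const double one  = double(1 << kFracBits);

    // Largest magnitude the accumulator can reach for 8-bit inputs,
    // plus the rounding half and a small margin for coefficient rounding.
    const double worst = (255.0 * (sa + sb) + std::abs(bias) + 0.5) * one + 512.0;
    if (!(worst < double(std::numeric_limits<int32_t>::max())))
    {
        return { false, 0, 0, 0 };
    }
    return { true,
             int32_t(std::lround(sa * one)),
             int32_t(std::lround(sb * one)),
             int32_t(std::lround(bias * one)) };
}

// Integer-only add; valid only when make_fixed_add() returned ok == true.
inline uint8_t add_q8_fixed(uint8_t a, uint8_t b, const FixedAdd &f)
{
    const int32_t acc = f.a_mul * int32_t(a) + f.b_mul * int32_t(b) + f.bias + (1 << (kFracBits - 1));
    // Right shift of a negative value is arithmetic on common compilers
    // (and guaranteed from C++20 on).
    const int32_t q = acc >> kFracBits;
    return uint8_t(std::clamp<int32_t>(q, 0, 255));
}
```

A caller would run make_fixed_add() once per operation, use
add_q8_fixed() when ok is true, and otherwise call add_q8_float().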

Resolves: COMPMID-5458
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I221d2d3801ecaae4fe0b7cf6ae8ef00ca3743665
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8317
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/cpu/kernels/CpuKernelSelectionTypes.h b/src/cpu/kernels/CpuKernelSelectionTypes.h
index e3ecc4e..87edb15 100644
--- a/src/cpu/kernels/CpuKernelSelectionTypes.h
+++ b/src/cpu/kernels/CpuKernelSelectionTypes.h
@@ -88,6 +88,7 @@
     DataType            dt;
     cpuinfo::CpuIsaInfo isa;
     bool                can_interpret_inputs_as_1d_array;
+    bool                can_use_fixedpoint;
 };
 
 struct ScaleKernelDataTypeISASelectorData