Fix quantized gemv kernel driver by adding set_quantized_bias()

arm_gemm fuses the bias addition with the output stage in quantized gemm.

The output stage, in its very basic form, adds the correction term:
	K * A_offset * B_offset - B_offset * sum(A_row_i) - A_offset * sum(B_col_j)
where K is the accumulation depth. (The terms fall out of expanding sum_k (A_ik - A_offset) * (B_kj - B_offset).)

Matrix B is usually constant (e.g. the weight matrix in a convolution). Therefore, all terms except the middle one depend only on B and the offsets: they form a per-column vector that is identical for every output row and can be precomputed, since the column sums of matrix B are pre-calculated.
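
For concreteness, a minimal C++ sketch of this correction (illustrative names, not arm_gemm's internals):

	#include <cstdint>

	// Correction added to the raw int32 accumulator of output element
	// (i, j) in a K-deep quantized gemm. Names are illustrative only.
	int32_t correction(int32_t a_offset, int32_t b_offset, int32_t k,
	                   int32_t a_row_sum_i, int32_t b_col_sum_j) {
	    return k * a_offset * b_offset   // constant everywhere
	         - b_offset * a_row_sum_i    // middle term: depends on row i of A
	         - a_offset * b_col_sum_j;   // per-column, precomputable from B
	}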

The bias is also usually constant. When it is, it makes sense to fold the bias vector into that precomputed per-column sum and perform just a single addition on top of the output tensor.
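
A sketch of that folding, assuming a hypothetical helper (in arm_gemm the folding happens inside the column-sum computation itself):

	#include <cstddef>
	#include <cstdint>
	#include <vector>

	// Fold the constant bias into the precomputed per-column term so the
	// output stage performs a single addition per element. Hypothetical
	// helper, not arm_gemm code.
	std::vector<int32_t> precompute_col_term(const std::vector<int32_t> &b_col_sums,
	                                         const int32_t *bias, // nullptr if absent
	                                         int32_t a_offset, int32_t b_offset,
	                                         int32_t k) {
	    std::vector<int32_t> term(b_col_sums.size());
	    for (std::size_t j = 0; j < b_col_sums.size(); ++j) {
	        term[j] = k * a_offset * b_offset
	                - a_offset * b_col_sums[j]
	                + (bias != nullptr ? bias[j] : 0); // bias folded in here
	    }
	    return term;
	}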

For this to happen, the column-sum computation for the B tensor must account for the bias. The set_quantized_bias() method in the interface ensures this by passing the bias pointer and strides down to arm_gemm.
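
A hypothetical call site (gemm, bias_ptr and bias_multi_stride are assumed names):

	// Hand the bias to arm_gemm before B is pretransposed, so that the
	// column sums can absorb it.
	gemm->set_quantized_bias(bias_ptr, bias_multi_stride);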

GemvPretransposed does not override set_quantized_bias(), so the do-nothing parent implementation is used and the bias is never added to the output. This causes tests to fail.
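
A simplified illustration of the failure mode (not the actual arm_gemm class hierarchy):

	#include <cstddef>
	#include <cstdint>

	// The base class provides a do-nothing default, so a driver that does
	// not override it silently drops the bias.
	struct GemmCommonSketch {
	    virtual void set_quantized_bias(const int32_t *, std::size_t) {} // no-op
	    virtual ~GemmCommonSketch() = default;
	};

	struct GemvPretransposedSketch : GemmCommonSketch {
	    // Before this patch: no override here, so the no-op above ran and
	    // the bias never reached the output stage.
	};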

Resolves: COMPMID-6928
Change-Id: Iba24fabc65fdc47edb12db6abff2fb47784c0743
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11310
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
diff --git a/src/core/NEON/kernels/arm_gemm/gemm_qint8.cpp b/src/core/NEON/kernels/arm_gemm/gemm_qint8.cpp
index 7c860a2..d1c4e49 100644
--- a/src/core/NEON/kernels/arm_gemm/gemm_qint8.cpp
+++ b/src/core/NEON/kernels/arm_gemm/gemm_qint8.cpp
@@ -71,7 +71,7 @@
 #ifdef ARM_COMPUTE_ENABLE_SVE
 #ifdef ARM_COMPUTE_ENABLE_SME2
 {
-    GemmMethod::GEMM_HYBRID,
+    GemmMethod::GEMV_PRETRANSPOSED,
     "sme2_gemv_s8qa_dot_16VL",
     [](const GemmArgs &args, const Requantize32 &qp) { return args._ci->has_sme2() && quant_hybrid_asymmetric(qp) && args._Msize == 1 && !args._indirect_input && args._nbatches == 1;  },
     nullptr,
diff --git a/src/core/NEON/kernels/arm_gemm/gemm_quint8.cpp b/src/core/NEON/kernels/arm_gemm/gemm_quint8.cpp
index 3baf985..b85b1c4 100644
--- a/src/core/NEON/kernels/arm_gemm/gemm_quint8.cpp
+++ b/src/core/NEON/kernels/arm_gemm/gemm_quint8.cpp
@@ -67,7 +67,7 @@
 #ifdef ARM_COMPUTE_ENABLE_SME2
 // SME kernels
 {
-    GemmMethod::GEMM_HYBRID,
+    GemmMethod::GEMV_PRETRANSPOSED,
     "sme2_gemv_u8qa_dot_16VL",
     [](const GemmArgs &args, const Requantize32 &qp) { return args._ci->has_sme2() && quant_hybrid_asymmetric(qp) && args._Msize == 1 && !args._indirect_input && args._nbatches == 1;  },
     nullptr,
diff --git a/src/core/NEON/kernels/arm_gemm/gemv_pretransposed.hpp b/src/core/NEON/kernels/arm_gemm/gemv_pretransposed.hpp
index f70fc98..92c884c 100644
--- a/src/core/NEON/kernels/arm_gemm/gemv_pretransposed.hpp
+++ b/src/core/NEON/kernels/arm_gemm/gemv_pretransposed.hpp
@@ -215,6 +215,15 @@
         }
     }
 
+    void set_quantized_bias(const int32_t *bias, size_t bias_multi_stride) override {
+        if (std::is_same<OutputStage, Requantize32>::value) {
+            Requantize32 *qp = reinterpret_cast<Requantize32 *>(&_os);
+
+            qp->bias = bias;
+            qp->bias_multi_stride = bias_multi_stride;
+        }
+    }
+
     void pretranspose_B_array(void *buffer, const To *B, const int ldb, const int B_multi_stride, bool transposed) override {
         assert(!transposed);