COMPMID-2452: Fix 32-bit per-channel convolution for NEON.

Rearrange the kernels in run() so that type conversion takes place
before the matrix transformations.

Change-Id: Ibf47788fe71a84fd7549f8667549552e15ca8aab
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-on: https://review.mlplatform.org/c/2251
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/core/NEON/kernels/NEGEMMLowpMatrixMultiplyKernel.cpp b/src/core/NEON/kernels/NEGEMMLowpMatrixMultiplyKernel.cpp
index 8f5a208..3082ff2 100644
--- a/src/core/NEON/kernels/NEGEMMLowpMatrixMultiplyKernel.cpp
+++ b/src/core/NEON/kernels/NEGEMMLowpMatrixMultiplyKernel.cpp
@@ -870,6 +870,7 @@
         switch(_input0->info()->data_type())
         {
             case DataType::S8:
+            case DataType::QASYMM8_SIGNED:
             {
                 vector_matrix_multiply_s8(ina, inb, out, width_matrix_a, width_matrix_b, in_b_stride, window);
                 break;