COMPMID-896: Replace legacy 4x4 u8 GEMM kernel with safe version.

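It's not safe to accumulate two u8 x u8 products into a u16 accumulator:
each product can be as large as 255 * 255 = 65025, so the sum of two can
reach 130050, which exceeds the u16 maximum of 65535.
Change the kernel to issue uadalp after every single multiply, so that
each product is widened before it is added to a second one.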
Correct the test fixture as well.
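
For illustration only, a scalar C++ model of the overflow described above.
This is a sketch, not the kernel's NEON code: the function names are
hypothetical, and the real uadalp instruction operates on vector lanes
(adding adjacent pairs into wider accumulators) rather than on scalars.

```cpp
#include <cstdint>

// Hypothetical model of the legacy pattern: two u8 x u8 products are summed
// in a 16-bit lane first, and only then widened into the 32-bit accumulator.
std::uint32_t accumulate_two_then_widen(std::uint8_t a0, std::uint8_t b0,
                                        std::uint8_t a1, std::uint8_t b1,
                                        std::uint32_t acc)
{
    // Each product fits in u16 (at most 65025)...
    const std::uint16_t p0 = static_cast<std::uint16_t>(a0 * b0);
    const std::uint16_t p1 = static_cast<std::uint16_t>(a1 * b1);
    // ...but their sum may not: the 16-bit lane wraps modulo 65536.
    const std::uint16_t lane = static_cast<std::uint16_t>(p0 + p1);
    return acc + lane;
}

// Hypothetical model of the safe pattern: every product is widened into the
// 32-bit accumulator on its own, so no 16-bit sum is ever formed.
std::uint32_t widen_after_every_multiply(std::uint8_t a0, std::uint8_t b0,
                                         std::uint8_t a1, std::uint8_t b1,
                                         std::uint32_t acc)
{
    acc += static_cast<std::uint16_t>(a0 * b0);
    acc += static_cast<std::uint16_t>(a1 * b1);
    return acc;
}
```

With all four inputs set to 255 and a zero accumulator, the first function
returns 64514 (130050 wrapped modulo 65536) and the second returns 130050.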

Change-Id: I011b90033c4673e55b843d079e3f7d185b1df330
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119096
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
2 files changed