COMPMID-896: Replace legacy 4x4 u8 GEMM kernel with safe version.

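Accumulating two u8 x u8 products into a u16 accumulator is not safe:
the sum can reach 2 * 255 * 255 = 130050, which exceeds the u16
maximum of 65535. Change the kernel to issue uadalp after every
multiply, so each product is widened before another one is added.
Also raise the upper fill bound in the test fixture from 128 to 255 so
that the tests exercise the full u8 input range.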
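
The overflow can be illustrated with a scalar model. This is only a
sketch under stated assumptions: the function names legacy_pair_sum and
safe_pair_sum are hypothetical, and the code models the arithmetic only,
not the actual assembly kernel. It also suggests why an upper fill bound
of 128 would not have exposed the problem, since 2 * 128 * 128 = 32768
still fits in a u16.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the legacy scheme: two u8*u8 products are summed
// in a 16-bit accumulator before being widened, so the sum wraps mod 65536.
uint32_t legacy_pair_sum(uint8_t a0, uint8_t b0, uint8_t a1, uint8_t b1)
{
    const uint16_t acc = static_cast<uint16_t>(a0 * b0 + a1 * b1);
    return acc;
}

// Hypothetical model of the safe scheme: each product (which always fits
// in 16 bits) is widened to 32 bits before it is accumulated.
uint32_t safe_pair_sum(uint8_t a0, uint8_t b0, uint8_t a1, uint8_t b1)
{
    uint32_t acc = 0;
    acc += static_cast<uint16_t>(a0 * b0);
    acc += static_cast<uint16_t>(a1 * b1);
    return acc;
}

// Checks the two fill bounds that appear in the fixture change.
void check_fill_bounds()
{
    // Upper bound 128: both schemes agree, so the overflow stays hidden.
    assert(legacy_pair_sum(128, 128, 128, 128) == 32768u);
    assert(safe_pair_sum(128, 128, 128, 128) == 32768u);

    // Upper bound 255: the 16-bit accumulator wraps (130050 - 65536).
    assert(legacy_pair_sum(255, 255, 255, 255) == 64514u);
    assert(safe_pair_sum(255, 255, 255, 255) == 130050u);
}
```

Calling check_fill_bounds() passes all four assertions; only the 255
case makes the two schemes disagree.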

Change-Id: I011b90033c4673e55b843d079e3f7d185b1df330
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/119096
Tested-by: Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
diff --git a/tests/validation/fixtures/GEMMLowpAssemblyFixture.h b/tests/validation/fixtures/GEMMLowpAssemblyFixture.h
index ff33c9d..d6b94a1 100644
--- a/tests/validation/fixtures/GEMMLowpAssemblyFixture.h
+++ b/tests/validation/fixtures/GEMMLowpAssemblyFixture.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2017 ARM Limited.
+ * Copyright (c) 2017-2018 ARM Limited.
  *
  * SPDX-License-Identifier: MIT
  *
@@ -98,8 +98,8 @@
         }
         else
         {
-            fill(AccessorType(a), 0, 0, 128);
-            fill(AccessorType(b), 1, 0, 128);
+            fill(AccessorType(a), 0, 0, 255);
+            fill(AccessorType(b), 1, 0, 255);
         }
         fill(AccessorType(c), 2, 0, 0);
 
@@ -124,8 +124,8 @@
         }
         else
         {
-            fill(a, 0, 0, 128);
-            fill(b, 1, 0, 128);
+            fill(a, 0, 0, 255);
+            fill(b, 1, 0, 255);
         }
 
         return reference::gemmlowp<int32_t, T2>(a, b);