Implement FP32/FP16 MatMul NT/NT kernel using the MMUL extension

Resolves COMPMID-6194

Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: Ie45e2aa9533948b2e5235563cef1d3834494eccf
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9739
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/filelist.json b/filelist.json
index 6c5b78f..f354e69 100644
--- a/filelist.json
+++ b/filelist.json
@@ -515,6 +515,7 @@
         "common": [
           "src/gpu/cl/kernels/ClMatMulLowpNativeKernel.cpp",
           "src/gpu/cl/kernels/ClMatMulNativeKernel.cpp",
+          "src/gpu/cl/kernels/ClMatMulNativeMMULKernel.cpp",
           "src/gpu/cl/operators/ClMatMul.cpp",
           "src/runtime/CL/functions/CLMatMul.cpp",
           "src/runtime/heuristics/matmul_native/ClMatMulNativeDefaultConfigValhall.cpp",