Implement MatMul Function and Operator with Floating Point support for CPU

- Implements the MatMul function and operator for the floating-point data types FP16 and FP32
- Includes support for transposing dynamic tensors prior to matrix multiplication
- Adds MatMul tests for 2D, 3D and 4D+ tensors with FP32/FP16 data types, covering all combinations of transposed and non-transposed inputs
- Updates fixture to allow for testing fused activation in MatMul
- Adds tests for MatMul with and without fused activation

Resolved: [COMPMID-5898]
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Change-Id: Iefa84b26dd723c9a51e6c3f91023152c6c31ace2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/9411
Reviewed-by: SiCong Li <sicong.li@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/filelist.json b/filelist.json
index 01659f5..cf1c63b 100644
--- a/filelist.json
+++ b/filelist.json
@@ -1534,9 +1534,11 @@
             "src/cpu/kernels/CpuGemmLowpOffsetContributionOutputStageKernel.cpp",
             "src/cpu/kernels/CpuGemmLowpOffsetContributionKernel.cpp",
             "src/cpu/operators/CpuGemm.cpp",
+            "src/cpu/operators/CpuMatMul.cpp",
             "src/cpu/operators/CpuGemmLowpOutputStage.cpp",
             "src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp",
             "src/runtime/NEON/functions/NEGEMM.cpp",
+            "src/runtime/NEON/functions/NEMatMul.cpp",
             "src/runtime/NEON/functions/NEGEMMLowpMatrixMultiplyCore.cpp",
             "src/runtime/NEON/functions/NEGEMMLowpOutputStage.cpp"
           ],
@@ -1856,6 +1858,14 @@
         }
         }
       },
+      "MatMul" : {
+        "files": {
+          "common": [
+            "src/cpu/operators/CpuMatMul.cpp",
+            "src/runtime/NEON/functions/NEMatMul.cpp"
+          ]
+        }
+      },
       "Mul": {
         "files": {
           "common": [