COMPMID-1687: Optimize CLGEMMMatrixMultiplyKernel for Mali-G76 - Part1

The current implementation supports FP32 only.

Change-Id: I185ab57e483e879d7c301e9cc3033efc8b41e244
Reviewed-on: https://review.mlplatform.org/389
Reviewed-by: Anthony Barbier <Anthony.barbier@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
14 files changed