Port the ClGemmLowp kernels to the new API

Ported kernels:
 - CLGEMMLowpMatrixMultiplyNativeKernel
 - CLGEMMLowpMatrixMultiplyReshapedKernel
 - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
 - CLGEMMLowpOffsetContributionKernel
 - CLGEMMLowpOffsetContributionOutputStageKernel
 - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
 - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
 - CLGEMMLowpQuantizeDownInt32ScaleKernel

Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: I9d5a744d6a2dd2f2726fdfb291bad000b6970de2
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5870
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
index eb4c280..0c8b57f 100644
--- a/docs/user_guide/release_version_and_change_log.dox
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -227,7 +227,7 @@
       - @ref CLLogSoftmaxLayer
       - GCSoftmaxLayer
  - New OpenCL kernels / functions:
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
    - @ref CLLogicalNot
    - @ref CLLogicalAnd
    - @ref CLLogicalOr
@@ -260,13 +260,13 @@
    - @ref CLBatchNormalizationLayerKernel
    - CLPoolingLayerKernel
    - CLWinogradInputTransformKernel
-   - @ref CLGEMMLowpMatrixMultiplyNativeKernel
-   - @ref CLGEMMLowpMatrixAReductionKernel
-   - @ref CLGEMMLowpMatrixBReductionKernel
-   - @ref CLGEMMLowpOffsetContributionOutputStageKernel
-   - @ref CLGEMMLowpOffsetContributionKernel
+   - CLGEMMLowpMatrixMultiplyNativeKernel
+   - CLGEMMLowpMatrixAReductionKernel
+   - CLGEMMLowpMatrixBReductionKernel
+   - CLGEMMLowpOffsetContributionOutputStageKernel
+   - CLGEMMLowpOffsetContributionKernel
    - CLWinogradOutputTransformKernel
-   - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+   - CLGEMMLowpMatrixMultiplyReshapedKernel
    - @ref CLFuseBatchNormalizationKernel
    - @ref CLDepthwiseConvolutionLayerNativeKernel
    - CLDepthConvertLayerKernel
@@ -281,11 +281,11 @@
    - CLLogits1DNormKernel
    - CLHeightConcatenateLayerKernel
    - CLGEMMMatrixMultiplyKernel
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
-   - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+   - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
    - CLDepthConcatenateLayerKernel
-   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
+   - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
  - Removed OpenCL kernels / functions:
    - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
    - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
@@ -596,9 +596,9 @@
      - @ref CLDeconvolutionLayer
      - @ref CLDirectDeconvolutionLayer
      - @ref CLGEMMDeconvolutionLayer
-     - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
-     - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
-     - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
+     - CLGEMMLowpMatrixMultiplyReshapedKernel
+     - CLGEMMLowpQuantizeDownInt32ScaleKernel
+     - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
      - @ref CLReductionOperation
      - @ref CLReduceMean
      - @ref NEScale
@@ -655,9 +655,9 @@
      - @ref CLDepthwiseConvolutionLayer
      - CLDepthwiseConvolutionLayer3x3
      - @ref CLGEMMConvolutionLayer
-     - @ref CLGEMMLowpMatrixMultiplyCore
-     - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
-     - @ref CLGEMMLowpMatrixMultiplyNativeKernel
+     - CLGEMMLowpMatrixMultiplyCore
+     - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+     - CLGEMMLowpMatrixMultiplyNativeKernel
      - @ref NEActivationLayer
      - NEComparisonOperationKernel
      - @ref NEConvolutionLayer
@@ -680,7 +680,7 @@
      - @ref NESplit
  - New OpenCL kernels / functions:
      - @ref CLFill
-     - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
+     - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
  - New Arm® Neon™ kernels / functions:
      - @ref NEFill
      - NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
@@ -861,7 +861,7 @@
     - @ref CLFFTDigitReverseKernel
     - @ref CLFFTRadixStageKernel
     - @ref CLFFTScaleKernel
-    - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
+    - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
     - CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
     - CLHeightConcatenateLayerKernel
     - @ref CLDirectDeconvolutionLayer
@@ -953,7 +953,7 @@
     - @ref CLRangeKernel / @ref CLRange
     - @ref CLUnstack
     - @ref CLGatherKernel / @ref CLGather
-    - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
+    - CLGEMMLowpMatrixMultiplyReshapedKernel
  - New CPP kernels / functions:
     - @ref CPPDetectionOutputLayer
     - @ref CPPTopKV / @ref CPPTopKVKernel
@@ -1247,8 +1247,8 @@
     - NEWinogradLayer / NEWinogradLayerKernel
 
  - New OpenCL kernels / functions
-    - @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore
-    - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
+    - CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore
+    - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
 
  - New graph nodes for Arm® Neon™ and OpenCL
     - graph::BranchLayer