Extend CKW MatMul with nt_t

- Add the nt_t kernel variant to GpuCkwMatMul.
- Extend CKW MatMul validation test with nt_t.
- Fix a bug in CKW when z-dim = 1.
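
For reviewers unfamiliar with the naming, the sketch below shows the
reference semantics that "nt_t" is assumed to denote here: the LHS is not
transposed (M x K) and the RHS is stored transposed (N x K). It is a plain
C++ illustration, not the CKW kernel itself; the function name matmul_nt_t
and the row-major layout are assumptions made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference for an "nt_t" matrix multiplication.
// lhs: M x K, row-major, not transposed.
// rhs: N x K, row-major, i.e. the RHS is stored transposed.
// dst: M x N, row-major, dst[m][n] = sum over k of lhs[m][k] * rhs[n][k].
std::vector<float> matmul_nt_t(const std::vector<float> &lhs,
                               const std::vector<float> &rhs,
                               std::size_t M, std::size_t N, std::size_t K)
{
    std::vector<float> dst(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m)
    {
        for (std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            // Both operands are read along their innermost (K) dimension.
            for (std::size_t k = 0; k < K; ++k)
            {
                acc += lhs[m * K + k] * rhs[n * K + k];
            }
            dst[m * N + n] = acc;
        }
    }
    return dst;
}
```

With lhs = {1, 2, 3, 4} and rhs = {5, 6, 7, 8} (M = N = K = 2), this yields
{17, 23, 39, 53}, which is lhs multiplied by the transpose of rhs.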

Resolves: COMPMID-6435

Signed-off-by: Adnan AlSinan <adnan.alsinan@arm.com>
Change-Id: I4c5e8791e55f21ffff3c11eca7802c51a4259977
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10525
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/docs/user_guide/release_version_and_change_log.dox b/docs/user_guide/release_version_and_change_log.dox
index 2b8f5d8..9e67d7e 100644
--- a/docs/user_guide/release_version_and_change_log.dox
+++ b/docs/user_guide/release_version_and_change_log.dox
@@ -49,6 +49,7 @@
      - @ref experimental::dynamic_fusion::GpuCkwResize
      - @ref experimental::dynamic_fusion::GpuCkwPool2d
      - @ref experimental::dynamic_fusion::GpuCkwDepthwiseConv2d
+     - @ref experimental::dynamic_fusion::GpuCkwMatMul
    - Add support for OpenCL™ comand buffer with mutable dispatch extension.
  - Update OpenCL™ API headers to v2023.04.17.
  - Remove legacy PostOps interface. PostOps was the experimental interface for kernel fusion and is replaced by the new Dynamic Fusion interface.