Fix invalid memory access for dynamically fused Cl Elementwise kernels

M0 and N0 were set incorrectly in the broadcasting case when the
elementwise component is non-root.

This is because we previously always used the rhs tensor to derive the
load M0 and N0. For non-root components, however, the addend/divisor
tensor can be either the lhs or the rhs, so this would fail whenever the
addend/divisor is the lhs.
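
The selection described above can be illustrated with a minimal,
self-contained sketch. It is not the Compute Library implementation:
Shape2D, TileSize, AccumulatorSide, select_operand and load_tile are
hypothetical names. It also assumes that the input which is not the
addend/divisor is the result carried over from the preceding component,
and that a broadcast dimension has extent 1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins; these are not Compute Library types.
struct Shape2D
{
    std::size_t width;  // extent along N
    std::size_t height; // extent along M
};

struct TileSize
{
    std::size_t m0; // rows processed per work-item
    std::size_t n0; // columns processed per work-item
};

// Which input holds the value produced by the preceding component
// (assumption: the other input is the addend/divisor that must be loaded).
enum class AccumulatorSide
{
    Lhs,
    Rhs
};

// Return the tensor that actually has to be loaded: the addend/divisor,
// which may sit on either side for a non-root component.
inline const Shape2D &select_operand(const Shape2D &lhs, const Shape2D &rhs, AccumulatorSide acc_side)
{
    return acc_side == AccumulatorSide::Lhs ? rhs : lhs;
}

// Derive the load block sizes from the operand's own shape: a broadcast
// dimension (extent 1) must load a single element, not a full tile.
inline TileSize load_tile(const Shape2D &operand, const TileSize &dst_tile)
{
    assert(dst_tile.m0 > 0 && dst_tile.n0 > 0);
    const std::size_t m0 = (operand.height == 1) ? std::size_t{ 1 } : dst_tile.m0;
    const std::size_t n0 = (operand.width == 1) ? std::size_t{ 1 } : dst_tile.n0;
    return TileSize{ m0, n0 };
}
```

In this sketch, deriving the load sizes from the rhs when the
addend/divisor is actually the lhs would request a full M0 x N0 block
from a tensor that only has one row or one column, which is the kind of
out-of-bounds read this change addresses.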

- Also fixes broken Dynamic Fusion test

Resolves COMPMID-5482

Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I37f27ffa392781387db15739b1666f1dad28c554
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/VisualCompute/ComputeLibrary/+/445890
Tested-by: bsgcomp <bsgcomp@arm.com>
Reviewed-by: Mohammed Suhail Munshi <mohammedsuhail.munshi@arm.com>
Comments-Addressed: bsgcomp <bsgcomp@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8111
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/tests/validation/CL/UNIT/dynamic_fusion/ClCompositeKernel.cpp b/tests/validation/CL/UNIT/dynamic_fusion/ClCompositeKernel.cpp
index 3ffbc07..dc98d72 100644
--- a/tests/validation/CL/UNIT/dynamic_fusion/ClCompositeKernel.cpp
+++ b/tests/validation/CL/UNIT/dynamic_fusion/ClCompositeKernel.cpp
@@ -171,7 +171,7 @@
     SimpleTensor<float> ref_src_nhwc{ src_shape, data_type, 1, QuantizationInfo(), DataLayout::NHWC };
     SimpleTensor<float> ref_wei_nhwc{ wei_shape, data_type, 1, QuantizationInfo(), DataLayout::NHWC };
     SimpleTensor<float> ref_bia_nhwc{ bia_shape, data_type, 1, QuantizationInfo(), DataLayout::NHWC };
-    SimpleTensor<float> ref_addend_nhwc{ dst_shape, data_type, 1, QuantizationInfo(), DataLayout::NHWC };
+    SimpleTensor<float> ref_addend_nhwc{ addend_shape, data_type, 1, QuantizationInfo(), DataLayout::NHWC };
 
     // Fill reference
     fill<float>(ref_src_nhwc, 0, library.get());