MLBEDSW-8749: MLCE: Output diff on strided slice

 - When possible, the slice read from a split or strided slice is
moved to the following op. In this case the following op was an
elementwise op whose ifm needed to be broadcast, which is not
supported when the slice read is placed on that op.
 - The result was a faulty elementwise op with an output diff.
 - The fix is to not move the slice read to the elementwise op
if broadcasting is needed.
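
The condition described above can be sketched in isolation as follows.
This is a minimal illustration, not the Vela implementation: the helper
name and its plain-list arguments are hypothetical, and it assumes that
"broadcasting is needed" means the consumer's ofm shape differs from the
shape produced by the slice read.

```python
from typing import Sequence


def can_move_slice_read(
    slice_ofm_shape: Sequence[int],
    consumer_ofm_shape: Sequence[int],
    consumer_is_binary_elementwise: bool,
) -> bool:
    """Hypothetical helper: decide whether a slice read may move to its consumer.

    Assumption: a binary elementwise consumer needs to broadcast the sliced
    ifm whenever its ofm shape differs from the slice read's output shape.
    """
    needs_broadcast = consumer_is_binary_elementwise and (
        list(consumer_ofm_shape) != list(slice_ofm_shape)
    )
    # The slice read stays on its own op when broadcasting would be needed
    return not needs_broadcast
```

For example, a slice of shape [1, 1, 1, 16] feeding a binary elementwise
op with ofm shape [1, 8, 8, 16] is not moved, while the same slice
feeding a non-elementwise consumer is unaffected by this check.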

Change-Id: I89928c217510a822f91f051fd1ad6e34040c19de
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
diff --git a/ethosu/vela/tflite_graph_optimiser.py b/ethosu/vela/tflite_graph_optimiser.py
index ad979bd..1e53e37 100644
--- a/ethosu/vela/tflite_graph_optimiser.py
+++ b/ethosu/vela/tflite_graph_optimiser.py
@@ -1,4 +1,4 @@
-# SPDX-FileCopyrightText: Copyright 2020-2023 Arm Limited and/or its affiliates <open-source-office@arm.com>
+# SPDX-FileCopyrightText: Copyright 2020-2024 Arm Limited and/or its affiliates <open-source-office@arm.com>
 #
 # SPDX-License-Identifier: Apache-2.0
 #
@@ -191,12 +191,17 @@
     if op.type == Op.SplitSliceRead:
         # Check if it is possible to put the SplitSliceRead on the tensor consumer(s),
         # or if an avgpool need to be inserted
-        # Not possible to do if consumer is a Transpose op since ifm shape has been reshaped and can not be changed
+        # Not possible to move:
+        #   - if consumer is a Transpose op since ifm shape has been reshaped and can not be changed
+        #   - if consumer is elementwise and ifm needs to be broadcasted
         if op.ofm_shapes[0] == Shape4D.from_list(op.ofm.shape) and all(
             consumer is not None
             and consumer.run_on_npu
             and consumer.type not in memory_only_ops
             and consumer.original_type != Op.Transpose
+            and not (
+                consumer.type.is_binary_elementwise_op() and Shape4D.from_list(consumer.ofm.shape) != op.ofm_shapes[0]
+            )
             for consumer in op.ofm.consumer_list
         ):
             # SplitSliceRead can be performed by tensor consumer(s)