MLBEDSW-2570 Avoid usage of NHCWB16 for some cases

Avoid NHCWB16 for the output tensor when Stack/Pack/Concat is performed
along axis 3 and the "concat start" of any slice to be combined is not
a multiple of 16.
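
As an illustration of the rule, here is a minimal standalone sketch. The
helper names (concat_starts, must_avoid_nhcwb16) and the ALIGNMENT constant
are hypothetical and not part of Vela; the sketch assumes each concat start
is the running sum of the preceding slice sizes along the concat axis.

```python
from itertools import accumulate

ALIGNMENT = 16  # assumed channel granularity implied by the NHCWB16 format


def concat_starts(slice_depths):
    """Start offset of each slice along the concat axis (running sum of sizes)."""
    return [0] + list(accumulate(slice_depths))[:-1]


def must_avoid_nhcwb16(axis, starts):
    """True when concatenating along axis 3 and any start is not a multiple of 16."""
    return axis == 3 and any(start % ALIGNMENT != 0 for start in starts)


# Slices of depth 8, 24 and 16 start at channels 0, 8 and 32; 8 is unaligned.
unaligned = concat_starts([8, 24, 16])
# Slices of depth 16, 32 and 16 start at channels 0, 16 and 48; all aligned.
aligned = concat_starts([16, 32, 16])
```

With these inputs, only the first case on axis 3 would require avoiding
NHCWB16; the same starts on any other axis would not.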

Signed-off-by: Patrik Gustavsson <patrik.gustavsson@arm.com>
Change-Id: If3f7b4a3424be3c86fc2dc48e8649ce4c4f49485
diff --git a/ethosu/vela/graph_optimiser.py b/ethosu/vela/graph_optimiser.py
index 582924c..3fe703e 100644
--- a/ethosu/vela/graph_optimiser.py
+++ b/ethosu/vela/graph_optimiser.py
@@ -69,6 +69,16 @@
             tens.ops.append(new_op)
         assert tens.shape[axis] == offset
 
+        # If axis == 3, NHCWB16 can only be used for the output if every concat_start is a multiple of 16,
+        # because only then is the OFM address offset 16-byte aligned for all operations.
+        # For other values of axis the address offsets are always 16-byte aligned, as they are all based on
+        # c = 0, and those addresses are always 16-byte aligned due to the NHCWB16 format.
+        if axis == 3:
+            for op in tens.ops:
+                if op.attrs["concat_start"] % 16 != 0:
+                    tens.avoid_NHCWB16 = True
+                    break
+
     return tens