MLBEDSW-6880: Add support for multiple subgraphs

- Vela failed to compile networks with multiple subgraphs because
only cascaded passes in the root subgraph were used when
extracting the live ranges. The fix is to extract the subgraph live
range on Ops that have connected subgraphs.

- The tf_writer did not handle multiple subgraphs correctly, resulting
in corrupt buffer data in the optimized tflite file. The buffer
index must be unique for every tensor.

- Added support for multiple subgraphs in the OfflineMemoryAllocation
metadata. This does not change behavior for single graphs.

Signed-off-by: Johan Alfven <johan.alfven@arm.com>
Change-Id: I2328dfc1f07e2e4faf43a75423ea95423096ffa3
diff --git a/ethosu/vela/compiler_driver.py b/ethosu/vela/compiler_driver.py
index cace0f0..61a3b0b 100644
--- a/ethosu/vela/compiler_driver.py
+++ b/ethosu/vela/compiler_driver.py
@@ -233,7 +233,10 @@
             sg, arch, scratch_tens, scratch_fast_tens, flash_tens
         )
 
-    npu_serialisation.rewrite_npu_call_ops(root_sg, arch)
+    # Create list of CPU subgraphs with same order as the list of all subgraphs
+    cpu_subgraphs = [sg for sg in nng.subgraphs if sg.placement == PassPlacement.Cpu]
+    for sg in cpu_subgraphs:
+        npu_serialisation.rewrite_npu_call_ops(sg, arch)
 
     # Set Scratch and Fast_scratch Tensor size
     if scratch_tens is not None: