MLBEDSW-7393: MLCE: Optimize compile time for large networks

- Large networks containing many NPU subgraphs compile slowly:
scheduling takes too long because the memory snapshot
calculation always performs a complete update of the full
graph.
- A complete run is still needed at the end to calculate all
the time indexes correctly. However, when scheduling an NPU
subgraph it is enough to extract live ranges for the current
schedule and its operators (see the sketch below).
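- Illustrative sketch only, not part of this change: when
computing a memory snapshot while scheduling a single NPU
subgraph, a fresh live-range graph can be built from that
subgraph's schedule alone. LiveRangeGraph and
extract_live_ranges_from_schedule are the names visible in
ethosu/vela/live_range.py; npu_subgraph_snapshot below is a
hypothetical helper and assumes LiveRangeGraph takes no
constructor arguments.

    from ethosu.vela import live_range

    def npu_subgraph_snapshot(sg, target_mem_area, target_mem_type_set):
        # Live-range graph scoped to this subgraph's schedule; the full
        # cascaded-pass extraction still runs once at the end so that
        # all time indexes are calculated correctly.
        lr_graph = live_range.LiveRangeGraph()
        return live_range.extract_live_ranges_from_schedule(
            sg, target_mem_area, target_mem_type_set, lr_graph
        )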

Change-Id: Iccb7d6728119c1428ad0b45a2ac34e92158c15bd
Signed-off-by: Johan Alfven <johan.alfven@arm.com>
diff --git a/ethosu/vela/live_range.py b/ethosu/vela/live_range.py
index 6a2a04a..05e481e 100644
--- a/ethosu/vela/live_range.py
+++ b/ethosu/vela/live_range.py
@@ -251,7 +251,7 @@
                 # If the primary-op is an NpuOp that means this is where an Npu subgraph
                 # is called. Go into said subgraph and extract live ranges before continuing.
                 # Use default allocation alignment of 16 for Npu tensors
-                lr_graph = _extract_live_ranges_from_schedule(
+                lr_graph = extract_live_ranges_from_schedule(
                     op_subgraph, target_mem_area, target_mem_type_set, lr_graph
                 )
             else:
@@ -316,7 +316,7 @@
     return lr_graph
 
 
-def _extract_live_ranges_from_schedule(sg, target_mem_area, target_mem_type_set, lr_graph):
+def extract_live_ranges_from_schedule(sg, target_mem_area, target_mem_type_set, lr_graph):
     time_for_cascade = {}
     for sched_op in sg.sched_ops:
         op_info = sg.schedule.cost_map[sched_op]