MLBEDSW-2809: Redo the Tensor addressing

Added a static class TensorAddressMap that stores all Tensor addresses
based on their equivalence_id. Made the "address" field into a property
which getter and setter looks up/sets the tensor's address in
TensorAddressMap.

This makes the references to cpu_tensor/npu_tensor obsolete and they
have been removed.

Addition to scheduler: avoid SRAM spilling if an op has consumers in
other subgraphs.

Minor rework in LUTState; it will now assign a unique equivalence_id to
the SHRAM lut tensor to avoid issues with addressing. The equivalent
checks in LUTState now compares the values of the LUT instead of the the
equivalence_id.

Updated LUT unit tests accordingly.

Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>
Change-Id: I41de5a8a4e5f07b77d6544d8d4034b754993e503
diff --git a/ethosu/vela/lut.py b/ethosu/vela/lut.py
index 0e8dcc9..e3373ca 100644
--- a/ethosu/vela/lut.py
+++ b/ethosu/vela/lut.py
@@ -42,9 +42,9 @@
         self.tensors = []
 
     def get_equivalent(self, lut_tens):
-        # Returns existing lut with same equivalence id, None if not found
+        # Returns existing lut with the same values, None if not found
         for t in self.tensors:
-            if t.equivalent(lut_tens):
+            if np.array_equal(t.values, lut_tens.values):
                 return t
         return None
 
@@ -60,6 +60,7 @@
             end2 = start2 + tens.storage_size()
             if not numeric_util.overlaps(start, end, start2, end2):
                 new_state.tensors.append(tens)
+
         return new_state
 
     def find_best_address(self, start, stop, step):
@@ -129,6 +130,7 @@
         # Place the LUT in the last 2 blocks of SHRAM
         # Alignment is always on the size of the LUT, 256 for 256-byte LUT, 1K for 1K LUT, etc
         address = lut_state.find_best_address(lut_start, lut_end, lut_tens.storage_size())
+        lut_tens.equivalence_id = uuid.uuid4()
         lut_tens.address = address
         cmd.ps.primary_op.attrs["lut_index"] = (address - lut_start) // slot_size
         lut_state = lut_state.put(lut_tens)