MLBEDSW-7648: Fix bug with filter padding in conv2d

* Fix a bug where the filter padding was not added in proportion to
  the hardware padding added to the IFM.
* Update the needed_total_padding function, which calculates the
  hardware padding, to also account for cases where the IFM width is
  not divisible by the stride width.
* Update the supported ops constraint on strides for conv2d so that ops
  with stride width > 3 and an IFM width that is not divisible by the
  optimization resize factor are marked as not supported.
* Update the unit tests that verify the checks for whether ops are
  supported.
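
To illustrate the needed_total_padding change, here is a minimal Python
sketch of the updated calculation with a few sample values. The function
body mirrors this patch. The sample sizes are made up for illustration
and are not taken from the unit tests, and the comments assume
ceil(input_size / stride) output positions.

```python
def needed_total_padding(input_size, stride, filter_size):
    """Total padding along one dimension, as computed after this patch."""
    remainder = input_size % stride
    if remainder == 0:
        # Input divides evenly: assuming ceil(input_size / stride) output
        # positions, the last window starts one full stride before the end.
        return max(filter_size - stride, 0)
    # Otherwise the last window starts `remainder` elements before the end.
    return max(filter_size - remainder, 0)


# Illustrative values only: (input_size, stride, filter_size).
for args in [(9, 3, 3), (10, 3, 3), (8, 2, 3), (7, 2, 3), (10, 4, 2)]:
    print(args, "->", needed_total_padding(*args))
```

The result is the total for one dimension; how it is split between the
two sides of the IFM is outside the scope of this sketch.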

Change-Id: I62f14cca890b779ca787a9603fa37c873ad522f8
Signed-off-by: Raul Farkas <raul.farkas@arm.com>
diff --git a/ethosu/vela/graph_optimiser_util.py b/ethosu/vela/graph_optimiser_util.py
index da3fe13..220ba1a 100644
--- a/ethosu/vela/graph_optimiser_util.py
+++ b/ethosu/vela/graph_optimiser_util.py
@@ -185,10 +185,11 @@
 
 
 def needed_total_padding(input_size, stride, filter_size):
-    out_size = (input_size + stride - 1) // stride
-    needed_input = (out_size - 1) * stride + filter_size
-    total_padding = max(0, needed_input - input_size)
-    return total_padding
+    """Compute hardware padding."""
+    if input_size % stride == 0:
+        return max(filter_size - stride, 0)
+
+    return max(filter_size - (input_size % stride), 0)
 
 
 # Set input/output tensor equivalence to the same id for memory operations