Add multiple output support for dynamic fusion

* The dependency graph can now schedule any acyclic graph into
  a sequential list of operators. This is needed because output
  operators now form branches in the graph.
* Fix the definition of input, output and intermediate tensors
  in GpuKernelComponentGroup to support a non-linear but
  sequential list of operators.
* Add a constraint on GpuOperatorGroup to enforce a strictly
  linear fusion style, while allowing output operators as the
  only form of branching.
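Scheduling an acyclic graph into a sequential operator list amounts to a topological sort. The block below is a minimal sketch of that idea using Kahn's algorithm. It is an illustration under that assumption, not the actual DependencyGraph implementation. SketchGraph, OperatorId and schedule are hypothetical names, not ComputeLibrary API.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical operator identifier (not the ComputeLibrary type).
using OperatorId = int;

// Hypothetical minimal dependency graph: each operator maps to the
// operators that consume its outputs.
struct SketchGraph
{
    std::map<OperatorId, std::vector<OperatorId>> successors;
};

// Kahn's algorithm: repeatedly emit operators whose producers have all
// been scheduled. Throws std::runtime_error if the graph has a cycle.
std::vector<OperatorId> schedule(const SketchGraph &graph)
{
    // Count incoming edges for every operator, including operators that
    // only ever appear as successors.
    std::map<OperatorId, std::size_t> in_degree;
    for(const auto &entry : graph.successors)
    {
        in_degree.emplace(entry.first, 0); // no-op if already counted
        for(OperatorId succ : entry.second)
        {
            ++in_degree[succ];
        }
    }

    // Operators with no producers are ready to run first.
    std::queue<OperatorId> ready;
    for(const auto &entry : in_degree)
    {
        if(entry.second == 0)
        {
            ready.push(entry.first);
        }
    }

    std::vector<OperatorId> order;
    while(!ready.empty())
    {
        const OperatorId op = ready.front();
        ready.pop();
        order.push_back(op);

        const auto it = graph.successors.find(op);
        if(it == graph.successors.end())
        {
            continue; // sink operator, e.g. an output operator
        }
        // Releasing this operator may make its consumers ready.
        for(OperatorId succ : it->second)
        {
            if(--in_degree[succ] == 0)
            {
                ready.push(succ);
            }
        }
    }

    // Any operator left unscheduled is part of a cycle.
    if(order.size() != in_degree.size())
    {
        throw std::runtime_error("dependency graph contains a cycle");
    }
    return order;
}
```

For example, with edges 0 -> 1 -> 2 plus a branch 1 -> 3 (where 3 plays the role of an output operator), schedule returns the sequential order 0, 1, 2, 3.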

Resolves: COMPMID-5771
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I68de3a31a2456145081f0a397e4e61dd66327682
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8823
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/dynamic_fusion/sketch/gpu/template_writer/cl/ClTemplateActivation.cpp b/src/dynamic_fusion/sketch/gpu/template_writer/cl/ClTemplateActivation.cpp
index c3128ea..8adf056 100644
--- a/src/dynamic_fusion/sketch/gpu/template_writer/cl/ClTemplateActivation.cpp
+++ b/src/dynamic_fusion/sketch/gpu/template_writer/cl/ClTemplateActivation.cpp
@@ -125,7 +125,7 @@
     lut["src"] = vtable.get_variable(_src);
     lut["dst"] = vtable.get_variable(_dst);
 
-    const auto dst_argument = vtable.get_variable(comp_group.get_dst_tensors()[0]);
+    const auto dst_argument = vtable.get_variable(comp_group.get_any_dst_tensor());
     lut["arg_dst"]          = dst_argument.uniq_name;
 
     // Local build options