Add output operator for dynamic fusion

* The output of the fused operator must be explicitly specified
  using the GpuOutput operator.
* Any temporary tensors that connect the output of one operator
  to the input of another are marked as no-alloc and are not
  allocated in memory.

Resolves: COMPMID-5771
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Change-Id: I5ae8e800f8f737db23a055a92b01c4f1d78c3bb8
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8794
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: SiCong Li <sicong.li@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/Android.bp b/Android.bp
index f4d94f9..963dd84 100644
--- a/Android.bp
+++ b/Android.bp
@@ -610,6 +610,7 @@
         "src/dynamic_fusion/sketch/gpu/operators/GpuClamp.cpp",
         "src/dynamic_fusion/sketch/gpu/operators/GpuConv2d.cpp",
         "src/dynamic_fusion/sketch/gpu/operators/GpuDepthwiseConv2d.cpp",
+        "src/dynamic_fusion/sketch/gpu/operators/GpuOutput.cpp",
         "src/dynamic_fusion/sketch/gpu/operators/internal/GpuElementwiseBinaryCommon.cpp",
         "src/dynamic_fusion/sketch/gpu/template_writer/GpuKernelVariableTable.cpp",
         "src/dynamic_fusion/sketch/gpu/template_writer/cl/ClTemplateActivation.cpp",