Handle Conv2d layer with implicit output padding in NHWC

Corner cases exist when output top/bottom padding is non-zero for
Convolution Layer. This can cause invalid output from the
NEGEMMConvolutionLayer as assembly kernel integration does not
efficiently handles such cases.

As a workaround we always allocate a memory-managed auxiliary tensor
which we use as an output for GEMM when padding exists and then we copy
to the padded output. If no padding exists we import the output tensor
memory to the temporary buffer and perform calculation as we did before.

Resolves: COMPMID-4114

Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
Change-Id: If82d0e115b8369b91d775895d5315b044306cc74
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/5083
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Reviewed-by: Giorgio Arena <giorgio.arena@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/src/runtime/NEON/functions/NEGEMMAssemblyDispatch.h b/src/runtime/NEON/functions/NEGEMMAssemblyDispatch.h
index 466e601..381fa4d 100644
--- a/src/runtime/NEON/functions/NEGEMMAssemblyDispatch.h
+++ b/src/runtime/NEON/functions/NEGEMMAssemblyDispatch.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2018-2020 Arm Limited.
+ * Copyright (c) 2018-2021 Arm Limited.
  *
  * SPDX-License-Identifier: MIT
  *
@@ -117,7 +117,7 @@
     void run() override;
 
 private:
-    std::unique_ptr<IFallback> _arm_gemm;        /** Interface for the arm_gemm fallback */
+    std::unique_ptr<IFallback> _arm_gemm;        /**< Interface for the arm_gemm fallback */
     MemoryGroup                _memory_group;    /**< Function memory group */
     IWeightsManager           *_weights_manager; /**< Pointer to the weights manager */
 };