COMPMID-2588: Optimize the output detection kernel required by MobileNet-SSD (~27% improvement)

Change-Id: Ic6ce570af3878a0666ec680e0efabba3fcfd1222
Signed-off-by: Giuseppe Rossini <giuseppe.rossini@arm.com>
Reviewed-on: https://review.mlplatform.org/c/2160
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
diff --git a/arm_compute/runtime/NEON/NEFunctions.h b/arm_compute/runtime/NEON/NEFunctions.h
index 28fd7f3..95369ce 100644
--- a/arm_compute/runtime/NEON/NEFunctions.h
+++ b/arm_compute/runtime/NEON/NEFunctions.h
@@ -59,6 +59,7 @@
 #include "arm_compute/runtime/NEON/functions/NEDepthwiseConvolutionLayer.h"
 #include "arm_compute/runtime/NEON/functions/NEDequantizationLayer.h"
 #include "arm_compute/runtime/NEON/functions/NEDerivative.h"
+#include "arm_compute/runtime/NEON/functions/NEDetectionPostProcessLayer.h"
 #include "arm_compute/runtime/NEON/functions/NEDilate.h"
 #include "arm_compute/runtime/NEON/functions/NEDirectConvolutionLayer.h"
 #include "arm_compute/runtime/NEON/functions/NEElementwiseOperations.h"