Fuse batch normalization changes to enable fp16 in armv8a multi_isa builds

        * Code guarded with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC needs
          to be moved to an fp16.cpp file to allow compilation with
          -march=armv8.2-a+fp16

        * fp16.cpp uses the template fused_batch_normalization_dwc_nhwc(),
          so its definition had to be moved from impl.cpp to impl.h

        * Removed impl.cpp

        * Partially resolves MLCE-1102
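
The layout described above can be sketched in a single file. This is a
minimal illustration under stated assumptions, not the library's code:
fold_scale_into_weights(), fold_fp32(), fold_fp16() and has_fp16_path() are
hypothetical names, and the loop body is a stand-in for the real kernel.
The comments mark which file each part would live in.

```cpp
#include <cstddef>

// --- would live in a shared header (the role impl.h plays here) ---
// Hypothetical template; NOT the real fused_batch_normalization_dwc_nhwc()
// signature. A template definition must be visible in every translation
// unit that instantiates it, which is why it cannot stay in a .cpp file.
template <typename T>
void fold_scale_into_weights(const T *weights, const T *scale, T *out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        out[i] = static_cast<T>(weights[i] * scale[i]);
    }
}

// --- would live in an fp32 source file built with the baseline flags ---
void fold_fp32(const float *w, const float *s, float *out, std::size_t n)
{
    fold_scale_into_weights<float>(w, s, out, n);
}

// --- would live in fp16.cpp, built with -march=armv8.2-a+fp16 ---
// The guard keeps this code out of any build that lacks fp16 vector
// arithmetic, so the same sources still compile with baseline flags.
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
void fold_fp16(const __fp16 *w, const __fp16 *s, __fp16 *out, std::size_t n)
{
    fold_scale_into_weights<__fp16>(w, s, out, n);
}
#endif

// Reports whether the fp16 entry point was compiled into this build.
bool has_fp16_path()
{
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
    return true;
#else
    return false;
#endif
}
```

With this split, only the fp16 source file needs the extra -march flag, and
each source file instantiates the shared template for its own data type.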

Change-Id: Idaaa113c71729e32e565acf5fb5694c76c36d76d
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10308
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/filelist.json b/filelist.json
index d3cf984..2a88aec 100644
--- a/filelist.json
+++ b/filelist.json
@@ -973,8 +973,7 @@
           ],
           "neon": {
             "common": [
-              "src/cpu/kernels/fuse_batch_normalization/nchw/all.cpp",
-              "src/cpu/kernels/fuse_batch_normalization/nhwc/neon/impl.cpp"
+              "src/cpu/kernels/fuse_batch_normalization/nchw/all.cpp"
             ],
             "fp16": [
               "src/cpu/kernels/fuse_batch_normalization/generic/fp16.cpp",