INT8 Quantized MeanStdDevNorm (LayerNorm)

Implements LayerNorm for qasymm8 tensors.
Loads and stores use uint8x16 vectors.
Summation is performed in integer arithmetic (vpaddl widening adds).
Normalization is performed in float32 before requantizing back to qasymm8.
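The per-row computation described above can be sketched in scalar C++ (the NEON
uint8x16/vpaddl vectorization is replaced by plain integer loops; function and
parameter names are illustrative, not the kernel's actual API). Note that the
input quantization parameters cancel out of (x - mean) / stddev, so the sums can
be taken directly over the raw quantized values:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Scalar sketch of quantized MeanStdDevNorm over one row.
// Sums are accumulated in integer arithmetic (standing in for the
// vpaddl-based NEON reduction), statistics are computed in float32,
// and the normalized value is requantized back to qasymm8.
std::vector<uint8_t> mean_stddev_norm_q8(const std::vector<uint8_t> &row,
                                         float out_scale, int32_t out_offset,
                                         float epsilon = 1e-8f)
{
    // Integer accumulation: sum fits in 32 bits, sum of squares in 64 bits.
    uint32_t sum    = 0;
    uint64_t sum_sq = 0;
    for (uint8_t v : row)
    {
        sum    += v;
        sum_sq += static_cast<uint64_t>(v) * v;
    }

    // Mean and variance in float32, in quantized units. The input scale
    // and offset cancel in the normalization, so they are not needed here.
    const float n          = static_cast<float>(row.size());
    const float mean_q     = static_cast<float>(sum) / n;
    const float var_q      = static_cast<float>(sum_sq) / n - mean_q * mean_q;
    const float inv_stddev = 1.0f / std::sqrt(var_q + epsilon);

    std::vector<uint8_t> out(row.size());
    for (size_t i = 0; i < row.size(); ++i)
    {
        const float norm = (static_cast<float>(row[i]) - mean_q) * inv_stddev;
        // Requantize the normalized value into the output's qasymm8 space.
        int32_t q = static_cast<int32_t>(std::lround(norm / out_scale)) + out_offset;
        out[i]    = static_cast<uint8_t>(std::min(std::max(q, 0), 255));
    }
    return out;
}
```

The output quantization parameters (`out_scale`, `out_offset`) are assumptions
for illustration; the real kernel takes them from the output tensor's
`QuantizationInfo`.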

Signed-off-by: Murray Kornelsen <murray.kornelsen@mail.mcgill.ca>
Change-Id: I2407c8b34717fb47adab98791bd76fb8a3c62f4a
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/7922
Comments-Addressed: Pablo Marquez Tello <pablo.tello@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Pablo Marquez Tello <pablo.tello@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Benchmark: Arm Jenkins <bsgcomp@arm.com>
diff --git a/filelist.json b/filelist.json
index c218ed9..eb39915 100644
--- a/filelist.json
+++ b/filelist.json
@@ -1738,7 +1738,8 @@
         "neon":{
           "common":["src/cpu/kernels/meanstddevnorm/generic/neon/impl.cpp"],
           "fp32":["src/cpu/kernels/meanstddevnorm/generic/neon/fp32.cpp"],
-          "fp16":["src/cpu/kernels/meanstddevnorm/generic/neon/fp16.cpp"]
+          "fp16":["src/cpu/kernels/meanstddevnorm/generic/neon/fp16.cpp"],
+          "qasymm8":["src/cpu/kernels/meanstddevnorm/generic/neon/qasymm8.cpp"]
         }
         }
       },