COMPMID-706 - Add GEMMLowp output stage for scaling by a fixed-point number

DoD:
- Implement NEON kernel for quantizing down the gemmlowp result. The
  result should be scaled by a fixed-point number
- Implement OpenCL kernel for quantizing down the gemmlowp result. The
  result should be scaled by a fixed-point number
- Add tests for validating the result

Required for:
- Integration of GEMMLowp in Android NN
- Quantized convolution
- Quantized fully connected layer

Change-Id: Ia963d25d695471e963961fb49a5600e78374ac4f
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/110981
Tested-by: BSG Visual Compute Jenkins server to access repositories on http://mpd-gerrit.cambridge.arm.com <bsgcomp@arm.com>
Reviewed-by: Georgios Pinitas <georgios.pinitas@arm.com>
Reviewed-by: Anthony Barbier <anthony.barbier@arm.com>
diff --git a/arm_compute/runtime/CL/functions/CLFullyConnectedLayer.h b/arm_compute/runtime/CL/functions/CLFullyConnectedLayer.h
index 26f23ce..2cac06c 100644
--- a/arm_compute/runtime/CL/functions/CLFullyConnectedLayer.h
+++ b/arm_compute/runtime/CL/functions/CLFullyConnectedLayer.h
@@ -87,20 +87,20 @@
     void configure_conv_fc(const ICLTensor *input, const ICLTensor *weights, ICLTensor *output);
     void configure_mm(const ICLTensor *input, const ICLTensor *weights, ICLTensor *output, bool is_interleaved_transposed = true);
 
-    CLMemoryGroup                           _memory_group;
-    CLIm2ColKernel                          _im2col_kernel;
-    CLFullyConnectedLayerReshapeWeights     _reshape_weights_kernel;
-    CLGEMMMatrixMultiplyKernel              _mm_kernel;
-    CLGEMMLowpMatrixMultiplyCore            _mm_gemmlowp;
-    CLGEMMLowpQuantizeDownInt32ToUint8Scale _gemmlowp_output_stage;
-    CLGEMMMatrixAccumulateBiasesKernel      _accumulate_biases_kernel;
-    CLTensor                                _im2col_output;
-    CLTensor                                _gemmlowp_output;
-    CLTensor                                _reshape_weights_output;
-    bool                                    _are_weights_reshaped;
-    bool                                    _is_fc_after_conv;
-    bool                                    _accumulate_biases;
-    bool                                    _is_quantized;
+    CLMemoryGroup                                       _memory_group;
+    CLIm2ColKernel                                      _im2col_kernel;
+    CLFullyConnectedLayerReshapeWeights                 _reshape_weights_kernel;
+    CLGEMMMatrixMultiplyKernel                          _mm_kernel;
+    CLGEMMLowpMatrixMultiplyCore                        _mm_gemmlowp;
+    CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint _gemmlowp_output_stage;
+    CLGEMMMatrixAccumulateBiasesKernel                  _accumulate_biases_kernel;
+    CLTensor                                            _im2col_output;
+    CLTensor                                            _gemmlowp_output;
+    CLTensor                                            _reshape_weights_output;
+    bool                                                _are_weights_reshaped;
+    bool                                                _is_fc_after_conv;
+    bool                                                _accumulate_biases;
+    bool                                                _is_quantized;
 };
 }
 #endif /* __ARM_COMPUTE_CLFULLYCONNECTEDLAYER_H__ */