COMPMID-1266 : Add support for FP16 in CLWinogradConvolutionLayer: 5x5 kernels

Introduced F32 accumulation for the F16 Winograd GEMM and output transform.
Winograd convolution will be available for F16 only if the fast math flag is enabled.
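
Illustration only, not code from this change: a minimal Python sketch of why accumulating F16 products in F32 helps. The helper names (to_f16, to_f32, dot_f16_acc, dot_f32_acc) are hypothetical, and rounding is emulated with the standard struct module rather than with the OpenCL kernels touched here.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_f16_acc(a, b):
    """Dot product of F16 inputs with an F16 accumulator (hypothetical helper)."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_f16(to_f16(x) * to_f16(y))
        acc = to_f16(acc + prod)  # every partial sum is rounded back to F16
    return acc


def dot_f32_acc(a, b):
    """Dot product of F16 inputs with an F32 accumulator, stored as F16 at the end."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_f32(to_f16(x) * to_f16(y))
        acc = to_f32(acc + prod)  # partial sums keep F32 precision
    return to_f16(acc)  # single rounding to F16 when the result is written out


if __name__ == "__main__":
    ones = [1.0] * 4096
    # With an F16 accumulator the sum stops growing once it reaches 2048,
    # because 2048 + 1 is not representable in F16 and rounds back to 2048.
    print(dot_f16_acc(ones, ones))
    print(dot_f32_acc(ones, ones))
```

In this sketch the F16 accumulator loses every addend past 2048, while the F32 accumulator reaches 4096 and rounds to F16 only once, on output.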

Change-Id: I215593c205236a0f9669218437bb40b184ec6a4f
13 files changed