COMPMID-3818: Extend documentation to report what fast-math flag enables

Signed-off-by: Sheri Zhang <sheri.zhang@arm.com>
Change-Id: I726decc11f19bd89187cdaec89d56dcf4613dff7
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/4112
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/docs/01_library.dox b/docs/01_library.dox
index 39739cb..742a246 100644
--- a/docs/01_library.dox
+++ b/docs/01_library.dox
@@ -59,7 +59,18 @@
 - NCHW: Legacy layout where width is in the fastest changing dimension
 where N = batches, C = channels, H = height, W = width
 
-@section S4_1_3 Thread-safety
+@section S4_1_3 Fast-math support
+
+Compute Library supports several convolution methods; the fast-math flag is only used for the Winograd algorithm.
+When the fast-math flag is enabled, both the NEON and CL convolution layers try to dispatch the fastest implementation available, which may reduce accuracy. The scenarios involving the fast-math flag are listed below:
+- For FP32:
+    - no-fast-math: Only supports Winograd 3x3, 3x1, 1x3, 5x1, 1x5, 7x1, 1x7
+    - fast-math: Supports Winograd 3x3, 3x1, 1x3, 5x1, 1x5, 7x1, 1x7, 5x5, 7x7
+- For FP16:
+    - no-fast-math: No Winograd support
+    - fast-math: Supports Winograd 3x3, 3x1, 1x3, 5x1, 1x5, 7x1, 1x7, 5x5, 7x7
+
+@section S4_1_4 Thread-safety
 
 Although the library supports multi-threading during workload dispatch, thus parallelizing the execution of the workload at multiple threads, the current runtime module implementation is not thread-safe in the sense of executing different functions from separate threads.
 This lies to the fact that the provided scheduling mechanism wasn't designed with thread-safety in mind.