MLECO-3244: Documentation on timing adapters

Documents the current differences between timing adapter implementations on
the Corstone-300 and Corstone-310 platforms.

Signed-off-by: Maksims Svecovs <maksims.svecovs@arm.com>
Change-Id: I3161dc929bd01217a4992be869f13377a58e5471
diff --git a/docs/sections/building.md b/docs/sections/building.md
index a7b64aa..f6b71a8 100644
--- a/docs/sections/building.md
+++ b/docs/sections/building.md
@@ -20,7 +20,7 @@
     - [Configuring the build for simple-platform](./building.md#configuring-the-build-for-simple_platform)
     - [Building with CMakePresets](./building.md#building-with-cmakepresets)
     - [Building the configured project](./building.md#building-the-configured-project)
-  - [Building timing adapter with custom options](./building.md#building-timing-adapter-with-custom-options)
+  - [Building timing adapter with custom options](./timing_adapters.md#building-timing-adapter-with-custom-options)
   - [Add custom inputs](./building.md#add-custom-inputs)
   - [Add custom model](./building.md#add-custom-model)
   - [Optimize custom model with Vela compiler](./building.md#optimize-custom-model-with-vela-compiler)
@@ -641,137 +641,6 @@
 
 > **Note:**  For the specific use-case commands, refer to the relative section in the use-case documentation.
 
-## Building timing adapter with custom options
-
-The sources also contain the configuration for a timing adapter utility for the *Ethos-U* NPU driver. The timing
-adapter allows the platform to simulate user provided memory bandwidth and latency constraints.
-
-The timing adapter driver aims to control the behavior of two AXI buses used by *Ethos-U* NPU. One is for SRAM memory
-region, and the other is for flash or DRAM.
-
-The SRAM is where intermediate buffers are expected to be allocated and therefore, this region can serve frequent Read
-and Write traffic generated by computation operations while executing a neural network inference.
-
-The flash or DDR is where we expect to store the model weights and therefore, this bus would only usually be used for RO
-traffic.
-
-It is used for MPS3 FPGA and for Fast Model environment.
-
-The CMake build framework allows the parameters to control the behavior of each bus with following parameters:
-
-- `MAXR`: Maximum number of pending read operations allowed. `0` is inferred as infinite and the default value is `4`.
-
-- `MAXW`: Maximum number of pending write operations allowed. `0` is inferred as infinite and the default value is `4`.
-
-- `MAXRW`: Maximum number of pending read and write operations allowed. `0` is inferred as infinite and the default
-  value is `8`.
-
-- `RLATENCY`: Minimum latency, in cycle counts, for a read operation. This is the duration between `ARVALID` and
-  `RVALID` signals. The default value is `50`.
-
-- `WLATENCY`: Minimum latency, in cycle counts, for a write operation. This is the duration between `WVALID` and
-  `WLAST`, with `BVALID` being deasserted. The default value is `50`.
-
-- `PULSE_ON`: The number of cycles where addresses are let through. The default value is `5100`.
-
-- `PULSE_OFF`: The number of cycles where addresses are blocked. The default value is `5100`.
-
-- `BWCAP`: Maximum number of 64-bit words transferred per pulse cycle. A pulse cycle is defined by `PULSE_ON`
-  and `PULSE_OFF`. `0` is inferred as infinite and the default value is `625`.
-
-  > **Note:** The bandwidth cap `BWCAP` operates on the transaction level and, because of its simple implementation,
-  > the accuracy is limited.
-  > When set to a small value it allows only a small number of transactions for each pulse cycle.
-  > Once the counter has reached or exceeded the configured cap, no transactions will be allowed before the next pulse
-  > cycle. In order to minimize this effect some possible solutions are:
-  >
-  > - scale up all the parameters to a reasonably large value.
-  > - scale up `BWCAP` as a multiple of the burst length (in this case bulk traffic will not face rounding errors in
-  >   the bandwidth cap).
-
-- `MODE`: Timing adapter operation mode. Default value is `0`.
-
-  - `Bit 0`: `0`=simple, `1`=latency-deadline QoS throttling of read versus write,
-
-  - `Bit 1`: `1`=enable random AR reordering (`0`=default),
-
-  - `Bit 2`: `1`=enable random R reordering (`0`=default),
-
-  - `Bit 3`: `1`=enable random B reordering (`0`=default)
-
-For the CMake build configuration of the timing adapter, the SRAM AXI is assigned `index 0` and the flash, or DRAM, AXI
-bus has `index 1`.
-
-To change the bus parameter for the build a "***TA_\<index>_*"** prefix should be added to the above. For example,
-**TA0_MAXR=10** sets the maximum pending reads to 10 on the SRAM AXI bus.
-
-As an example, if we have the following parameters for the flash, or DRAM, region:
-
-- `TA1_MAXR` = "2"
-
-- `TA1_MAXW` = "0"
-
-- `TA1_MAXRW` = "0"
-
-- `TA1_RLATENCY` = "64"
-
-- `TA1_WLATENCY` = "32"
-
-- `TA1_PULSE_ON` = "320"
-
-- `TA1_PULSE_OFF` = "80"
-
-- `TA1_BWCAP` = "50"
-
-For a clock rate of 500MHz, this would translate to:
-
-- The maximum duty cycle for any operation is:\
-  ![Maximum duty cycle formula](../media/F1.png)
-
-- Maximum bit rate for this bus (64-bit wide) is:\
-  ![Maximum bit rate formula](../media/F2.png)
-
-- With a read latency of 64 cycles, and maximum pending reads as 2, each read could be a maximum of 64 or 128 bytes. As
-  defined for the *Ethos-U* NPU AXI bus attribute.
-
-  The bandwidth is calculated solely by read parameters:
-
-  ![Bandwidth formula](../media/F3.png)
-
-  This is higher than the overall bandwidth dictated by the bus parameters of:
-
-  ![Overall bandwidth formula](../media/F4.png)
-
-This suggests that the read operation is only limited by the overall bus bandwidth.
-
-Timing adapter requires recompilation to change parameters. Default timing adapter configuration file pointed to by
-`TA_CONFIG_FILE` build parameter is located in the `scripts/cmake folder` and contains all options for `AXI0` and `AXI1`
-as previously described.
-
-here is an example of `scripts/cmake/timing_adapter/ta_config_u55_high_end.cmake`:
-
-```cmake
-# Timing adapter options
-set(TA_INTERACTIVE OFF)
-
-# Timing adapter settings for AXI0
-set(TA0_MAXR "8")
-set(TA0_MAXW "8")
-set(TA0_MAXRW "0")
-set(TA0_RLATENCY "32")
-set(TA0_WLATENCY "32")
-set(TA0_PULSE_ON "3999")
-set(TA0_PULSE_OFF "1")
-set(TA0_BWCAP "4000")
-...
-```
-
-An example of the build with a custom timing adapter configuration:
-
-```commandline
-cmake .. -DTA_CONFIG_FILE=scripts/cmake/timing_adapter/my_ta_config.cmake
-```
-
 ## Add custom inputs
 
 The application performs inference on input data found in the folder set by the CMake parameters, for more information
diff --git a/docs/sections/timing_adapters.md b/docs/sections/timing_adapters.md
new file mode 100644
index 0000000..ab05490
--- /dev/null
+++ b/docs/sections/timing_adapters.md
@@ -0,0 +1,153 @@
+# Building timing adapter with custom options
+
+The sources contain the configuration for a timing adapter utility for the *Arm® Ethos™-U* NPU driver. The timing
+adapter allows the platform to simulate user-provided memory bandwidth and latency constraints.
+
+The timing adapter driver aims to control the behavior of two AXI buses used by the *Ethos-U* NPU. One is for the SRAM
+memory region, and the other is for flash or DRAM.
+
+The SRAM is where intermediate buffers are expected to be allocated and, therefore, this region can serve frequent read
+and write traffic generated by computation operations while executing a neural network inference.
+
+The flash or DRAM is where we expect the model weights to be stored and, therefore, this bus would usually only be used
+for read-only (RO) traffic.
+
+The timing adapter is used for the MPS3 FPGA and for the Fast Model environment.
+
+The CMake build framework allows the behavior of each bus to be controlled with the following parameters:
+
+- `MAXR`: Maximum number of pending read operations allowed. `0` is inferred as infinite and the default value is `4`.
+
+- `MAXW`: Maximum number of pending write operations allowed. `0` is inferred as infinite and the default value is `4`.
+
+- `MAXRW`: Maximum number of pending read and write operations allowed. `0` is inferred as infinite and the default
+  value is `8`.
+
+- `RLATENCY`: Minimum latency, in cycle counts, for a read operation. This is the duration between `ARVALID` and
+  `RVALID` signals. The default value is `50`.
+
+- `WLATENCY`: Minimum latency, in cycle counts, for a write operation. This is the duration between `WVALID` and
+  `WLAST`, with `BVALID` being deasserted. The default value is `50`.
+
+- `PULSE_ON`: The number of cycles where addresses are let through. The default value is `5100`.
+
+- `PULSE_OFF`: The number of cycles where addresses are blocked. The default value is `5100`.
+
+- `BWCAP`: Maximum number of 64-bit words transferred per pulse cycle. A pulse cycle is defined by `PULSE_ON`
+  and `PULSE_OFF`. `0` is inferred as infinite and the default value is `625`.
+
+  > **Note:** The bandwidth cap `BWCAP` operates on the transaction level and, because of its simple implementation,
+  > the accuracy is limited.
+  > When set to a small value it allows only a small number of transactions for each pulse cycle.
+  > Once the counter has reached or exceeded the configured cap, no transactions will be allowed before the next pulse
+  > cycle. In order to minimize this effect some possible solutions are:
+  >
+  > - scale up all the parameters to a reasonably large value.
+  > - scale up `BWCAP` as a multiple of the burst length (in this case bulk traffic will not face rounding errors in
+  >   the bandwidth cap).
+
+- `MODE`: Timing adapter operation mode. Default value is `0`.
+
+  - `Bit 0`: `0`=simple, `1`=latency-deadline QoS throttling of read versus write,
+
+  - `Bit 1`: `1`=enable random AR reordering (`0`=default),
+
+  - `Bit 2`: `1`=enable random R reordering (`0`=default),
+
+  - `Bit 3`: `1`=enable random B reordering (`0`=default)
+
+For the CMake build configuration of the timing adapter, the SRAM AXI is assigned `index 0` and the flash, or DRAM, AXI
+bus has `index 1`.
+
+To change a bus parameter for the build, add a `TA<index>_` prefix to the parameter names above. For example,
+`TA0_MAXR=10` sets the maximum pending reads to 10 on the SRAM AXI bus.
+
+As an example, if we have the following parameters for the flash, or DRAM, region:
+
+- `TA1_MAXR` = "2"
+
+- `TA1_MAXW` = "0"
+
+- `TA1_MAXRW` = "0"
+
+- `TA1_RLATENCY` = "64"
+
+- `TA1_WLATENCY` = "32"
+
+- `TA1_PULSE_ON` = "320"
+
+- `TA1_PULSE_OFF` = "80"
+
+- `TA1_BWCAP` = "50"
+
+For a clock rate of 500 MHz, this would translate to:
+
+- The maximum duty cycle for any operation is:\
+  ![Maximum duty cycle formula](../media/F1.png)
+
+- Maximum bit rate for this bus (64-bit wide) is:\
+  ![Maximum bit rate formula](../media/F2.png)
+
+- With a read latency of 64 cycles and a maximum of 2 pending reads, each read could be a maximum of 64 or 128 bytes, as
+  defined by the *Ethos-U* NPU AXI bus attribute.
+
+  The bandwidth is calculated solely by read parameters:
+
+  ![Bandwidth formula](../media/F3.png)
+
+  This is higher than the overall bandwidth dictated by the bus parameters of:
+
+  ![Overall bandwidth formula](../media/F4.png)
+
+This suggests that the read operation is only limited by the overall bus bandwidth.
+
+Changing the timing adapter parameters requires recompilation. The default timing adapter configuration file, pointed
+to by the `TA_CONFIG_FILE` build parameter, is located in the `scripts/cmake` folder and contains all options for `AXI0`
+and `AXI1` as previously described.
+
+Here is an example of `scripts/cmake/timing_adapter/ta_config_u55_high_end.cmake`:
+
+```cmake
+# Timing adapter options
+set(TA_INTERACTIVE OFF)
+
+# Timing adapter settings for AXI0
+set(TA0_MAXR "8")
+set(TA0_MAXW "8")
+set(TA0_MAXRW "0")
+set(TA0_RLATENCY "32")
+set(TA0_WLATENCY "32")
+set(TA0_PULSE_ON "3999")
+set(TA0_PULSE_OFF "1")
+set(TA0_BWCAP "4000")
+...
+```
+
+An example of the build with a custom timing adapter configuration:
+
+```commandline
+cmake .. -DTA_CONFIG_FILE=scripts/cmake/timing_adapter/my_ta_config.cmake
+```
+## Differences between timing adapter implementations in Arm® Corstone™-300 and Arm® Corstone™-310
+
+The Corstone-300 FVP and FPGA implement timing adapters that are tied to AXI masters M0 and M1 on the Ethos-U NPU.
+
+The Corstone-310 **FPGA** implements timing adapter blocks differently, placing one on each of the main
+memories present on the FPGA: SRAM, QSPI flash, DDR and user memory.
+This timing adapter placement does not translate well to the FVP, so the current Corstone-310 FVP implementation does
+not support the feature. Additionally, the base addresses of the timing adapter blocks have changed for Corstone-310:
+
+#### Timing adapters for Corstone-300 FVP and FPGA:
+| TA# | Interface TA is placed on | Base address (non-secure/secure) | Size  |
+|-----|---------------------------|----------------------------------|-------|
+| 0   | M0/AXI0 for Ethos-U NPU   | 0x4810_3000/0x5810_3000          | 0.5KB |
+| 1   | M1/AXI1 for Ethos-U NPU   | 0x4810_3200/0x5810_3200          | 0.5KB |
+#### Timing adapters for Corstone-310 FPGA:
+| TA# | Interface TA is placed on | Base address (non-secure/secure) | Size |
+|-----|---------------------------|----------------------------------|------|
+| 0   | FPGA SRAM                 | 0x4170_0000/0x5170_0000          | 4KB  |
+| 1   | QSPI flash device         | 0x4170_1000/0x5170_1000          | 4KB  |
+| 2   | DDR                       | 0x4170_2000/0x5170_2000          | 4KB  |
+| 3   | User memory               | 0x4170_3000/0x5170_3000          | 4KB  |
+
+With this in mind, when targeting Corstone-310, the evaluation kit should be built with timing adapters disabled altogether via the `-DETHOS_U_NPU_TIMING_ADAPTER_ENABLED=OFF` flag. Because timing adapters do not affect CPU-driven traffic on Corstone-300, building both platforms without timing adapter support allows for a CPU performance comparison.
\ No newline at end of file
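
The arithmetic behind the worked `TA1` example in `timing_adapters.md` can be sketched as follows. The formulas in the document are images (`F1.png` to `F4.png`), so the expressions here are an assumed reading of the parameter definitions rather than a transcription. The function names are hypothetical and exist only for illustration.

```python
# Sketch of the TA1 example arithmetic. The formulas are an assumed reading of
# the parameter definitions, not a copy of the F1-F4 images in the docs.
CLOCK_HZ = 500_000_000  # 500 MHz clock rate from the example
BUS_BYTES = 8           # 64-bit wide bus, so 8 bytes per transferred word

# Values from the TA1 example (flash or DRAM bus).
ta1 = {"MAXR": 2, "RLATENCY": 64, "PULSE_ON": 320, "PULSE_OFF": 80, "BWCAP": 50}


def duty_cycle(p):
    """Fraction of each pulse cycle in which addresses are let through."""
    return p["PULSE_ON"] / (p["PULSE_ON"] + p["PULSE_OFF"])


def max_bus_rate(p):
    """Bytes/s if one bus word moved on every cycle of the ON window."""
    return duty_cycle(p) * CLOCK_HZ * BUS_BYTES


def read_limited_bandwidth(p, burst_bytes):
    """Bytes/s when MAXR reads of burst_bytes complete every RLATENCY cycles."""
    return p["MAXR"] * burst_bytes / p["RLATENCY"] * CLOCK_HZ


def bwcap_bandwidth(p):
    """Bytes/s when BWCAP 64-bit words are allowed per full pulse cycle."""
    return p["BWCAP"] * BUS_BYTES / (p["PULSE_ON"] + p["PULSE_OFF"]) * CLOCK_HZ


if __name__ == "__main__":
    print(f"duty cycle:        {duty_cycle(ta1):.0%}")
    print(f"max bus rate:      {max_bus_rate(ta1) / 1e9:.2f} GB/s")
    for burst in (64, 128):
        bw = read_limited_bandwidth(ta1, burst)
        print(f"read-limited ({burst:3d} B): {bw / 1e9:.2f} GB/s")
    print(f"BWCAP-limited:     {bwcap_bandwidth(ta1) / 1e9:.2f} GB/s")
```

Under these assumptions the read-limited figure is higher than the `BWCAP`-derived figure for both 64-byte and 128-byte reads, which agrees with the document's conclusion that the read operation is limited only by the overall bus bandwidth.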