//
// This confidential and proprietary software may be used only as
// authorised by a licensing agreement from ARM Limited
// (C) COPYRIGHT 2020-2024 ARM Limited
// ALL RIGHTS RESERVED
// The entire notice above must be reproduced on all authorised
// copies and copies may only be made to the extent permitted
// by a licensing agreement from ARM Limited.

== Introduction

=== Overview

Tensor Operator Set Architecture (TOSA) provides a set of whole-tensor
operations commonly employed by Deep Neural Networks. The intent is to enable a
variety of implementations running on a diverse range of processors, with the
results at the TOSA level consistent across those implementations. Applications
or frameworks which target TOSA can therefore be deployed on a wide range of
different processors, such as SIMD CPUs, GPUs and custom hardware such as
NPUs/TPUs, with defined accuracy and compatibility constraints. Most operators
from the common ML frameworks (TensorFlow, PyTorch, etc.) should be expressible
in TOSA. It is expected that there will be tools to lower from ML frameworks
into TOSA.

=== Goals

The goals of TOSA include the following:

* A minimal and stable set of tensor-level operators to which machine learning
framework operators can be reduced.

* Full support for both quantized integer and floating-point content.

* Precise functional description of the behavior of every operator, including
the treatment of their numerical behavior in the case of precision, saturation,
scaling, and range as required by quantized datatypes.

* Agnostic to any single high-level framework, compiler backend stack or
particular target.

* The detailed functional and numerical description enables precise code
construction for a diverse range of targets: SIMD CPUs, GPUs and custom
hardware such as NPUs/TPUs.

=== Specification

The TOSA Specification is written as AsciiDoc mark-up and developed in its raw
mark-up form, managed through a git repository here:
https://git.mlplatform.org/tosa/specification.git/.
The specification is developed and versioned much like software.
While the mark-up is legible and can be read fairly easily in its raw form, it is recommended to build or “render” the mark-up into PDF or HTML.
To do this, please follow the instructions in the README.md in the root of the specification repository.

=== Operator Selection Principles

TOSA defines a set of primitive operators to which higher-level operators can be lowered in a consistent way.
To remain effective and efficient to implement, the set of operators must be constrained to a reasonably small set of primitive operations out of which others can be constructed.
The following principles govern the selection of operators within TOSA.

.Principles
[cols="1,5,5"]
|===
|ID|Principle|Reason for this

|P0
|An operator shall be a primitive operation or building block that cannot be decomposed into simpler whole tensor operations.
|If the operator can be broken down, then we should look at the component operators.

|P1
|An operator shall be usable as a component out of which more complex operations can be constructed.
|Single-use operators have a high architectural cost and a more reusable version should be considered instead.

|P2
|Precision should be appropriate for the input and output data types.
|Precision higher than that needed to calculate the result leads to extra implementation cost.

|P3
|Numerical definition of common sub-operations should be consistent between operators (for example: value scaling).
|Consistent sub-operation definition reduces the operator implementation cost.

|P4
|The valid input and output ranges for all arguments shall be specified.
|Ranges are required to make consistent (numerically agreeing) implementations possible.

|P5
|Integer operators shall be implementable in a bit-exact form with good efficiency on CPU, GPU and hardware targets.
|Reduces implementation cost and gives consistent inference results.
|===

=== Profiles

TOSA supports three profiles that enable efficient implementation on different classes of device.
The Base Inference profile is intended for embedded integer/fixed-point designs performing inference only.
The Main Inference profile is intended for general inference functionality including integer and floating-point data types.
The Main Training profile adds training operators in addition to inference operators.
This version of the specification covers the Base Inference and Main Inference profiles.
The Main Training profile is expected in a later version of the specification.
The following table summarizes the three profiles:

.Profiles
|===
|Profile|Name|Integer Inference|Floating-point Inference|Training

|Base Inference|TOSA-BI|Yes|No|No
|Main Inference|TOSA-MI|Yes|Yes|No
|Main Training|TOSA-MT|Yes|Yes|Yes
|===

=== Levels

A TOSA level defines operator argument ranges that an implementation shall support.
This is distinct from a profile, which defines the operations and data types supported.
This version of the specification defines two TOSA levels:

* No level: allows the full range of arguments specified by the operations according to the operation data types.
* Level 8K: ranges are expected to be sufficient for applications with frame sizes up to 8K.

Later versions of the specification may define additional levels.
The following table defines the value ranges for each level.
These ranges are checked using the LEVEL_CHECK() function in the operator descriptions.

.Level maximums
include::{generated}/levels.adoc[]
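
As an illustration only, a level can be pictured as a set of optional maximums, where an absent maximum models "No level" and only the data-type range applies. The struct `LevelLimits`, its field names and the function `level_check()` below are hypothetical; they are not the LEVEL_CHECK() function or the entries of the generated table.

[source,c++]
----
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical level description: each field holds a maximum, and
// std::nullopt models "No level" (only the data-type range applies).
// Field names are illustrative, not entries of the generated level table.
struct LevelLimits {
    std::optional<int64_t> max_rank;
    std::optional<int64_t> max_kernel;
};

// Hypothetical check: true if value is within the maximum,
// or if no maximum applies at this level.
bool level_check(const std::optional<int64_t>& maximum, int64_t value) {
    return !maximum.has_value() || value <= *maximum;
}
----

With a default-constructed `LevelLimits` every argument passes, while a level with a finite `max_kernel` rejects any kernel larger than that maximum.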
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 124 | |
Eric Kunze | 42229d0 | 2022-04-07 16:54:46 -0700 | [diff] [blame] | 125 | === Status |
| 126 | |
| 127 | The TOSA specification is a work in progress. |
| 128 | |
| 129 | * The Base Inference profile should be considered to be near release quality, with conformance tests available. |
| 130 | * The Main Inference profile has most of the expected operators in place, but is still subject to change. |
| 131 | * The reference model and conformance tests do not yet support all of the floating point types that have been defined. |
| 132 | * There is not currently a conformance test suite available for Main Inference. |
| 133 | * Main Training profile is pre-alpha, significant work still needs to be done for the profile, and no conformance tests are available. |
| 134 | |
=== Compliance

This section defines when a TOSA implementation is compliant with a given TOSA specification profile and level.
To be compliant, an implementation must achieve the results and accuracy defined by this specification.
TOSA also defines a set of conformance tests.
A compliant implementation must pass the conformance tests.
The conformance tests are not exhaustive, so an implementation that passes them may still be non-compliant if it has a non-compliance that the tests do not detect.

==== Base Inference Profile Compliance

The <<Operator Graphs>> section of this specification defines a TOSA graph and its behavior.
This behavior is captured in the pseudo-code function tosa_execute_graph().
For a given input graph (with attributes) and input tensors, there are three possible tosa_graph_result values after executing the graph:

* tosa_unpredictable: The result of the graph on the given inputs cannot be relied upon.
* tosa_error: The graph does not meet the specification and is recognised as an illegal graph.
* tosa_valid: The result is defined and predictable and the list of output tensors defines the result.

An implementation is compliant with the TOSA Base Inference Profile if it matches the above results as follows:

* For tosa_unpredictable, the implementation can return whatever result it chooses (including an error).
* For tosa_error, the implementation must return an error result (and there is no requirement on how much of the graph is executed, if any).
* For tosa_valid, the implementation must execute the entire graph without error and return the result defined by this specification.

In terms of pseudo-code, if *graph* is a TOSA graph consisting of Base Inference Profile operators and *input_list* is a list of input tensors, then the following test must pass.

[source,c++]
----
bool tosa_test_compliance(tosa_graph_t graph, tosa_list_t input_list, tosa_level_t level) {
    shape_list_t output_list_spec = tosa_allocate_list(tosa_output_shape(graph));
    shape_list_t output_list_test = tosa_allocate_list(tosa_output_shape(graph));
    tosa_graph_result = tosa_valid;    // result starts as valid
    tosa_nesting_depth = 0;            // if/while nesting level
    tosa_execute_graph(graph, input_list, output_list_spec, level);
    if (tosa_graph_result == tosa_unpredictable) {
        return true;    // No requirement to match an unpredictable result
    }
    result_test = execute_implementation_under_test(graph, input_list, output_list_test);
    if (tosa_graph_result == tosa_error) {
        return result_test == tosa_error;   // result must be an error
    }
    if (exact_tensor_match(output_list_spec, output_list_test)) {
        // Predictable bit-exact value match required
        return true;
    }
    return false;
}
----

==== Main Inference Profile Compliance

A Main Inference compliant implementation must satisfy the following:

* The implementation must meet <<Base Inference Profile Compliance>> for all Base Inference compliant graphs.
* The implementation must support all Main Inference operations using the datatype fp32_t.
** The operations must meet the precision requirements of <<Main Inference precision requirements>>.
* The implementation must support all Main Inference operations using the datatype fp16_t.
** The operations must meet the precision requirements of <<Main Inference precision requirements>>.
** Note: These requirements allow fp16_t operations to be implemented using the fp32_t datatype.
* The implementation must support all Main Inference operations using the datatype bf16_t.
** The operations must meet the precision requirements of <<Main Inference precision requirements>>.
** Note: These requirements allow bf16_t operations to be implemented using the fp32_t datatype.

As with <<Base Inference Profile Compliance>>, the pseudo-code function tosa_execute_graph() can return one of three possible results.
A compliant implementation must satisfy the following:

* For a graph returning tosa_error, the implementation must also return an error.
* For a graph returning tosa_valid, the implementation must execute the entire graph without error.
* For a graph returning tosa_valid and consisting only of integer operators, the results must match exactly.

===== Main Inference precision requirements

In a compliant implementation, individual floating-point operations within the graph must meet the accuracy bounds listed in the following table.
In the table, _ulp_ means unit in the last place.
The function tosa_reference_check_fp(), defined in this specification, gives the error range permitted for a given number of units in the last place.
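
To give a feel for the size of one ulp, the following C++ sketch measures it for fp32 values with `std::nextafter`. The helpers `ulp_of()` and `within_ulps()` are hypothetical; they measure the ulp at the magnitude of the reference value, which is one common convention and may differ from tosa_reference_check_fp() near powers of two and for subnormal values.

[source,c++]
----
#include <cassert>
#include <cmath>
#include <limits>

// Size of one unit in the last place (ulp) of a finite float x:
// the gap between |x| and the next representable float above it.
float ulp_of(float x) {
    const float ax = std::fabs(x);
    return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// True if imp lies within n ulps of the fp64 reference value,
// where the ulp is measured at the reference rounded to fp32.
bool within_ulps(float imp, double ref, double n) {
    const double u = ulp_of(static_cast<float>(ref));
    return std::fabs(static_cast<double>(imp) - ref) <= n * u;
}
----

For example, `ulp_of(1.0f)` equals `std::numeric_limits<float>::epsilon()`, so a 0.5 ulp bound at 1.0 admits an absolute error of about 6e-8.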

NOTE: The error criteria in this section are at an early draft stage and are likely to change during conformance test development.

The following criteria apply to all operations:

* If any input is a NaN and the result is floating-point, then the result must be a NaN.
* If any input is a NaN and the operation is a comparison (greater, greater-equal, equal), then the result must be false.
* If any input is a NaN and the operation is a conversion to an integer or boolean, then the result is unpredictable.
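
On targets where C++ `float` follows IEEE 754 (checked below with `std::numeric_limits<float>::is_iec559`) and no options such as `-ffast-math` are in effect, native arithmetic already exhibits the first two rules; the sketch below, with hypothetical helper names, shows this. The third rule has no portable C++ illustration, because converting a NaN to an integer type is undefined behavior in C++.

[source,c++]
----
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 float");

// Rule 1: a NaN input to a floating-point operation yields a NaN result.
bool nan_propagates(float x) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return std::isnan(nan + x) && std::isnan(nan * x) && std::isnan(x - nan);
}

// Rule 2: greater, greater-equal and equal comparisons with a NaN input are false,
// whichever side the NaN is on.
bool nan_comparisons_false(float x) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return !(nan > x) && !(nan >= x) && !(nan == x) &&
           !(x > nan) && !(x >= nan) && !(x == nan);
}
----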

[cols="1,3"]
|===
| Operation | Accuracy bound

| <<ARGMAX>>, <<MAX_POOL2D>>, <<CLAMP>>, <<MAXIMUM>>, <<MINIMUM>>, <<ABS>>, <<NEGATE>>, <<SELECT>>, <<REDUCE_MAX>>, <<REDUCE_MIN>>, <<CONST>>, <<IDENTITY>>
| Non-NaN results must be exact.

| <<EQUAL>>, <<GREATER>>, <<GREATER_EQUAL>>
| The result must be exact with: +
(1) The sign of the zero is ignored +
(2) Infinities of the same sign compare as equal

| <<CONV2D>>, <<CONV3D>>, <<DEPTHWISE_CONV2D>>, <<FULLY_CONNECTED>>, <<MATMUL>>, <<TRANSPOSE_CONV2D>>
| Each output can be expressed as a dot product of two input vectors. +
The dot product must meet the <<Dot product accuracy requirements>>.

| <<FFT2D>>, <<RFFT2D>>
| Each output can be expressed as a dot product of an input vector with a constant coefficient vector. +
The dot product must meet the <<Dot product accuracy requirements>>.

| <<ADD>>, <<MUL>>, <<SUB>>, <<CEIL>>, <<FLOOR>>
| Floating-point result overflows must be set to infinity of the correct sign. +
Floating-point result underflows must be set to zero of the correct sign. +
Addition of infinities of different signs must produce a NaN. +
Subtraction of infinities of the same sign must produce a NaN. +
Multiplication of an infinity by a zero must produce a NaN. +
Otherwise the result must be within 0.5 ulp of the mathematical result.

| <<CAST>>
| Floating-point result overflows must be set to infinity of the correct sign. +
Floating-point result underflows must be set to zero of the correct sign. +
Overflows in a cast from floating-point to integer must saturate. +
A cast from floating-point to integer must use the round to nearest, ties to even, rounding mode. +
Otherwise a cast to floating-point must be within 0.5 ulp of the mathematical result.

| <<RECIPROCAL>>
| If the input is a zero or the result overflows, the output must be an infinity of the same sign. +
If the input is an infinity or the result underflows, the output must be a zero of the same sign. +
Otherwise the result must be within 1 ulp of the mathematical result.

| <<RSQRT>>
| If the input is less than zero, the result must be a NaN. +
Otherwise if the input is a zero, the output must be an infinity of the same sign. +
Otherwise the result must be within 2 ulp of the mathematical result.

| <<LOG>>, <<ERF>>
| If the input to LOG is less than zero, then the result must be a NaN. +
If the result overflows, the output must be an infinity of the correct sign. +
If the result underflows, the output must be a zero of the correct sign. +
Otherwise the result must be within 5 ulp of the mathematical result.

| <<EXP>>
| Let `x` be an input element and `out_imp` the implementation output of `exp(x)`. +
Let `out_ref` be the result of the fp64_t reference implementation of `exp(x)`. +
Let `err_bnd = abs(out_ref) * exp2(-normal_frac<in_out_t>) * (1+abs(x))` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true.

| <<POW>>
| Let `x`, `y` be input elements from `input1` and `input2` respectively. +
Let `out_imp` be the implementation output of `pow(x,y)`. +
If `x` is less than zero and `y` is non-integral, then the result must be a NaN. +
Let `out_ref` be the result of the fp64_t reference implementation of `pow(x,y)`. +
Let `err_bnd = abs(out_ref) * exp2(-normal_frac<in_out_t>) * (1+abs(log(abs(x))*y))` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true.

| <<SIGMOID>>
| Let `x` be an input element and `out_imp` the implementation output. +
Let `out_ref` be the result of the fp64_t reference implementation. +
Let `err_bnd = abs(out_ref) * exp2(-normal_frac<in_out_t>) * (2 * (1+abs(x)))` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true.

| <<TANH>>
| Let `x` be an input element and `out_imp` the implementation output. +
Let `out_ref` be the result of the fp64_t reference implementation. +
Let `err_bnd = exp2(-normal_frac<in_out_t>) * max(0.5, abs(out_ref) * (4 * (1+abs(x))))` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true.

| <<REDUCE_SUM>>
| Each output can be expressed as a dot product of an input vector with a vector of ones. +
This dot product must meet the <<Dot product accuracy requirements>>.

| <<AVG_POOL2D>>
| Each output can be expressed as a dot product of an input vector with a vector with elements 1/KS, where KS is the kernel size. +
This dot product must meet the <<Dot product accuracy requirements>>.

| <<REDUCE_PRODUCT>>
| Result overflows must be set to an infinity of the correct sign. +
Result underflows must be set to a zero of the correct sign. +
Let `n` be the number of elements in the product, `out_imp` the implementation result, and `out_ref` the result of the fp64_t reference implementation. +
Let `err_bnd = abs(out_ref) * (pow(1 + pow(2, -normal_frac<in_out_t> - 1), n) - 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true.

|===
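
To make the EXP and REDUCE_PRODUCT bounds concrete, the following C++ sketch evaluates `err_bnd` for fp32_t. It assumes that `normal_frac<fp32_t>` is 23 (the fp32 fraction width). `check_within_bnd()` is a hypothetical, simplified stand-in for `tosa_reference_check_fp_bnd()` that only handles finite values; the specification's own function governs compliance.

[source,c++]
----
#include <cassert>
#include <cmath>

// Assumption: fp32 fraction width, i.e. normal_frac<fp32_t> == 23.
constexpr int kNormalFracFp32 = 23;

// EXP: err_bnd = abs(out_ref) * exp2(-normal_frac) * (1 + abs(x))
double exp_err_bnd(double x, double out_ref) {
    return std::fabs(out_ref) * std::exp2(-kNormalFracFp32) * (1.0 + std::fabs(x));
}

// REDUCE_PRODUCT: err_bnd = abs(out_ref) * (pow(1 + 2^(-normal_frac - 1), n) - 1)
double reduce_product_err_bnd(double out_ref, int n) {
    return std::fabs(out_ref) *
           (std::pow(1.0 + std::exp2(-kNormalFracFp32 - 1), n) - 1.0);
}

// Hypothetical simplified check for finite values: the implementation
// result must lie within err_bnd of the fp64 reference result.
bool check_within_bnd(float out_imp, double out_ref, double err_bnd) {
    return std::fabs(static_cast<double>(out_imp) - out_ref) <= err_bnd;
}
----

The EXP bound grows with `abs(x)`, reflecting that a small relative error in `x` is amplified by `exp()`; the REDUCE_PRODUCT bound grows roughly linearly with `n` for moderate `n`, since each multiplication may contribute up to half an ulp of relative error.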

===== Operator sequence precision requirement

Precision criteria are specified for a single operator.

An implementation M of a sequence of n TOSA operators, A[0] to A[n-1], is said to
be compliant if M gives the same result as a sequence of implementations
M[0] to M[n-1] such that:

* Each M[k] implements A[k] with the same or higher precision datatypes.
* Each M[k] meets the accuracy defined in this specification for A[k], where the M[k] output is converted to the A[k] output precision using round to nearest.
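
One reading of this rule is that intermediate values may be carried at higher precision between operators. The sketch below, with hypothetical function names, contrasts an fp32 ADD followed by an fp32 MUL with an implementation M that keeps the intermediate sum in fp64 and rounds only the final result to fp32. Whether a particular fused implementation is compliant still depends on each M[k] meeting the accuracy bound for A[k].

[source,c++]
----
#include <cassert>
#include <cmath>

// Reference sequence: A[0] = ADD and A[1] = MUL, each producing an fp32 result.
float add_mul_fp32(float a, float b, float c) {
    float sum = a + b;   // A[0] output rounded to fp32
    return sum * c;      // A[1] output rounded to fp32
}

// Implementation M: M[0] and M[1] use fp64, a higher-precision datatype.
// The intermediate stays in fp64; only the final output is converted to fp32
// (static_cast<float> rounds to nearest under the default IEEE 754 rounding mode).
float add_mul_via_fp64(float a, float b, float c) {
    double sum = static_cast<double>(a) + static_cast<double>(b);  // M[0]
    return static_cast<float>(sum * static_cast<double>(c));       // M[1]
}
----

For inputs such as 1, 2 and 3 both versions give exactly 9. For inputs such as 0.1f, 0.2f and 3.0f the results may differ in the last bit, because the fp32 sequence rounds twice.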

===== Dot product accuracy requirements

This section assumes an operation acting on tensors named 'input', 'weight' and optionally 'bias'.
Each output tensor element can be expressed as a dot product of elements between the 'input' and 'weight' tensors with optional bias addition.
The dot product has length KS, the kernel size.
If the operation does not specify a bias, then 'bias' is taken to be zero in this section.
Note: KS is defined for each relevant operator in the appendix section <<Main Inference operator test data>>.

In other words, each output element `out` can be expressed as a dot product between input elements `in[k]`, weight elements `w[k]`, and bias `b`:

`out = in[0] * w[0] + in[1] * w[1] + ... + in[KS-1] * w[KS-1] + b`
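
The formula above can be sketched in C++ as an fp64 accumulation, in the spirit of `operation_fp64()`. The flat vectors and function names are assumptions for illustration; real operators gather `in[k]` and `w[k]` from tensor positions that depend on the operation. The second function applies the same formula to absolute values, which is how the bound used by the check code is formed when `local_bound` is true.

[source,c++]
----
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative fp64 dot product: out = in[0]*w[0] + ... + in[KS-1]*w[KS-1] + b.
double dot_product_fp64(const std::vector<float>& in, const std::vector<float>& w, float b) {
    assert(in.size() == w.size());  // both vectors have length KS
    double acc = 0.0;
    for (std::size_t k = 0; k < in.size(); ++k) {
        acc += static_cast<double>(in[k]) * static_cast<double>(w[k]);
    }
    return acc + static_cast<double>(b);
}

// The same dot product on element-wise absolute values (abs(in), abs(w), abs(b)).
double dot_product_bound_fp64(const std::vector<float>& in, const std::vector<float>& w, float b) {
    assert(in.size() == w.size());
    double acc = 0.0;
    for (std::size_t k = 0; k < in.size(); ++k) {
        acc += std::fabs(static_cast<double>(in[k])) * std::fabs(static_cast<double>(w[k]));
    }
    return acc + std::fabs(static_cast<double>(b));
}
----

For `in = {-1, 2}`, `w = {3, -4}` and `b = -1`, the dot product is -12 and its absolute-value bound is 12.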

The positions of `in[k]`, `w[k]`, `b` in the input, weight and bias tensors depend on the operation being performed.
This may be, for example, a convolution.

This section defines the accuracy required for these operations.
In this section:

* "fp64 arithmetic" refers to double-precision floating-point arithmetic defined by IEEE 754 (<<Other publications>>[1])
* `operation_fp64()` is an fp64 reference implementation of the operation
* `operation_imp()` is the implementation under test
* `local_bound` is defined as follows:
** For operations with a local_bound attribute, it is the value of the optional attribute, with a default value of false
** For operations that do not have a local_bound attribute, the value is true

The checks described in the following code must pass for the following data sets:

* Data sets defined for the operation in Appendix A <<Main Inference operator test data>>.
* Data sets that have at least MIN_DOT_PRODUCT different output values. For these data sets we take S=-1.

[source,c++]
----
output_ref = operation_fp64(input, weight, bias);
output_imp = operation_imp (input, weight, bias);
input_abs  = abs(input);   // Element-wise absolute
weight_abs = abs(weight);  // Element-wise absolute
bias_abs   = abs(bias);    // Element-wise absolute
if (!local_bound) {
    input_abs_max = max_value(input_abs); // maximum over all elements
    for_each(index in shape(input_abs)) {
        input_abs[index] = input_abs_max; // set all entries to global maximum
    }
}
output_bnd = operation_fp64(input_abs, weight_abs, bias_abs);

size_t T = tensor_size(output_shape); // number of dot product results
size_t ksb = (max_value(bias_abs) > 0) ? (KS + 1) : KS; // kernel size and bias
fp64_t out_err_sum = 0.0;
fp64_t out_err_sumsq = 0.0;
for_each(index in output_shape) {
    fp64_t out_bnd = tensor_read<fp64_t>(output_bnd, output_shape, index);
    fp64_t out_ref = tensor_read<fp64_t>(output_ref, output_shape, index);
    acc_t  out_imp = tensor_read<acc_t> (output_imp, output_shape, index);
    fp64_t out_err;
    if ((acc_t)out_bnd == infinity) {
        // dot product can overflow and there is no accuracy limit
        out_err = 0.0;
    } else if (out_bnd == 0.0) {
        REQUIRE(out_ref == 0.0 && out_imp == 0.0);
        out_err = 0.0;
    } else {  // 0.0 < out_bnd < infinity
        fp64_t out_err_bnd = max(out_bnd * exp2(-1-normal_frac<acc_t>()), normal_min<acc_t>());
        out_err = (static_cast<fp64_t>(out_imp) - out_ref) / out_err_bnd;
        REQUIRE(abs(out_err) <= ksb);
    }
    out_err_sum   += out_err;
    out_err_sumsq += out_err * out_err;
}
if (input and weights are data set S with 3 <= S <= 5) {
    // check output error bias magnitude for data sets S which are not positive biased
    REQUIRE(abs(out_err_sum) <= 2*sqrt(ksb*T));
}
// check output error variance magnitude
REQUIRE(out_err_sumsq <= 0.4*ksb*T);
----
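
The per-element normalization and the two aggregate checks can be tried in isolation. The following sketch assumes that `acc_t` is fp32_t, so that `normal_frac<acc_t>()` is 23 and `normal_min<acc_t>()` is the smallest normal fp32 value (2^-126). The function names are hypothetical.

[source,c++]
----
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Assumptions for an fp32 accumulator (acc_t = float):
// normal_frac<acc_t>() == 23 and normal_min<acc_t>() == FLT_MIN (2^-126).
constexpr int kNormalFracFp32 = 23;

// Normalized error for one output element in the finite-bound branch
// (0.0 < out_bnd < infinity) of the check above.
double normalized_error(float out_imp, double out_ref, double out_bnd) {
    const double out_err_bnd =
        std::max(out_bnd * std::exp2(-1 - kNormalFracFp32),
                 static_cast<double>(std::numeric_limits<float>::min()));
    return (static_cast<double>(out_imp) - out_ref) / out_err_bnd;
}

// Per-element requirement: abs(out_err) <= ksb.
bool element_within_bound(double out_err, std::size_t ksb) {
    return std::fabs(out_err) <= static_cast<double>(ksb);
}

// Aggregate requirement on the sum of squared normalized errors over T outputs.
bool variance_within_bound(double out_err_sumsq, std::size_t ksb, std::size_t T) {
    return out_err_sumsq <= 0.4 * static_cast<double>(ksb) * static_cast<double>(T);
}
----

Because `out_err_bnd` is half an fp32 ulp of `out_bnd`, an implementation result that is one ulp above a reference of 32.0 (with `out_bnd` also 32.0) has a normalized error of exactly 2.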

=== Tensor Definitions

==== Tensors

Tensors are multidimensional arrays of data.
Tensors have metadata associated with them that describe characteristics of the tensor, including:

* Data Type
* Shape

The number of dimensions in a shape is called the rank.
A tensor with rank equal to zero is permitted.
In that case, the tensor has a single entry and is also known as a scalar.
A tensor shape is an array of integers of size equal to the rank of the tensor.
Each element in the tensor shape describes the number of elements in the corresponding dimension.
The tensor shape in each dimension must be greater than or equal to 1.
For tensor access information, see <<Tensor Access Helpers>>.

The shape of a tensor of non-zero rank has the special type shape_t.
shape_t is a one-dimensional list whose size is equal to the rank of the original tensor.
The components of a shape_t are of type size_t.

In this version of the specification, shape_t values must be resolvable to constants at backend compile time.

==== Tensor size limit

The overall size of a tensor is limited by the data type size_t.
This type must be able to hold integers in the range 0 to (1 << (MAX_LOG2_SIZE + 1)) - 1, where MAX_LOG2_SIZE is defined in <<Levels>>.
For each tensor, the number of tensor elements multiplied by the element size in bytes (which is taken to be 1 for elements smaller than 8 bits) must be less than or equal to (1 << (MAX_LOG2_SIZE + 1)) - 1.

The size of tensors along each of their dimensions is also limited by the data type size_t.
Specifically, the maximum size of a tensor along each dimension is (1 << MAX_LOG2_SIZE) - 1, and therefore the maximum coordinate value is (1 << MAX_LOG2_SIZE) - 2.
Indices used to access tensors must be non-negative.
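
The size rules above can be checked mechanically. The following is a minimal sketch and is not part of the specification: tensor_size_ok is a hypothetical helper, MAX_LOG2_SIZE is passed in as the parameter max_log2_size because its value depends on the selected level, and the sketch assumes max_log2_size is at most 31 so that no intermediate product overflows 64 bits.

[source,c++]
----
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if a tensor with the given shape and
// element width (in bits) satisfies the size limits for max_log2_size.
// Assumes 1 <= max_log2_size <= 31 so that no intermediate product overflows.
bool tensor_size_ok(const std::vector<uint64_t>& shape, uint32_t element_bits, int max_log2_size) {
    const uint64_t max_dim = (uint64_t{1} << max_log2_size) - 1;          // per-dimension limit
    const uint64_t max_bytes = (uint64_t{1} << (max_log2_size + 1)) - 1;  // overall size limit
    // Elements smaller than 8 bits are counted as 1 byte each.
    const uint64_t element_bytes = (element_bits < 8) ? 1 : element_bits / 8;
    uint64_t count = 1;  // a rank 0 tensor (empty shape) has a single element
    for (uint64_t dim : shape) {
        if (dim < 1 || dim > max_dim) return false;
        count *= dim;
        if (count > max_bytes) return false;  // stop early so count stays small
    }
    return count * element_bytes <= max_bytes;
}
----

For example, with max_log2_size = 8, a [255, 2] tensor of int8_t (510 bytes) fits, while the same shape of int16_t (1020 bytes) does not.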

==== Data Layouts

The following data layouts are supported in TOSA.
TOSA operations are defined in terms of a linear packed tensor layout.
In a linear packed layout, the elements of a rank r tensor are stored so that dimension (r-1) varies fastest: consecutive elements differ in their dimension (r-1) index.
The next dimension to increment is dimension (r-2), and so on.
For a specification of this layout, see the tensor read and write functions in section <<Tensor Access Helpers>>.
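
As an illustration of the linear packed layout, the sketch below computes the element offset of an index. linear_offset is a hypothetical helper written for this example (it is not one of the tensor access helpers), and it assumes the index has the same rank as the shape and is already in bounds.

[source,c++]
----
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of `index` in a linear packed tensor of `shape`.
// Dimension (r-1) varies fastest, then dimension (r-2), and so on.
// Assumes index.size() == shape.size() and 0 <= index[d] < shape[d].
size_t linear_offset(const std::vector<size_t>& shape, const std::vector<size_t>& index) {
    size_t offset = 0;
    for (size_t d = 0; d < shape.size(); ++d) {
        offset = offset * shape[d] + index[d];
    }
    return offset;  // a rank 0 tensor (empty shape) always maps to offset 0
}
----

For a tensor of shape [2, 3, 4], index [1, 2, 3] maps to offset (1 * 3 + 2) * 4 + 3 = 23, and incrementing the last coordinate moves the offset by 1.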

An implementation of TOSA can choose a different tensor memory layout provided that the operation behavior is maintained.

.Data Layouts
[cols="1,4,4"]
|===
|Name|Description of dimensions|Usage

|NHWC|Batch, Height, Width, Channels|Feature maps
|NDHWC|Batch, Depth, Height, Width, Channels|Feature maps for 3D convolution
|OHWI|Output Channels, Filter Height, Filter Width, Input Channels|Weights
|HWIM|Filter Height, Filter Width, Input Channels, Channel Multiplier|Weights for depthwise convolutions
|DOHWI|Depth, Output Channels, Filter Height, Filter Width, Input Channels|Weights for 3D convolution
|===

==== Broadcasting

In operations where broadcasting is supported, an input shape dimension can be broadcast to an output shape dimension if the input shape dimension is 1.
TOSA broadcast requires the rank of both tensors to be the same.
A RESHAPE can be used to create a compatible tensor with appropriate dimensions of size 1.
To map indices in an output tensor to those of an input tensor, see <<Broadcast Helper>>.
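
The broadcast rule can be illustrated with a short sketch. can_broadcast and broadcast_index are hypothetical helpers written for this example; they restate the rule above and are not the specification's Broadcast Helper.

[source,c++]
----
#include <cstddef>
#include <vector>

// Hypothetical helper: true if in_shape can be broadcast to out_shape, that is,
// the ranks are equal and each input dimension matches the output or is 1.
bool can_broadcast(const std::vector<size_t>& in_shape, const std::vector<size_t>& out_shape) {
    if (in_shape.size() != out_shape.size()) return false;  // no implicit rank extension
    for (size_t d = 0; d < in_shape.size(); ++d) {
        if (in_shape[d] != out_shape[d] && in_shape[d] != 1) return false;
    }
    return true;
}

// Hypothetical helper: maps an output index to the input index it reads.
// Broadcast dimensions (input size 1) always use coordinate 0.
std::vector<size_t> broadcast_index(const std::vector<size_t>& in_shape, const std::vector<size_t>& out_index) {
    std::vector<size_t> in_index(out_index.size());
    for (size_t d = 0; d < out_index.size(); ++d) {
        in_index[d] = (in_shape[d] == 1) ? 0 : out_index[d];
    }
    return in_index;
}
----

Because ranks must match, an input of shape [3] cannot be broadcast to [2, 3] directly; it must first be reshaped to [1, 3].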

==== Supported Number Formats

The following number formats are defined in TOSA.
The number formats supported by a given operator are listed in its table of supported types.

.Number formats
[cols="1,1,1,5"]
|===
|Format|Minimum|Maximum|Description

|bool_t
| -
| -
|Boolean value that is either `true` or `false`. Size is implementation defined. The TOSA reference model implements this as int8_t with 0 for `false` and 1 for `true`. All non-zero values are accepted on input as `true`.

|i4_t
| -
| -
|Signless 4-bit integer value. Will be interpreted as int4_t by all operators.

|int4_t
| -7
| +7
|Signed 4-bit two's-complement value. Excludes -8 to keep the range symmetric about zero for weights.

|i8_t
| -
| -
|Signless 8-bit integer value. Will be interpreted as int8_t unless otherwise specified by an operator.

|int8_t
| -128
| +127
|Signed 8-bit two's-complement value.

|uint8_t
| 0
| 255
|Unsigned 8-bit integer value.

|i16_t
| -
| -
|Signless 16-bit integer value. Will be interpreted as int16_t unless otherwise specified by an operator.

|int16_t
| -32768
| +32767
|Signed 16-bit two's-complement value.

|uint16_t
| 0
| 65535
|Unsigned 16-bit integer value.

|i32_t
| -
| -
|Signless 32-bit integer value. Will be interpreted as int32_t by all operators.

|int32_t
| -(1<<31)
| (1<<31)-1
|Signed 32-bit two's-complement value.

|i48_t
| -
| -
|Signless 48-bit integer value. Will be interpreted as int48_t by all operators.

|int48_t
| -(1<<47)
| (1<<47)-1
|Signed 48-bit two's-complement value.

|fp16_t
| -infinity
| +infinity
| 16-bit half-precision floating-point defined by <<Other publications>>[1]. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|bf16_t
| -infinity
| +infinity
| 16-bit brain floating-point defined as bits [31:16] of the fp32_t format. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp32_t
| -infinity
| +infinity
| 32-bit single-precision floating-point defined by <<Other publications>>[1]. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp64_t
| -infinity
| +infinity
| 64-bit double-precision floating-point defined by <<Other publications>>[1]. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.
|===
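
Because bf16_t is defined as bits [31:16] of the fp32_t format, moving between the two encodings is a bit-level operation. The sketch below assumes float is an IEEE 754 binary32 type (true on mainstream platforms, but not guaranteed by C++). The helper names are hypothetical, and the conversion simply truncates the low 16 bits; it does not model the rounding that any particular TOSA operator applies when converting fp32_t to bf16_t.

[source,c++]
----
#include <cstdint>
#include <cstring>

// Hypothetical helper: keep bits [31:16] of the fp32 encoding (truncation, no rounding).
uint16_t fp32_to_bf16_bits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // reinterpret the float's bits safely
    return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bf16 bits back to fp32 by zero-filling bits [15:0].
float bf16_bits_to_fp32(uint16_t b) {
    uint32_t bits = static_cast<uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
----

For example, 1.00390625 (1 + 2^-8^) needs eight fraction bits, so truncating it to bf16_t gives 1.0.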

Note: In this specification, minimum<type> and maximum<type> denote the minimum and maximum values of the data as stored in memory (ignoring the zero point).
The minimum and maximum values for each type are given in the preceding table.

Note: Integer number formats smaller than 8 bits may be used provided that the numerical result is the same as using a sequence of 8-bit TOSA operations.
For example, the result of a convolution with low-precision data must equal the result of running the convolution at 8 bits and then clipping it to the permitted output range.
This ensures that a Base Inference profile TOSA implementation can calculate the same result.

=== Integer Behavior

TOSA integer inputs and outputs are specified by signless values with the given number of bits.
Unless otherwise specified, these values will be interpreted as signed two's-complement.
The pseudocode will use int*_t to indicate use as a signed value and uint*_t to indicate use as an unsigned value.
If overflow occurs during integer calculation, the result is unpredictable, as indicated by the REQUIRE checks in the pseudocode for the operators.
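
To illustrate the signless convention, the sketch below reads the same 8-bit pattern both ways. The helper names are hypothetical; the signed reading relies on int8_t, which is a two's-complement type by definition.

[source,c++]
----
#include <cstdint>
#include <cstring>

// Hypothetical helper: interpret a signless 8-bit pattern as int8_t (the default).
int32_t read_as_int8(uint8_t bits) {
    int8_t v;
    std::memcpy(&v, &bits, sizeof v);  // reuse the bit pattern unchanged
    return v;
}

// Hypothetical helper: interpret the same pattern as uint8_t.
int32_t read_as_uint8(uint8_t bits) {
    return bits;
}
----

Under the default signed interpretation the pattern 0xFF is -1; it is read as 255 only where an operator specifies unsigned use.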

Unsigned 8-bit and 16-bit values are only allowed in the RESCALE operation, to allow for compatibility with networks which expect unsigned 8-bit or 16-bit tensors for input and output.

==== Quantization

Machine Learning frameworks may represent tensors with a quantized implementation, using integer values to represent the original floating-point numbers.
TOSA integer operations do not perform any implicit scaling to represent quantized values.
Required zero point values are passed to the operator as necessary, and will be processed according to the pseudocode for each operator.

To convert a network containing quantized tensors to TOSA, generate explicit RESCALE operators for any change of quantization scaling.
This reduces quantized operations to purely integer operations.

As an example, an ADD between two quantized tensors requires that the integer values represent the same range.
The scale arguments for RESCALE can be calculated to ensure that the resulting tensors represent the same range.
Then the ADD is performed, and a RESCALE can be used to ensure that the result is scaled properly.
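
The sketch below walks through that sequence for a single pair of int8_t elements. It is illustrative only: rescale_sketch uses a double-precision scale factor and std::llround in place of the integer multiplier and shift that RESCALE actually uses, and quantized_add, its parameters, and the choice of common_scale are hypothetical.

[source,c++]
----
#include <algorithm>
#include <cmath>
#include <cstdint>

// Stand-in for RESCALE: subtract the input zero point, scale, round, add the output zero point.
// (RESCALE itself uses an integer multiplier and shift; a double is used here for readability.)
int32_t rescale_sketch(int32_t value, int32_t input_zp, double scale, int32_t output_zp) {
    return static_cast<int32_t>(std::llround((value - input_zp) * scale)) + output_zp;
}

// Hypothetical quantized ADD where real(x) = x_scale * (x - x_zp) for each tensor.
int8_t quantized_add(int8_t a, double a_scale, int32_t a_zp,
                     int8_t b, double b_scale, int32_t b_zp,
                     double out_scale, int32_t out_zp, double common_scale) {
    // 1. RESCALE both inputs to int32_t values that share common_scale.
    int32_t a32 = rescale_sketch(a, a_zp, a_scale / common_scale, 0);
    int32_t b32 = rescale_sketch(b, b_zp, b_scale / common_scale, 0);
    // 2. Integer ADD: both operands now represent the same range.
    int32_t sum = a32 + b32;
    // 3. RESCALE the sum to the output quantization and saturate to int8_t.
    int32_t out = rescale_sketch(sum, 0, common_scale / out_scale, out_zp);
    return static_cast<int8_t>(std::clamp<int32_t>(out, -128, 127));
}
----

With a_scale = 0.5, b_scale = 0.25, common_scale = 1/64 and out_scale = 0.125 (all zero points 0), inputs 10 and 8 (real values 5.0 and 2.0) become 320 and 128, their sum 448 represents 7.0, and the output is 56.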

RESCALE provides support for per-tensor and per-channel scaling values to ensure compatibility with a range of possible quantization implementations.

==== Precision scaling

TOSA uses the RESCALE operation to scale between values with differing precision.
The RESCALE operator is defined using an integer multiply, add, and shift.
This guarantees that all TOSA implementations will return the same result for a RESCALE, including those with no support for floating-point numbers.

This TOSA specification supports two precisions of multiplier: 16-bit and 32-bit.
The 32-bit multiplier version supports two rounding modes to enable simpler lowering of existing frameworks that use two-stage rounding.
All arithmetic is designed so that it does not overflow a 64-bit accumulator and that the final result fits in 32 bits.
In particular, a 48-bit value can only be scaled with the 16-bit multiplier.

The apply_scale functions provide a scaling of approximately (multiplier * 2^-shift^).
The shift and value ranges are limited to allow a variety of implementations.
The limit of 62 on shift allows the shift to be decomposed as two right shifts of 31.
The limit on value allows implementations that left shift the value before the multiply in the case of shifts of 32 or less.
For example, in the case where shift=30, an implementation of the form ((value\<<2) * multiplier + round)>>32 can be used.
A scaling range of 2^+12^ down to 2^-32^ is supported for both functions with a normalized multiplier.

For example, in typical usage a scaling of m*2^-n^ where m is a value in the
range 1.0 \<= m < 2.0 can be represented using multiplier=(1<<30)*m, shift=(30+n) for
apply_scale_32() and multiplier=(1<<14)*m, shift=(14+n) for apply_scale_16().
The values to achieve a scaling of 1.0 are shift=30, multiplier=1<<30 for apply_scale_32 and shift=14, multiplier=1<<14 for apply_scale_16.
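
The typical-usage encoding described above can be computed from a real scale factor. The following is a minimal sketch, assuming the scale is positive and that the caller checks the resulting shift against the 2 to 62 limit. ScaleParams, scale_to_apply_scale_32 and apply_scale_32_sketch are hypothetical names; the last is a simplified single-rounding form of apply_scale_32 without the REQUIRE checks.

[source,c++]
----
#include <cmath>
#include <cstdint>

// Hypothetical result type for this sketch (the specification's scale_t uses an int8_t shift).
struct ScaleParams {
    int32_t multiplier;
    int32_t shift;
};

// Hypothetical helper: encode a positive real scale s for apply_scale_32.
// Writes s = m * 2^-n with 1.0 <= m < 2.0, then multiplier = 2^30 * m, shift = 30 + n.
// The caller must still check that 2 <= shift <= 62.
ScaleParams scale_to_apply_scale_32(double s) {
    int e = 0;
    double f = std::frexp(s, &e);            // s = f * 2^e with 0.5 <= f < 1.0
    double m = 2.0 * f;                      // 1.0 <= m < 2.0
    int32_t n = 1 - e;                       // so that s = m * 2^-n
    int64_t multiplier = std::llround(m * (1 << 30));
    if (multiplier == (int64_t{1} << 31)) {  // m rounded up to 2.0: renormalize
        multiplier = int64_t{1} << 30;
        n -= 1;
    }
    return {static_cast<int32_t>(multiplier), 30 + n};
}

// Simplified apply_scale_32: single rounding, no REQUIRE checks.
int32_t apply_scale_32_sketch(int32_t value, int32_t multiplier, int32_t shift) {
    int64_t round = int64_t{1} << (shift - 1);
    return static_cast<int32_t>((static_cast<int64_t>(value) * multiplier + round) >> shift);
}
----

For instance, a scale of 0.75 = 1.5 * 2^-1^ gives multiplier = 1610612736 and shift = 31, and the value 101 is scaled to 76 (75.75 rounded to nearest).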

[source,c++]
----
int32_t apply_scale_32(int32_t value, int32_t multiplier, int8_t shift, bool_t double_round=false) {
    REQUIRE(multiplier >= 0);
    REQUIRE(2 <= shift && shift <= 62);
    REQUIRE(value >= -(static_cast<int64_t>(1) << (shift - 1)) && value < (static_cast<int64_t>(1) << (shift - 1)));
    int64_t round = static_cast<int64_t>(1) << (shift - 1);
    if (double_round) {
        // two-stage rounding: adjust the rounding term by 2^30 according to the sign of value
        if (shift > 31 && value >= 0) round += 1<<30;
        if (shift > 31 && value < 0) round -= 1<<30;
    }
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result = result >> shift;
    // result will fit a 32-bit range due to the REQUIRE on value
    return static_cast<int32_t>(result);
}

int32_t apply_scale_16(int48_t value, int16_t multiplier, int8_t shift) {
    REQUIRE(multiplier >= 0);
    REQUIRE(2 <= shift && shift <= 62);
    int64_t round = static_cast<int64_t>(1) << (shift - 1);
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result = result >> shift;
    REQUIRE(result >= minimum<int32_t> && result <= maximum<int32_t>);
    return static_cast<int32_t>(result);
}
----

In some functions, the multiplier and shift are combined into a scale_t structure:

[source,c++]
----
typedef struct {
    int32_t multiplier;
    int8_t shift;
} scale_t;
----

In places where a divide is required, the function below is used to calculate an appropriate scaling value.

[source,c++]
----
scale_t reciprocal_scale(uint32_t value) {
    REQUIRE(value > 0);
    scale_t scale;
    int32_t k = 32 - count_leading_zeros(value - 1); // (1 << k) / 2 < value <= (1 << k)
    int64_t numerator = ((static_cast<int64_t>(1) << 30) + 1) << k;
    scale.multiplier = numerator / value; // (1 << 30) <= multiplier < (1 << 31)
    scale.shift = 30 + k;
    return scale;
}
----
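
As an example of such a divide, the sketch below computes a rounded integer quotient of a sum by an element count using only a multiply, add and shift. reciprocal_scale is transcribed from the listing above with 64-bit-safe constants; count_leading_zeros (a portable loop that returns 32 for zero) and divide_by_count are hypothetical helpers written for this example, and the rounding step is a simplified apply_scale_32 without REQUIRE checks.

[source,c++]
----
#include <cstdint>

typedef struct {
    int32_t multiplier;
    int8_t shift;
} scale_t;

// Hypothetical helper: number of leading zero bits in a 32-bit value; returns 32 for 0.
int32_t count_leading_zeros(uint32_t x) {
    int32_t n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) ++n;
    return n;
}

// Transcription of reciprocal_scale (requires value > 0).
scale_t reciprocal_scale(uint32_t value) {
    int32_t k = 32 - count_leading_zeros(value - 1);  // (1 << k) / 2 < value <= (1 << k)
    int64_t numerator = ((int64_t{1} << 30) + 1) << k;
    scale_t scale;
    scale.multiplier = static_cast<int32_t>(numerator / value);
    scale.shift = static_cast<int8_t>(30 + k);
    return scale;
}

// Hypothetical helper: rounded sum / count via multiply, add and shift.
int32_t divide_by_count(int32_t sum, uint32_t count) {
    scale_t s = reciprocal_scale(count);
    int64_t round = int64_t{1} << (s.shift - 1);
    return static_cast<int32_t>((static_cast<int64_t>(sum) * s.multiplier + round) >> s.shift);
}
----

For a count of 3, the multiplier is 1431655766 with shift 32, so a sum of 100 yields 33.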

==== Integer Convolutions

For the convolution operators, the input is not required to be scaled.
The integer versions of the convolution operators will subtract the zero point from the integer values as defined for each operator.
The convolution produces an accumulator output of type int32_t or int48_t.
This accumulator output is then scaled to the final output range using the RESCALE operator.
The scale applied in the RESCALE operator should be set to multiplier and shift values such that: multiplier * 2^-shift^ = (input_scale * weight_scale) / output_scale.
Here, input_scale, weight_scale and output_scale are the conversion factors from integer to floating-point for the input, weight and output tensor values respectively.
If per-channel scaling is needed, then the per-channel option of the RESCALE operation should be used.
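
The sketch below shows the final per-channel step for a toy case with one accumulator value per output channel. The multiplier and shift pairs are worked out by hand from hypothetical scales (input_scale = 0.5, weight_scale = 0.25 and 0.5 for the two channels, output_scale = 0.25, giving combined scales 0.5 and 1.0). apply_scale_32_sketch is a simplified single-rounding form of apply_scale_32 without REQUIRE checks, and rescale_per_channel, the int8_t output type and the zero point handling are assumptions of this example.

[source,c++]
----
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified apply_scale_32: single rounding, no REQUIRE checks.
int32_t apply_scale_32_sketch(int32_t value, int32_t multiplier, int32_t shift) {
    int64_t round = int64_t{1} << (shift - 1);
    return static_cast<int32_t>((static_cast<int64_t>(value) * multiplier + round) >> shift);
}

// Hypothetical per-channel rescale of int32_t accumulators to int8_t outputs.
// Channel c uses multiplier[c] * 2^-shift[c] ~= input_scale * weight_scale[c] / output_scale.
std::vector<int8_t> rescale_per_channel(const std::vector<int32_t>& acc,
                                        const std::vector<int32_t>& multiplier,
                                        const std::vector<int32_t>& shift,
                                        int32_t output_zp) {
    std::vector<int8_t> out(acc.size());
    for (size_t c = 0; c < acc.size(); ++c) {
        int32_t v = apply_scale_32_sketch(acc[c], multiplier[c], shift[c]) + output_zp;
        out[c] = static_cast<int8_t>(std::clamp<int32_t>(v, -128, 127));  // saturate to int8_t
    }
    return out;
}

// Combined scales encoded as in the typical usage above:
// 0.5 = 1.0 * 2^-1 -> multiplier = 2^30, shift = 31
// 1.0 = 1.0 * 2^0  -> multiplier = 2^30, shift = 30
const std::vector<int32_t> kMultiplier = {1 << 30, 1 << 30};
const std::vector<int32_t> kShift = {31, 30};
----

With an output zero point of 3, accumulators of 100 in both channels produce 53 and 103, while -300 and 300 saturate to -128 and 127.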

==== Integer Elementwise Operators

When two quantized tensors are used in an operation, they must represent the same numeric range for the result to be valid.
In this case, TOSA expects that RESCALE operators will be used as necessary to generate 32-bit integer values in a common range.
There are many valid choices for scale factors and options for the common range.
TOSA does not impose a requirement on which scale factors and range should be used.
Compilers generating TOSA sequences should choose a range that allows the operation to be computed without overflow, while allowing the highest possible accuracy of the output.

==== General Unary Functions

General unary functions such as sigmoid(), tanh() and exp() for integer inputs are expressed using a lookup table and interpolation to enable efficient implementation.
This also allows for other operations with the addition of user-supplied tables (the TABLE operation).
All table lookups are based on the following reference lookup function that takes as input a table of 513 entries of 16 bits each.

[source,c++]
----
int32_t apply_lookup_s(int16_t *table, int32_t value)
{
    int16_t clipped_value = static_cast<int16_t>(apply_clip_s<int32_t>(value, -32768, +32767));
    int32_t index = (clipped_value + 32768) >> 7;  // selects one of 512 intervals
    int32_t fraction = clipped_value & 0x7f;       // position within the interval
    int16_t base = table[index];
    int16_t next = table[index+1];
    int32_t slope = next - base;
    REQUIRE(slope >= minimum<int16_t> && slope <= maximum<int16_t>);
    int32_t return_value = (base << 7) + slope * fraction;
    return return_value; // return interpolated value of 16 + 7 = 23 bits
}
----

Note that although the table lookup defined here has 16-bit precision, for 8-bit-only operations an 8-bit table can be derived by applying the reference function to each of the possible 256 input values.
The following code constructs a 513-entry table based on a reference function.

[source,c++]
----
void generate_lookup_table(int16_t *table, int32_t (*reference)(int32_t))
{
    for (int i = -256; i <= 256; i++) {
        int32_t value = (*reference)(i);
        table[i + 256] = static_cast<int16_t>(apply_clip_s<int32_t>(value, -32768, +32767));
    }
}
----
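
To see how the interpolation behaves, the sketch below transcribes the two listings above into standalone C++ (std::clamp replaces apply_clip_s, a multiply by 128 replaces the left shift of a possibly negative base, and the REQUIRE on slope is omitted) and fills the table from a hypothetical reference function scaled_identity(i) = 128 * i. Because that reference is linear, the lookup returns exactly 128 * value, except in the top interval where the last table entry (32768) is clipped to 32767.

[source,c++]
----
#include <algorithm>
#include <cstdint>

// Saturate a 32-bit value to the int16_t range (stands in for apply_clip_s).
int16_t clip_to_int16(int32_t v) {
    return static_cast<int16_t>(std::clamp<int32_t>(v, -32768, 32767));
}

// Fill a 513-entry table with reference(i) for i = -256..256.
void generate_lookup_table(int16_t* table, int32_t (*reference)(int32_t)) {
    for (int32_t i = -256; i <= 256; i++) {
        table[i + 256] = clip_to_int16(reference(i));
    }
}

// Interpolated lookup returning a 23-bit result (16-bit entry plus 7 fraction bits).
int32_t apply_lookup_s(const int16_t* table, int32_t value) {
    int16_t clipped_value = clip_to_int16(value);
    int32_t index = (clipped_value + 32768) >> 7;  // interval 0..511
    int32_t fraction = clipped_value & 0x7f;       // position within the interval
    int32_t base = table[index];
    int32_t slope = table[index + 1] - base;
    return base * 128 + slope * fraction;          // equals (base << 7) + slope * fraction
}

// Hypothetical reference function for this example only.
int32_t scaled_identity(int32_t i) {
    return 128 * i;
}
----

Inputs outside the int16_t range are clipped first, so an input of 40000 behaves exactly like 32767.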

=== Other publications

The following publications are referred to in this specification, or provide more information:

. IEEE Std 754-2008, _IEEE Standard for Floating-point Arithmetic_, August 2008.