Eric Kunze | 3309a53 | 2020-10-01 18:50:46 -0700 | [diff] [blame] | 1 | // |
| 2 | // This confidential and proprietary software may be used only as |
| 3 | // authorised by a licensing agreement from ARM Limited |
Eric Kunze | 18acfe3 | 2024-01-03 10:55:00 -0800 | [diff] [blame] | 4 | // (C) COPYRIGHT 2020-2024 ARM Limited |
Eric Kunze | 3309a53 | 2020-10-01 18:50:46 -0700 | [diff] [blame] | 5 | // ALL RIGHTS RESERVED |
| 6 | // The entire notice above must be reproduced on all authorised |
| 7 | // copies and copies may only be made to the extent permitted |
| 8 | // by a licensing agreement from ARM Limited. |
| 9 | |
| 10 | == Introduction |
| 11 | |
| 12 | === Overview |
| 13 | |
Eric Kunze | fa1b324 | 2020-11-09 13:53:23 -0800 | [diff] [blame] | 14 | Tensor Operator Set Architecture (TOSA) provides a set of whole-tensor |
| 15 | operations commonly employed by Deep Neural Networks. The intent is to enable a |
| 16 | variety of implementations running on a diverse range of processors, with the |
| 17 | results at the TOSA level consistent across those implementations. Applications |
| 18 | or frameworks which target TOSA can therefore be deployed on a wide range of |
| 19 | different processors, such as SIMD CPUs, GPUs and custom hardware such as |
| 20 | NPUs/TPUs, with defined accuracy and compatibility constraints. Most operators |
| 21 | from the common ML frameworks (TensorFlow, PyTorch, etc.) should be expressible |
| 22 | in TOSA. It is expected that there will be tools to lower from ML frameworks |
| 23 | into TOSA. |
| 24 | |
| 25 | === Goals |
| 26 | |
| 27 | The goals of TOSA include the following: |
| 28 | |
| 29 | * A minimal and stable set of tensor-level operators to which machine learning |
| 30 | framework operators can be reduced. |
| 31 | |
| 32 | * Full support for both quantized integer and floating-point content. |
| 33 | |
| 34 | * Precise functional description of the behavior of every operator, including |
| 35 | the treatment of their numerical behavior in the case of precision, saturation, |
| 36 | scaling, and range as required by quantized datatypes. |
| 37 | |
| 38 | * Agnostic to any single high-level framework, compiler backend stack or |
| 39 | particular target. |
| 40 | |
| 41 | * The detailed functional and numerical description enables precise code |
| 42 | construction for a diverse range of targets – SIMD CPUs, GPUs and custom |
| 43 | hardware such as NPUs/TPUs. |
| 44 | |
| 45 | === Specification |
| 46 | |
| 47 | The TOSA Specification is written as AsciiDoc mark-up and developed in its raw |
| 48 | mark-up form, managed through a git repository here: |
Eric Kunze | f9e5ba9 | 2022-05-26 16:38:40 -0700 | [diff] [blame] | 49 | https://git.mlplatform.org/tosa/specification.git/. |
| 50 | The specification is developed and versioned much like software. |
| 51 | While the mark-up is legible and can be read fairly easily in its raw form, it is recommended to build or “render” the mark-up into PDF or HTML. |
| 52 | To do this, please follow the instructions in the README.md in the root of the specification repository. |
| 53 | |
| 54 | === Operator Selection Principles |
| 55 | |
| 56 | TOSA defines a set of primitive operators to which higher level operators can be lowered in a consistent way. |
| 57 | To remain effective and efficient to implement, the set of operators must be constrained to a reasonably small set of primitive operations out of which others can be constructed. |
| 58 | The following principles govern the selection of operators within TOSA. |
| 59 | |
| 60 | .Principles |
| 61 | [cols="1,5,5"] |
| 62 | |=== |
| 63 | |ID|Principle|Reason for this |
| 64 | |
| 65 | |P0 |
| 66 | |An operator shall be a primitive operation or building block that cannot be decomposed into simpler whole tensor operations. |
| 67 | |If the operator can be broken down, then we should look at the component operators. |
| 68 | |
| 69 | |P1 |
| 70 | |An operator shall be usable as a component out of which more complex operations can be constructed. |
| 71 | |Single use operators have a high architectural cost and a more reusable version should be considered instead. |
| 72 | |
| 73 | |P2 |
| 74 | |Precision should be appropriate for the input and output data types. |
| 75 | |Precision higher than that needed to calculate the result leads to extra implementation cost. |
| 76 | |
| 77 | |P3 |
| 78 | |Numerical definition of common sub-operations should be consistent between operators (for example: value scaling). |
| 79 | |Consistent sub-operation definition reduces the operator implementation cost. |
| 80 | |
| 81 | |P4 |
Kevin Petit | 5333c25 | 2023-05-16 09:08:48 +0100 | [diff] [blame] | 82 | |The valid input and output ranges for all arguments shall be specified. |
Eric Kunze | f9e5ba9 | 2022-05-26 16:38:40 -0700 | [diff] [blame] | 83 | |Ranges are required to make consistent (numerically agreeing) implementations possible. |
| 84 | |
| 85 | |P5 |
| 86 | |Integer operators shall be implementable in a bit-exact form with good efficiency on CPU, GPU and hardware targets. |
| 87 | |Reduces implementation cost and gives consistent inference results. |
| 88 | |=== |
Eric Kunze | 3309a53 | 2020-10-01 18:50:46 -0700 | [diff] [blame] | 89 | |
Eric Kunze | 618f66a | 2024-04-16 17:54:34 -0700 | [diff] [blame^] | 90 | === Versioning |
| 91 | |
| 92 | TOSA follows a semantic versioning policy with a major.minor.patch.draft scheme. |
| 93 | See below for the TOSA definition of backward compatibility. |
| 94 | |
| 95 | * Major version changes may break backwards compatibility. |
| 96 | * Minor numbers may add functionality in a backwards compatible way. |
| 97 | * Patch versions are for bug fixes, clarifications, or trivial changes. |
| 98 | * The draft flag notes whether the version referenced is finalized. |
| 99 | |
| 100 | Major, minor, and patch numbers are limited to eight bits. |
| 101 | Draft is a single bit flag. |
| 102 | If stored in a 32-bit value, the remaining bits are reserved for future use. |
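
The field widths above can be illustrated with a small packing sketch. The text fixes only the widths (eight bits each for major, minor, and patch, one bit for draft), so the bit positions chosen below and the names `tosa_version_t`, `pack_version`, and `unpack_version` are assumptions made for illustration, not part of the specification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version record; field widths follow the text above.
struct tosa_version_t {
    std::uint8_t major_version;
    std::uint8_t minor_version;
    std::uint8_t patch_version;
    bool draft;
};

// Assumed layout (NOT defined by the specification text):
//   bits 31..24 major, 23..16 minor, 15..8 patch, bit 0 draft,
//   bits 7..1 reserved and written as zero.
constexpr std::uint32_t pack_version(tosa_version_t v) {
    return (std::uint32_t{v.major_version} << 24) |
           (std::uint32_t{v.minor_version} << 16) |
           (std::uint32_t{v.patch_version} << 8) |
           (v.draft ? 1u : 0u);
}

// Inverse of pack_version under the same assumed layout.
constexpr tosa_version_t unpack_version(std::uint32_t bits) {
    return tosa_version_t{
        static_cast<std::uint8_t>((bits >> 24) & 0xFFu),
        static_cast<std::uint8_t>((bits >> 16) & 0xFFu),
        static_cast<std::uint8_t>((bits >> 8) & 0xFFu),
        (bits & 1u) != 0u};
}
```

Keeping the reserved bits at zero means a reader of the packed value can reject encodings it does not understand.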
| 103 | |
| 104 | ==== Backwards Compatibility |
| 105 | |
| 106 | TOSA graphs created with previous minor versions within a major version must continue to work. |
| 107 | The following portions of the specification and implementation will not change within a major version: |
| 108 | |
| 109 | * Operator Names |
| 110 | * Arguments including ordering, input/attribute/output, name, rank |
| 111 | * ERROR_IF statements |
| 112 | * Functionality of the pseudocode for each operator |
| 113 | * Level definitions and checks |
| 114 | * Supported Data Type tables |
| 115 | * Conformance test definitions |
| 116 | * Enumerated types and values |
| 117 | |
| 118 | Changes to the following do not break compatibility: |
| 119 | |
| 120 | * Order of operations within the XML |
| 121 | * Operator section names |
| 122 | * Descriptive text that does not affect functionality |
| 123 | * Non-functional changes to pseudocode (for example: cleanup, local variable name changes) |
| 124 | |
| 125 | Minor versions are allowed to add new operators or other functionality as long as the above guarantees hold. |
| 126 | |
| 127 | In addition, new extensions may be added to the specification between TOSA releases. |
| 128 | They may not change anything that would break backward compatibility according to the above definitions. |
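
One possible reading of the compatibility guarantee, expressed as a check: a graph must continue to work when it was created with the same major version and a minor version no newer than the one the implementation supports. The names `version_t` and `graph_must_work` are hypothetical, and the sketch deliberately says nothing about graphs created with a newer minor version.

```cpp
#include <cassert>

// Hypothetical major/minor pair; patch and draft are ignored in this sketch.
struct version_t {
    int major_version;
    int minor_version;
};

// True when the backward compatibility rule above guarantees that a graph
// created at `graph` continues to work on an implementation of `impl`.
constexpr bool graph_must_work(version_t graph, version_t impl) {
    return graph.major_version == impl.major_version &&
           graph.minor_version <= impl.minor_version;
}
```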
| 129 | |
Eric Kunze | 3309a53 | 2020-10-01 18:50:46 -0700 | [diff] [blame] | 130 | === Profiles |
| 131 | |
Eric Kunze | 6dd3410 | 2024-02-25 22:24:52 -0800 | [diff] [blame] | 132 | TOSA profiles enable efficient implementation on different classes of device. |
| 133 | Each profile is an independent set of operations and data type combinations. |
| 134 | |
| 135 | TOSA profile extensions define optional operation and data type combinations. |
| 136 | |
| 137 | Each operator's Supported Data Types table will define which profile or extension an operator and data type is in. |
| 138 | An operator / data type combination may be part of multiple profiles or extensions. |
| 139 | If so, each profile and extension will be listed in the Supported Data Types table. |
| 140 | In addition, a table listing all operations for each profile can be found in Appendix B. |
| 141 | |
| 142 | The following are required for compliant TOSA implementations: |
| 143 | |
| 144 | * A TOSA implementation must implement at least one profile. |
| 145 | * A TOSA implementation may choose to implement any extensions. |
| 146 | * If a TOSA implementation chooses to implement an extension, it must implement the complete extension. |
| 147 | * If an operator / data type combination requires multiple extensions, the combination is only required to be implemented if all extensions are implemented. |
| 148 | ** For example, a CAST from bf16 to fp8 is only required if both extensions are implemented. |
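
The extension rule in the list above can be sketched as a set-membership test. The function name `combination_required` and the extension identifiers used in the usage note are placeholders, and the sketch covers only the extension rule, not profile membership.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns true when every extension that an operator / data type
// combination requires is among the extensions the implementation provides.
// Only then is the combination required to be implemented.
bool combination_required(const std::vector<std::string>& required_extensions,
                          const std::set<std::string>& implemented_extensions) {
    for (const std::string& ext : required_extensions) {
        if (implemented_extensions.count(ext) == 0) {
            return false;  // at least one required extension is missing
        }
    }
    return true;
}
```

For the CAST example, a call with required extensions `{"bf16", "fp8"}` (placeholder names) returns false for an implementation that provides only `"bf16"`, and true once both are provided.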
Eric Kunze | 3309a53 | 2020-10-01 18:50:46 -0700 | [diff] [blame] | 149 | |
| 150 | .Profiles |
Eric Kunze | 6dd3410 | 2024-02-25 22:24:52 -0800 | [diff] [blame] | 151 | include::{generated}/profiles.adoc[] |
Eric Kunze | 3309a53 | 2020-10-01 18:50:46 -0700 | [diff] [blame] | 152 | |
Eric Kunze | 6dd3410 | 2024-02-25 22:24:52 -0800 | [diff] [blame] | 153 | .Profile Extensions |
| 154 | include::{generated}/profile_extensions.adoc[] |
Eric Kunze | 3309a53 | 2020-10-01 18:50:46 -0700 | [diff] [blame] | 155 | |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 156 | === Levels |
| 157 | |
Kevin Petit | 5333c25 | 2023-05-16 09:08:48 +0100 | [diff] [blame] | 158 | A TOSA level defines operator argument ranges that an implementation shall support. |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 159 | This is distinct from a profile that defines the operations and data-types supported. |
Eric Kunze | 6dd3410 | 2024-02-25 22:24:52 -0800 | [diff] [blame] | 160 | One level must apply to all profiles and extensions supported by an implementation. |
| 161 | |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 162 | This version of the specification defines two TOSA levels: |
| 163 | |
Kevin Petit | 5333c25 | 2023-05-16 09:08:48 +0100 | [diff] [blame] | 164 | * No level : allows the full range of arguments specified by the operations according to the operation data types. |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 165 | * Level 8K : ranges are expected to be sufficient for applications with frame sizes up to 8K. |
| 166 | |
| 167 | Later versions of the specification may define additional levels. |
Eric Kunze | 0d7d001 | 2024-03-25 14:07:29 -0700 | [diff] [blame] | 168 | The following table defines the value ranges for each level. |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 169 | These ranges are checked using the LEVEL_CHECK() function with the operator descriptions. |
| 170 | |
| 171 | .Level maximums |
Kevin Petit | 211c5f5 | 2023-04-26 16:25:52 +0100 | [diff] [blame] | 172 | include::{generated}/levels.adoc[] |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 173 | |
Eric Kunze | 42229d0 | 2022-04-07 16:54:46 -0700 | [diff] [blame] | 174 | === Status |
| 175 | |
| 176 | The TOSA specification is a work in progress. |
| 177 | |
| 178 | * The Base Inference profile should be considered to be near release quality, with conformance tests available. |
| 179 | * The Main Inference profile has most of the expected operators in place, but is still subject to change. |
| 180 | * The reference model and conformance tests do not yet support all of the floating point types that have been defined. |
| 181 | * There is not currently a conformance test suite available for Main Inference. |
Eric Kunze | 42229d0 | 2022-04-07 16:54:46 -0700 | [diff] [blame] | 182 | |
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 183 | === Compliance |
| 184 | |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 185 | This section defines when a TOSA implementation is compliant with a given TOSA specification profile and level. |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 186 | To be compliant an implementation must achieve the results and accuracy defined by this specification. |
| 187 | TOSA also defines a set of conformance tests. |
| 188 | A compliant implementation must pass the conformance tests. |
| 189 | The conformance tests are not exhaustive, so an implementation that passes the conformance tests may not be compliant if there is a non-compliance that is undetected by the tests. |
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 190 | |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 191 | ==== Base Inference Profile Compliance |
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 192 | |
Eric Kunze | a3eded0 | 2021-12-13 15:40:04 -0800 | [diff] [blame] | 193 | The <<Operator Graphs>> section of this specification defines a TOSA graph and the behavior defined for a TOSA graph. |
| 194 | This behavior is captured in the pseudo-code function tosa_execute_graph(). |
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 195 | For a given input graph (with attributes) and input tensors there are three possible tosa_graph_result values after executing the graph: |
| 196 | |
| 197 | * tosa_unpredictable: The result of the graph on the given inputs cannot be relied upon. |
| 198 | * tosa_error: The graph does not meet the specification and is recognised as an illegal graph. |
| 199 | * tosa_valid: The result is defined and predictable and the list of output tensors defines the result. |
| 200 | |
| 201 | An implementation is compliant with the TOSA Base Inference Profile if it matches the above results as follows: |
| 202 | |
| 203 | * For tosa_unpredictable, the implementation can return whatever result it chooses (including error) |
| 204 | * For tosa_error, the implementation must return an error result (and there is no requirement on how much of the graph is executed, if any) |
| 205 | * For tosa_valid, the implementation must execute the entire graph without error and return the result defined by this specification. |
| 206 | |
| 207 | In terms of pseudo-code, if *graph* is a TOSA graph consisting of Base Inference Profile operators and *input_list* is a list of input tensors, then the following test must pass. |
| 208 | |
| 209 | [source,c++] |
| 210 | ---- |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 211 | bool tosa_test_compliance(tosa_graph_t graph, tosa_list_t input_list, tosa_level_t level) { |
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 212 | shape_list_t output_list_spec = tosa_allocate_list(tosa_output_shape(graph)); |
| 213 | shape_list_t output_list_test = tosa_allocate_list(tosa_output_shape(graph)); |
Dominic Symes | 7b0f1c9 | 2023-07-20 14:26:38 +0100 | [diff] [blame] | 214 | tosa_graph_result = tosa_valid; // result starts as valid |
| 215 | tosa_nesting_depth = 0; // if/while nesting level |
Dominic Symes | e4d6a1b | 2022-11-04 18:00:03 +0000 | [diff] [blame] | 216 | tosa_execute_graph(graph, input_list, output_list_spec, level); |
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 217 | if (tosa_graph_result == tosa_unpredictable) { |
| 218 | return true; // No requirement to match an unpredictable result |
| 219 | } |
| 220 | result_test = execute_implementation_under_test(graph, input_list, output_list_test); |
| 221 | if (tosa_graph_result == tosa_error) { |
| 222 | return result_test == tosa_error; // result must be an error |
| 223 | } |
| 224 | if (exact_tensor_match(output_list_spec, output_list_test)) { |
| 225 | // Predictable bit-exact value match required |
| 226 | return true; |
| 227 | } |
| 228 | return false; |
| 229 | } |
| 230 | ---- |
| 231 | |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 232 | ==== Main Inference Profile Compliance |
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 233 | |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 234 | A Main Inference compliant implementation must satisfy the following: |
| 235 | |
| 236 | * The implementation must meet <<Base Inference Profile Compliance>> for all Base Inference compliant graphs |
| 237 | * The implementation must support all Main Inference operations using the datatype fp32_t |
| 238 | ** The operations must meet the precision requirements of <<Main Inference precision requirements>> |
| 239 | * The implementation must support all Main Inference operations using the datatype fp16_t |
| 240 | ** The operations must meet the precision requirements of <<Main Inference precision requirements>> |
| 241 | ** Note: These requirements allow fp16_t operations to be implemented using the fp32_t datatype |
| 242 | * The implementation must support all Main Inference operations using the datatype bf16_t |
| 243 | ** The operations must meet the precision requirements of <<Main Inference precision requirements>> |
| 244 | ** Note: These requirements allow bf16_t operations to be implemented using the fp32_t datatype |
| 245 | |
| 246 | As with <<Base Inference Profile Compliance>> the pseudo-code function tosa_execute_graph() can return one of three possible results. |
| 247 | A compliant implementation must satisfy the following: |
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 248 | |
| 249 | * For a graph returning tosa_error the implementation must also return an error |
| 250 | * For a graph returning tosa_valid the implementation must execute the entire graph without error |
| 251 | * For a graph returning tosa_valid and consisting only of integer operators the results must match exactly |
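
The three requirements in the list above can be summarised as a decision function. This is a sketch only: the function name and its boolean parameters are hypothetical, the enum is redeclared locally to keep the example self-contained, and treating `tosa_unpredictable` as "no requirement" is an assumption carried over from the Base Inference section.

```cpp
#include <cassert>

enum tosa_graph_result_t { tosa_valid, tosa_unpredictable, tosa_error };

// spec_result:  result of the reference tosa_execute_graph()
// impl_error:   the implementation under test reported an error
// integer_only: the graph consists only of integer operators
// exact_match:  implementation outputs equal the reference outputs exactly
bool main_inference_outcome_ok(tosa_graph_result_t spec_result, bool impl_error,
                               bool integer_only, bool exact_match) {
    switch (spec_result) {
    case tosa_error:
        return impl_error;                     // must also return an error
    case tosa_valid:
        if (impl_error) return false;          // must execute without error
        if (integer_only) return exact_match;  // integer graphs match exactly
        return true;  // floating-point accuracy is checked separately
    case tosa_unpredictable:
        return true;  // assumption: no requirement, as for Base Inference
    }
    return false;
}
```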
Dominic Symes | ca2a854 | 2021-03-19 13:56:27 +0000 | [diff] [blame] | 252 | |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 253 | ===== Main Inference precision requirements |
| 254 | |
Dominic Symes | c237b7e | 2023-09-20 15:08:53 +0100 | [diff] [blame] | 255 | In a compliant implementation, individual floating-point operations within the graph must meet the accuracy bounds listed in the following table. |
| 256 | In the table _ulp_ means unit of the last place. |
| 257 | The function tosa_reference_check_fp() defines the error range permitted by a given number of units of last place in this specification. |
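
To make the notion of an ulp concrete, the following sketch computes the size of one ulp for an fp32 value and uses it for a simple bound check. It is a simplified stand-in and not the specification's tosa_reference_check_fp(): the helper names are hypothetical, only finite values below the largest fp32 value are handled, and the ulp is measured at the reference value rounded to fp32, which is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Size of one ulp at the magnitude of x: the gap between |x| and the next
// representable fp32 value above it. Valid for finite x below the fp32 maximum.
float ulp_fp32(float x) {
    float ax = std::fabs(x);
    return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// Hypothetical check: is the fp32 result within `ulps` units of last place
// of a higher-precision reference value? The comparison is done in double.
bool within_ulps_fp32(float out_imp, double out_ref, double ulps) {
    double bound =
        ulps * static_cast<double>(ulp_fp32(static_cast<float>(out_ref)));
    return std::fabs(static_cast<double>(out_imp) - out_ref) <= bound;
}
```

At 1.0 the ulp equals the fp32 machine epsilon, so a result that is one representable step away from a reference of 1.0 fails a 0.5 ulp bound but passes a 1 ulp bound.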
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 258 | |
| 259 | NOTE: The error criteria in this section are at an early draft stage and are likely to change during conformance test development. |
| 260 | |
| 261 | The following criteria apply to all operations: |
| 262 | |
| 263 | * If any input is a NaN and the result is floating-point then the result must be a NaN |
| 264 | * If any input is a NaN and the operation is a comparison (greater, greater-equal, equal) then the result must be false |
| 265 | * If any input is a NaN and the operation is conversion to an integer or boolean then the result is unpredictable |
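
The first two NaN criteria above match the behavior of ordinary IEEE 754 arithmetic and comparisons, as this sketch shows. It assumes a C++ implementation with IEEE 754 semantics for `float` and no value-changing optimisations such as fast-math; the function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Floating-point result: a NaN input propagates to the output.
float add_fp32(float a, float b) { return a + b; }

// Comparisons: any NaN operand makes the result false.
bool greater_fp32(float a, float b) { return a > b; }
bool greater_equal_fp32(float a, float b) { return a >= b; }
bool equal_fp32(float a, float b) { return a == b; }
```

Note that `equal_fp32` is false even when both operands are the same NaN value.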
| 266 | |
| 267 | [cols="1,3"] |
| 268 | |=== |
| 269 | | Operation | Accuracy bound |
| 270 | |
Eric Kunze | 0ae7fd6 | 2023-09-26 17:29:43 -0700 | [diff] [blame] | 271 | | <<ARGMAX>>, <<MAX_POOL2D>>, <<CLAMP>>, <<MAXIMUM>>, <<MINIMUM>>, <<ABS>>, <<NEGATE>>, <<SELECT>>, <<REDUCE_MAX>>, <<REDUCE_MIN>>, <<CONST>>, <<IDENTITY>> |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 272 | | Non NaN results must be exact. |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 273 | |
| 274 | | <<EQUAL>>, <<GREATER>>, <<GREATER_EQUAL>> |
| 275 | | The result must be exact with: + |
| 276 | (1) The sign of the zero is ignored + |
| 277 | (2) Infinities of the same sign compare as equal |
| 278 | |
| 279 | | <<CONV2D>>, <<CONV3D>>, <<DEPTHWISE_CONV2D>>, <<FULLY_CONNECTED>>, <<MATMUL>>, <<TRANSPOSE_CONV2D>> |
| 280 | | Each output can be expressed as a dot product of two input vectors. + |
| 281 | The dot product must meet the <<Dot product accuracy requirements>> |
| 282 | |
| 283 | | <<FFT2D>>, <<RFFT2D>> |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 284 | | Each output can be expressed as a dot product of an input vector with a constant coefficient vector. + |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 285 | The dot product must meet the <<Dot product accuracy requirements>> |
| 286 | |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 287 | | <<ADD>>, <<MUL>>, <<SUB>>, <<CEIL>>, <<FLOOR>> |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 288 | | Floating-point result overflows must be set to infinity of the correct sign. + |
| 289 | Floating-point result underflows must be set to zero of the correct sign. + |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 290 | Addition of infinities of different signs must produce a NaN. + |
| 291 | Subtraction of infinities of the same sign must produce a NaN. + |
| 292 | Multiplication of an infinity by a zero must produce a NaN. + |
Dominic Symes | c237b7e | 2023-09-20 15:08:53 +0100 | [diff] [blame] | 293 | Otherwise the result must be within 0.5 ulp of the mathematical result. |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 294 | |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 295 | | <<CAST>> |
Eric Kunze | 74e2ceb | 2023-10-20 15:58:55 -0700 | [diff] [blame] | 296 | | Result overflows when converting between fp32_t, bf16_t and fp16_t must be set to infinity of the correct sign. + |
Eric Kunze | aa162aa | 2024-04-12 16:19:55 -0700 | [diff] [blame] | 297 | fp8e4m3_t and fp8e5m2_t must use the non-saturating mode defined in <<OCP-OFP8,OCP-OFP8>> when converting from the wider floating-point types. + |
| 298 | If saturation of the fp8 types is desired, a <<CLAMP>> operation with the appropriate parameters should be used before the cast. + |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 299 | Floating-point result underflows must be set to zero of the correct sign. + |
| 300 | Cast from floating-point to integer result overflows must be saturated. + |
Dominic Symes | c237b7e | 2023-09-20 15:08:53 +0100 | [diff] [blame] | 301 | Cast from floating-point to integer must be rounded using round to nearest, ties to even, rounding mode. + |
| 302 | Otherwise cast to floating-point must be within 0.5 ulp of the mathematical result. |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 303 | |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 304 | | <<RECIPROCAL>> |
| 305 | | If the input is a zero or the result overflows the output must be an infinity of the same sign. + |
| 306 | If the input is an infinity or the result underflows the output must be a zero of the same sign. + |
| 307 | Otherwise the result must be within 1 ulp of the mathematical result. |
| 308 | |
| 309 | | <<RSQRT>> |
| 310 | | If the input is less than zero the result must be a NaN. + |
| 311 | Otherwise if the input is a zero the output must be an infinity of the same sign. + |
Dominic Symes | a46cf1d | 2023-11-07 11:46:16 +0000 | [diff] [blame] | 312 | Otherwise the result must be within 2 ulp of the mathematical result. |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 313 | |
Dominic Symes | 2bc6c57 | 2023-11-30 10:56:33 +0000 | [diff] [blame] | 314 | | <<LOG>>, <<ERF>> |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 315 | | If the input to LOG is less than zero then the result must be a NaN. + |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 316 | If the result overflows the output must be an infinity of the correct sign. + |
| 317 | If the result underflows the output must be a zero of the correct sign. + |
| 318 | Otherwise the result must be within 5 ulp of the mathematical result. |
| 319 | |
Dominic Symes | f791b44 | 2023-10-30 14:26:11 +0000 | [diff] [blame] | 320 | | <<EXP>> |
| 321 | | Let `x` be an input element and `out_imp` the implementation output of `exp(x)`. + |
| 322 | Let `out_ref` be the result of the fp64_t reference implementation of `exp(x)`. + |
Eric Kunze | 0e121c0 | 2024-04-10 15:26:55 -0700 | [diff] [blame] | 323 | Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, (1+abs(x)), 0, 1)` + |
Dominic Symes | a46cf1d | 2023-11-07 11:46:16 +0000 | [diff] [blame] | 324 | Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true |
Dominic Symes | f791b44 | 2023-10-30 14:26:11 +0000 | [diff] [blame] | 325 | |
| 326 | | <<POW>> |
Eric Kunze | 18acfe3 | 2024-01-03 10:55:00 -0800 | [diff] [blame] | 327 | | Let `x`, `y` be input elements from `input1` and `input2` respectively. + |
| 328 | Let `out_imp` be the implementation output of `pow(x,y)`. + |
| 329 | If `x` is less than zero and `y` is non-integral then the result must be a NaN. + |
Dominic Symes | f791b44 | 2023-10-30 14:26:11 +0000 | [diff] [blame] | 330 | Let `out_ref` be the result of the fp64_t reference implementation of `pow(x,y)`. + |
Eric Kunze | 0e121c0 | 2024-04-10 15:26:55 -0700 | [diff] [blame] | 331 | Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 2 * (1+abs(log(abs(x))*y)), 0, 1)` + |
Dominic Symes | a46cf1d | 2023-11-07 11:46:16 +0000 | [diff] [blame] | 332 | Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true |
Dominic Symes | f791b44 | 2023-10-30 14:26:11 +0000 | [diff] [blame] | 333 | |
Dominic Symes | 8754ec2 | 2023-12-08 17:45:31 +0000 | [diff] [blame] | 334 | | <<SIGMOID>> |
Dominic Symes | 2bc6c57 | 2023-11-30 10:56:33 +0000 | [diff] [blame] | 335 | | Let `x` be an input element and `out_imp` the implementation output. + |
| 336 | Let `out_ref` be the result of the fp64_t reference implementation. + |
Eric Kunze | 0e121c0 | 2024-04-10 15:26:55 -0700 | [diff] [blame] | 337 | Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 2 * (1+abs(x)), 0, 1)` + |
Dominic Symes | 2bc6c57 | 2023-11-30 10:56:33 +0000 | [diff] [blame] | 338 | Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true |
| 339 | |
Dominic Symes | 8754ec2 | 2023-12-08 17:45:31 +0000 | [diff] [blame] | 340 | | <<TANH>> |
| 341 | | Let `x` be an input element and `out_imp` the implementation output. + |
| 342 | Let `out_ref` be the result of the fp64_t reference implementation. + |
Eric Kunze | 0e121c0 | 2024-04-10 15:26:55 -0700 | [diff] [blame] | 343 | Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 4 * (1+abs(x)), 0.5, 1)` + |
Dominic Symes | 8754ec2 | 2023-12-08 17:45:31 +0000 | [diff] [blame] | 344 | Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true |
| 345 | |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 346 | | <<REDUCE_SUM>> |
| 347 | | Each output can be expressed as a dot product of an input vector with a vector of ones. + |
| 348 | This dot product must meet the <<Dot product accuracy requirements>> |
| 349 | |
| 350 | | <<AVG_POOL2D>> |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 351 | | Each output can be expressed as a dot product of an input vector with a vector with elements 1/KS where KS is the kernel size. + |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 352 | This dot product must meet the <<Dot product accuracy requirements>> |
| 353 | |
| 354 | | <<REDUCE_PRODUCT>> |
| 355 | | Result overflows must be set to an infinity of the correct sign. + |
| 356 | Result underflows must be set to a zero of the correct sign. + |
Dominic Symes | 83e79b5 | 2024-01-08 10:45:47 +0000 | [diff] [blame] | 357 | Let `n` be the number of elements in the product, `out_imp` the implementation result, and `out_ref` the result of the fp64_t reference implementation. + |
| 358 | Let `err_bnd = abs(out_ref) * (pow(1 + pow(2, -normal_frac<in_out_t> - 1), n) - 1)` + |
| 359 | Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 360 | |
Eric Kunze | 1f05883 | 2024-02-13 16:51:17 -0800 | [diff] [blame] | 361 | | <<COS>> |
| 362 | | Let `x` be an input element and `out_imp` the implementation output of `cos(x)`. + |
| 363 | Let `out_ref` be the result of the fp64_t reference implementation of `cos(x)`. + |
Eric Kunze | 0e121c0 | 2024-04-10 15:26:55 -0700 | [diff] [blame] | 364 | Let `err_bnd = calcAbsErrorBound<in_out_t>(x, 1+abs(x), 0, 2)` + |
Eric Kunze | 1f05883 | 2024-02-13 16:51:17 -0800 | [diff] [blame] | 365 | Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true |
| 366 | |
| 367 | | <<SIN>> |
| 368 | | Let `x` be an input element and `out_imp` the implementation output of `sin(x)`. + |
| 369 | Let `out_ref` be the result of the fp64_t reference implementation of `sin(x)`. + |
Eric Kunze | 0e121c0 | 2024-04-10 15:26:55 -0700 | [diff] [blame] | 370 | Let `err_bnd = calcAbsErrorBound<in_out_t>(x, abs(x), 0, 2)` + |
Eric Kunze | 1f05883 | 2024-02-13 16:51:17 -0800 | [diff] [blame] | 371 | Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true |
| 372 | |
Eric Kunze | 2266c7a | 2023-10-27 14:38:56 -0700 | [diff] [blame] | 373 | | <<RESIZE>> |
| 374 | | The result corresponds to a sequence of floating-point calculations. + |
| 375 | The allowable error bound for the result of a resize is based on the maximum value of an element in the input tensor. + |
| 376 | Let `out_imp` be the implementation output. + |
| 377 | Let `out_ref` be the result of the fp64_t reference implementation. + |
| 378 | Let `err_bnd = max(abs(input)) * 0.006`. + |
| 379 | Then `tosa_reference_check_fp_bnd<out_t>(out_imp, out_ref, err_bnd)` must be true. |
| 380 | |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 381 | |=== |
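
Two of the closed-form error bounds in the table above, for REDUCE_PRODUCT and RESIZE, can be evaluated directly. The sketch below does so for fp32 outputs. It assumes that `normal_frac<fp32_t>` is the 23 explicit fraction bits of fp32, and `within_abs_bound` is a simplified, hypothetical stand-in for tosa_reference_check_fp_bnd() that handles finite values only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Assumption: normal_frac<fp32_t> is the 23 explicit fraction bits of fp32.
constexpr int normal_frac_fp32 = 23;

// REDUCE_PRODUCT:
//   err_bnd = abs(out_ref) * (pow(1 + pow(2, -normal_frac - 1), n) - 1)
double reduce_product_err_bnd(double out_ref, int n) {
    double step = std::pow(2.0, -normal_frac_fp32 - 1);
    return std::fabs(out_ref) * (std::pow(1.0 + step, n) - 1.0);
}

// RESIZE: err_bnd = max(abs(input)) * 0.006
double resize_err_bnd(const std::vector<double>& input) {
    double max_abs = 0.0;
    for (double v : input) {
        max_abs = std::max(max_abs, std::fabs(v));
    }
    return max_abs * 0.006;
}

// Simplified stand-in for the bound check: finite values only.
bool within_abs_bound(double out_imp, double out_ref, double err_bnd) {
    return std::fabs(out_imp - out_ref) <= err_bnd;
}
```

The REDUCE_PRODUCT bound grows with the number of elements `n`, reflecting one rounding step per multiplication, while the RESIZE bound scales only with the largest input magnitude.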
| 382 | |
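
The floating-point to integer rules in the CAST row above (round to nearest with ties to even, then saturate on overflow) can be sketched for an fp32 to int32 conversion as follows. The function name is hypothetical. A NaN input is unpredictable according to the criteria earlier in this section, so returning 0 for it is an arbitrary choice of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// fp32 -> int32 cast: round to nearest, ties to even, saturate on overflow.
std::int32_t cast_fp32_to_int32(float x) {
    if (std::isnan(x)) {
        return 0;  // unpredictable per the NaN criteria; arbitrary value here
    }
    // std::nearbyint rounds according to the current rounding mode; the
    // default mode (FE_TONEAREST) rounds ties to even.
    double r = std::nearbyint(static_cast<double>(x));
    const double lo =
        static_cast<double>(std::numeric_limits<std::int32_t>::min());
    const double hi =
        static_cast<double>(std::numeric_limits<std::int32_t>::max());
    if (r <= lo) return std::numeric_limits<std::int32_t>::min();
    if (r >= hi) return std::numeric_limits<std::int32_t>::max();
    return static_cast<std::int32_t>(r);
}
```

The range test is done before the conversion because converting an out-of-range floating-point value to an integer type is undefined behavior in C++.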
Dominic Symes | f791b44 | 2023-10-30 14:26:11 +0000 | [diff] [blame] | 383 | ===== Operator sequence precision requirement |
| 384 | |
| 385 | Precision criteria are specified for a single operator. |
| 386 | |
| 387 | An implementation M of a sequence of n TOSA operators, A[0] to A[n-1], is said to |
| 388 | be compliant if M gives the same result as a sequence of implementations |
| 389 | M[0] to M[n-1] such that: |
| 390 | |
| 391 | * Each M[k] implements A[k] with the same or higher precision datatypes |
| 392 | * Each M[k] meets the accuracy defined in this specification for A[k] where the M[k] output is converted to A[k] output precision using round to nearest |
| 393 | |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 394 | ===== Dot product accuracy requirements |
| 395 | |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 396 | This section assumes an operation acting on tensors named 'input', 'weight' and optionally 'bias'. |
| 397 | Each output tensor element can be expressed as a dot product of elements between the 'input' and 'weight' tensors with optional bias addition. |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 398 | The dot product has length KS, the kernel size. |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 399 | If the operation does not specify a bias then 'bias' is taken to be zero in this section. |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 400 | Note: KS is defined for each relevant operator in the appendix section <<Main Inference operator test data>>. |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 401 | |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 402 | In other words, each output element `out` can be expressed as a dot product between input elements `in[k]`, weight elements `w[k]`, bias `b`: |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 403 | |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 404 | `out = in[0] * w[0] + in[1] * w[1] + ... + in[KS-1] * w[KS-1] + b` |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 405 | |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 406 | The positions of `in[k]`, `w[k]`, `b` in the input, weight and bias tensors depends on the operation being performed. |
| 407 | This may be, for example, a convolution. |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 408 | |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 409 | This section defines the accuracy required for these operations. |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 410 | In this section: |
Dominic Symes | c386a05 | 2023-01-20 16:09:31 +0000 | [diff] [blame] | 411 | |
Eric Kunze | 74e2ceb | 2023-10-20 15:58:55 -0700 | [diff] [blame] | 412 | * "fp64 arithmetic" refers to double-precision floating-point arithmetic defined by <<IEEE-754,IEEE-754>> |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 413 | * `operation_fp64()` is an fp64 reference implementation of the operation |
| 414 | * `operation_imp()` is the implementation under test |
| 415 | * `local_bound` is defined as follows: |
| 416 | ** For operations with a local_bound attribute it is the value of the optional attribute, with default value of false |
| 417 | ** For operations that do not have a local_bound attribute the value is true |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 418 | |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 419 | The checks described in the following code must pass for the following data sets: |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 420 | |
Dominic Symes | b5b0678 | 2023-07-27 11:50:57 +0100 | [diff] [blame] | 421 | * Data sets defined for the operation in Appendix A <<Main Inference operator test data>>. |
| 422 | * Data sets that have at least MIN_DOT_PRODUCT different output values. For these data sets we take S=-1. |
Dominic Symes | 5b936a3 | 2023-03-01 11:34:40 +0000 | [diff] [blame] | 423 | |
[source,c++]
----
output_ref = operation_fp64(input, weight, bias);
output_imp = operation_imp (input, weight, bias);
input_abs  = abs(input);   // Element-wise absolute
weight_abs = abs(weight);  // Element-wise absolute
bias_abs   = abs(bias);    // Element-wise absolute
if (!local_bound) {
    input_abs_max = max_value(input_abs);  // maximum over all elements
    for_each(index in shape(input_abs)) {
        input_abs[index] = input_abs_max;  // set all entries to global maximum
    }
}
output_bnd = operation_fp64(input_abs, weight_abs, bias_abs);

size_t T = tensor_size(output_shape);  // number of dot product results
size_t ksb = ceil(KS / exp2(normal_frac<acc_t>() - normal_frac<out_t>())) + ((max_value(bias_abs) > 0) ? 1 : 0);
fp64_t out_err_sum = 0.0;
fp64_t out_err_sumsq = 0.0;
for_each(index in output_shape) {
    fp64_t out_bnd = tensor_read<fp64_t>(output_bnd, output_shape, index);
    fp64_t out_ref = tensor_read<fp64_t>(output_ref, output_shape, index);
    acc_t  out_imp = tensor_read<acc_t> (output_imp, output_shape, index);
    fp64_t out_err;
    if ((acc_t)out_bnd == infinity) {
        // dot product can overflow and there is no accuracy limit
        out_err = 0.0;
    } else if (out_bnd == 0.0) {
        REQUIRE(out_ref == 0.0 && out_imp == 0.0);
        out_err = 0.0;
    } else {  // 0.0 < out_bnd < infinity
        fp64_t out_err_bnd = max(out_bnd * exp2(-1-normal_frac<out_t>()), normal_min<out_t>());
        out_err = (static_cast<fp64_t>(out_imp) - out_ref) / out_err_bnd;
        REQUIRE(abs(out_err) <= ksb);
    }
    out_err_sum   += out_err;
    out_err_sumsq += out_err * out_err;
}
if (input and weights are data set S with 3 <= S <= 5) {
    // check output error bias magnitude for data sets S which are not positive biased
    REQUIRE(abs(out_err_sum) <= 2*sqrt(ksb*T));
}
// check output error variance magnitude
REQUIRE(out_err_sumsq <= 0.4*ksb*T);
----
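The per-element part of the check can be illustrated for a single output value.
The following is a minimal sketch, not part of the normative pseudocode: the `FloatFormat` structure and both function names are hypothetical, and the format parameters used in the note after the code (10 fraction bits and a minimum normal of 2^-14^ for fp16_t, 23 fraction bits and 2^-126^ for fp32_t) are the usual IEEE-754 values, assumed here rather than taken from this section.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical description of a floating-point format (illustration only).
struct FloatFormat {
    int normal_frac;    // number of fraction bits of a normal value
    double normal_min;  // smallest positive normal value
};

// ksb = ceil(KS / 2^(frac(acc) - frac(out))) + (1 if any bias is non-zero).
long kernel_size_bound(long ks, const FloatFormat& acc, const FloatFormat& out, bool has_bias) {
    double scaled = std::ceil(ks / std::exp2(acc.normal_frac - out.normal_frac));
    return static_cast<long>(scaled) + (has_bias ? 1 : 0);
}

// Normalized error of one output element for the case 0 < out_bnd < infinity.
// The caller then requires |result| <= ksb.
double normalized_error(double out_imp, double out_ref, double out_bnd, const FloatFormat& out) {
    double err_bnd = std::max(out_bnd * std::exp2(-1 - out.normal_frac), out.normal_min);
    return (out_imp - out_ref) / err_bnd;
}
```

Under these assumptions, a kernel of size 9 with a bias, an fp32_t accumulator and an fp16_t output gives a bound of 2, because the 13 extra fraction bits of the accumulator reduce the kernel-size term to 1.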

=== Tensor Definitions

==== Tensors

Tensors are multidimensional arrays of data.
Tensors have metadata associated with them that describes characteristics of the tensor, including:

* Data Type
* Shape

The number of dimensions in a shape is called the rank.
A tensor with rank equal to zero is permitted.
In that case, the tensor has a single entry and is also known as a scalar.
A tensor shape is an array of integers of size equal to the rank of the tensor.
Each element in the tensor shape describes the number of elements in the dimension.
The tensor shape in each dimension must be greater than or equal to 1.
For tensor access information, see <<Tensor Access Helpers>>.

The shape of a tensor of non-zero rank is a special type shape_t.
shape_t is a one-dimensional list with the size equal to the rank of the original tensor.
The components of a shape_t are of type size_t.

In this version of the specification, shape_t values must be resolvable to constants at backend compile time.
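
The relationship between shape, rank and element count can be sketched as follows.
This is an illustration only: `Shape`, `rank_of` and `element_count` are hypothetical names, with `Shape` standing in for shape_t.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for shape_t: one size_t per dimension.
using Shape = std::vector<std::size_t>;

// The rank is the number of dimensions in the shape.
std::size_t rank_of(const Shape& shape) { return shape.size(); }

// The element count is the product of all dimensions. The empty product
// is 1, so a rank-0 tensor (a scalar) has exactly one element.
std::size_t element_count(const Shape& shape) {
    std::size_t count = 1;
    for (std::size_t dim : shape) count *= dim;
    return count;
}
```

For example, a shape of `{1, 224, 224, 3}` has rank 4 and 150528 elements, while the empty shape has rank 0 and one element.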

==== Tensor size limit

The tensor overall size is limited by the data type size_t.
This type must be able to hold integers in the range 0 to (1 << (MAX_LOG2_SIZE + 1)) - 1 where MAX_LOG2_SIZE is defined in <<Levels>>.
For each tensor, the number of tensor elements multiplied by the element size in bytes (which is taken to be 1 for elements smaller than 8 bits) must be less than or equal to (1 << (MAX_LOG2_SIZE + 1)) - 1.

The size of tensors along each of their dimensions is limited by the data type size_t.

This means that the maximum size of a tensor along each dimension is (1 << MAX_LOG2_SIZE) - 1 and therefore the maximum coordinate value is (1 << MAX_LOG2_SIZE) - 2.
Indices used to access tensors must be non-negative.
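
These limits can be checked with a short routine such as the sketch below.
The function name and parameters are hypothetical: `max_log2_size` stands in for MAX_LOG2_SIZE and is assumed to be at most 62 so that the shifts stay within 64 bits, and `element_bytes` is the element size in bytes, passed as 1 for elements smaller than 8 bits.

```cpp
#include <cstdint>
#include <vector>

// Returns true if every dimension and the overall byte size respect the limits.
bool within_size_limits(const std::vector<uint64_t>& shape,
                        uint64_t element_bytes,
                        unsigned max_log2_size) {
    const uint64_t max_dim = (uint64_t{1} << max_log2_size) - 1;
    const uint64_t max_bytes = (uint64_t{1} << (max_log2_size + 1)) - 1;
    uint64_t bytes = element_bytes;
    for (uint64_t dim : shape) {
        if (dim < 1 || dim > max_dim) return false;
        // Reject before multiplying if the product would exceed max_bytes.
        if (bytes > max_bytes / dim) return false;
        bytes *= dim;
    }
    return bytes <= max_bytes;
}
```

The division before the multiplication keeps the running product from overflowing, so the check stays valid for shapes that are far too large.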

==== Data Layouts

The following data layouts are supported in TOSA.
TOSA operations are defined in terms of a linear packed tensor layout.
In a linear packed layout a rank r tensor has elements of dimension (r-1) consecutive.
The next to increment is dimension (r-2) and so on.
For a specification of this layout see the tensor read and write functions in section <<Tensor Access Helpers>>.

An implementation of TOSA can choose a different tensor memory layout provided that the operation behavior is maintained.
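
As an illustration of the linear packed layout, the element offset for a given index can be computed as in the sketch below.
The function name `packed_offset` is hypothetical and is not one of the tensor access helpers; the sketch assumes that `index` has the same rank as `shape` and that every coordinate is in range.

```cpp
#include <cstddef>
#include <vector>

// Element offset in a linear packed layout: dimension (rank-1) varies fastest,
// then dimension (rank-2), and so on.
std::size_t packed_offset(const std::vector<std::size_t>& shape,
                          const std::vector<std::size_t>& index) {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) {
        offset = offset * shape[d] + index[d];
    }
    return offset;
}
```

For an NHWC tensor of shape `{1, 2, 3, 4}`, stepping the channel coordinate moves by one element and stepping the width coordinate moves by four elements.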

.Data Layouts
[cols="1,4,4"]
|===
|Name|Description of dimensions|Usage

|NHWC|Batch, Height, Width, Channels|Feature maps
|NDHWC|Batch, Depth, Height, Width, Channels|Feature maps for 3D convolution
|OHWI|Output channels, Filter Height, Filter Width, Input channels|Weights
|HWIM|Filter Height, Filter Width, Input channels, Channel Multiplier|Weights for depthwise convolutions
|DOHWI|Depth, Output Channels, Filter Height, Filter Width, Input Channels|Weights for 3D convolution
|===

==== Broadcasting

In operations where broadcasting is supported, an input shape dimension can be broadcast to an output shape dimension if the input shape dimension is 1.
TOSA broadcast requires the rank of both tensors to be the same.
A RESHAPE can be done to create a compatible tensor with appropriate dimensions of size 1.
To map an index in an output tensor to the corresponding index in an input tensor, see <<Broadcast Helper>>.
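
The broadcasting rule can be sketched as follows.
This is an illustration under the rule stated above and not the helper defined in <<Broadcast Helper>>; `Shape`, `broadcast_shape` and `input_index` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape = std::vector<std::size_t>;

// Output shape when broadcasting two shapes of equal rank: in each dimension
// the sizes must match, or one of them must be 1.
Shape broadcast_shape(const Shape& a, const Shape& b) {
    if (a.size() != b.size()) throw std::invalid_argument("ranks differ");
    Shape out(a.size());
    for (std::size_t d = 0; d < a.size(); ++d) {
        if (a[d] != b[d] && a[d] != 1 && b[d] != 1) {
            throw std::invalid_argument("incompatible dimension");
        }
        out[d] = (a[d] == 1) ? b[d] : a[d];
    }
    return out;
}

// Maps an output index to the index to read from an input of shape in_shape:
// a broadcast dimension (size 1) always reads coordinate 0.
Shape input_index(const Shape& in_shape, const Shape& out_index) {
    Shape in_index(out_index);
    for (std::size_t d = 0; d < in_shape.size(); ++d) {
        if (in_shape[d] == 1) in_index[d] = 0;
    }
    return in_index;
}
```

For example, broadcasting shapes `{1, 3, 1}` and `{2, 3, 4}` yields `{2, 3, 4}`, and output index `{1, 2, 3}` reads index `{0, 2, 0}` of the first input.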

==== Supported Number Formats

The following number formats are defined in TOSA.
The number formats supported by a given operator are listed in its table of supported types.
A TOSA implementation must support the number formats listed in the supported data types for operators contained in that profile.
Number formats not required for any operators in a profile do not need to be implemented.

.Number formats
[cols="1,1,1,5"]
|===
|Format|Minimum|Maximum|Description

|bool_t
| -
| -
|Boolean value that is either `true` or `false`. Size implementation defined. The TOSA reference model implements this as int8_t with 0 for `false` and 1 for `true`. All non-zero values are accepted on input as `true`.

|i4_t
| -
| -
|Signless 4-bit integer type. Will be interpreted as int4_t by all operators.

|int4_t
| -7
| +7
|Signed 4-bit two's-complement value. Excludes -8 to maintain a range symmetric about zero for weights.

|i8_t
| -
| -
|Signless 8-bit integer value. Will be interpreted as int8_t unless otherwise specified by an operator.

|int8_t
| -128
| +127
|Signed 8-bit two's-complement value.

|uint8_t
| 0
| 255
|Unsigned 8-bit integer value.

|i16_t
| -
| -
|Signless 16-bit integer type. Will be interpreted as int16_t unless otherwise specified by an operator.

|int16_t
| -32768
| +32767
|Signed 16-bit two's-complement value.

|uint16_t
| 0
| 65535
|Unsigned 16-bit value.

|i32_t
| -
| -
|Signless 32-bit integer value. Will be interpreted as int32_t by all operators.

|int32_t
| -(1<<31)
| (1<<31)-1
|Signed 32-bit two's-complement value.

|i48_t
| -
| -
|Signless 48-bit integer value. Will be interpreted as int48_t by all operators.

|int48_t
| -(1<<47)
| (1<<47)-1
|Signed 48-bit two's-complement value.

|fp8e4m3_t
| -448
| 448
| 8-bit floating-point defined by <<OCP-OFP8,OCP-OFP8>> with four bits of exponent and three bits of mantissa. +
Normal values must be supported. +
Denormal values must be supported. +
The NaN encoding must be supported. +
Signed zero must be supported.

|fp8e5m2_t
| -infinity
| +infinity
| 8-bit floating-point defined by <<OCP-OFP8,OCP-OFP8>> with five bits of exponent and two bits of mantissa. +
Normal values must be supported. +
Denormal values must be supported. +
Positive and negative infinity must be supported. +
NaN encodings must be supported. +
Signed zero must be supported.

|fp16_t
| -infinity
| +infinity
| 16-bit half-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|bf16_t
| -infinity
| +infinity
| 16-bit brain floating-point defined as bits [31:16] of the fp32_t format. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp32_t
| -infinity
| +infinity
| 32-bit single-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp64_t
| -infinity
| +infinity
| 64-bit double-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.
|===

Note: In this specification minimum<type> and maximum<type> will denote the minimum and maximum values of the data as stored in memory (ignoring the zero point).
The minimum and maximum values for each type are given in the preceding table.

Note: Integer number formats smaller than 8 bits may be used provided that the numerical result is the same as using a sequence of 8-bit TOSA operations.
For example, a convolution with low precision data must equal that of running the convolution at 8 bits and then clipping the result to the permitted output range.
This ensures that a Base Inference profile TOSA implementation can calculate the same result.

=== Integer Behavior

TOSA integer inputs and outputs are specified by signless values with the given number of bits.
Unless otherwise specified, these values will be interpreted as signed two's-complement.
The pseudocode will use int*_t to indicate use as a signed value and uint*_t to indicate use as an unsigned value.
If overflow occurs during an integer calculation, the result is unpredictable, as indicated by the REQUIRE checks in the pseudocode for the operators.

Unsigned 8-bit and 16-bit values are only allowed in the RESCALE operation, to allow for compatibility with networks which expect unsigned 8-bit or 16-bit tensors for input and output.
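
The difference between the signed and unsigned interpretation of the same signless bits can be shown with a small sketch.
The function names are hypothetical and are used only for illustration.

```cpp
#include <cstdint>

// Interprets 8 signless bits as a signed two's-complement value,
// which is the default interpretation of i8_t.
int32_t as_signed8(uint8_t bits) {
    return (bits & 0x80) ? static_cast<int32_t>(bits) - 256
                         : static_cast<int32_t>(bits);
}

// Interprets the same 8 bits as an unsigned value.
int32_t as_unsigned8(uint8_t bits) {
    return static_cast<int32_t>(bits);
}
```

The bit pattern `0xFF` reads as -1 when signed and as 255 when unsigned, while patterns below `0x80` read the same either way.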

==== Quantization

Machine Learning frameworks may represent tensors with a quantized implementation, using integer values to represent the original floating-point numbers.
TOSA integer operations do not perform any implicit scaling to represent quantized values.
Required zero point values are passed to the operator as necessary, and will be processed according to the pseudocode for each operator.

To convert a network containing quantized tensors to TOSA, generate explicit RESCALE operators for any change of quantization scaling.
This reduces quantized operations to purely integer operations.

As an example, an ADD between two quantized tensors requires that the integer values represent the same range.
The scale arguments for RESCALE can be calculated to ensure that the resulting tensors represent the same range.
Then the ADD is performed, and a RESCALE can be used to ensure that the result is scaled properly.

RESCALE provides support for per-tensor and per-channel scaling values to ensure compatibility with a range of possible quantization implementations.

==== Precision scaling

TOSA uses the RESCALE operation to scale between values with differing precision.
The RESCALE operator is defined using an integer multiply, add, and shift.
This guarantees that all TOSA implementations will return the same result for a RESCALE, including those with no support for floating-point numbers.

This TOSA specification supports two precisions of multiplier: 16-bit and 32-bit.
The 32-bit multiplier version supports two rounding modes to enable simpler lowering of existing frameworks that use two stage rounding.
All arithmetic is designed so that it does not overflow a 64-bit accumulator and that the final result fits in 32 bits.
In particular a 48-bit value can only be scaled with the 16-bit multiplier.

The apply_scale functions provide a scaling of approximately (multiplier * 2^-shift^).
The shift and value range is limited to allow a variety of implementations.
The limit of 62 on shift allows the shift to be decomposed as two right shifts of 31.
The limit on value allows implementations that left shift the value before the multiply in the case of shifts of 32 or less.
For example, in the case shift=30 an implementation of the form ((value\<<2) * multiplier + round)>>32 can be used.
A scaling range of 2^+12^ down to 2^-32^ is supported for both functions with a normalized multiplier.

For example, in typical usage a scaling of m*2^-n^ where m is a fraction in the
range 1.0 \<= m < 2.0 can be represented using multiplier=(1<<30)*m, shift=(30+n) for
apply_scale_32() and multiplier=(1<<14)*m, shift=(14+n) for apply_scale_16().
The values to achieve a scaling of 1.0 are shift=30, multiplier=1<<30 for apply_scale_32 and shift=14, multiplier=1<<14 for apply_scale_16.

[source,c++]
----
int32_t apply_scale_32(int32_t value, int32_t multiplier, int8_t shift, bool_t double_round=false) {
    REQUIRE(multiplier >= 0);
    REQUIRE(2 <= shift && shift <= 62);
    REQUIRE(value >= (-1 << (shift - 1)) && value < (1 << (shift - 1)));
    int64_t round = 1 << (shift - 1);
    if (double_round) {
        if (shift > 31 && value >= 0) round += 1<<30;
        if (shift > 31 && value < 0)  round -= 1<<30;
    }
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result = result >> shift;
    // result will fit a 32-bit range due to the REQUIRE on value
    return static_cast<int32_t>(result);
}

int32_t apply_scale_16(int48_t value, int16_t multiplier, int8_t shift) {
    REQUIRE(multiplier >= 0);
    REQUIRE(2 <= shift && shift <= 62);
    int64_t round = (1 << (shift - 1));
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result = result >> shift;
    REQUIRE(result >= minimum<int32_t> && result <= maximum<int32_t>);
    return static_cast<int32_t>(result);
}
----
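
For experimentation, apply_scale_32 can be transcribed into compilable C++ roughly as shown below.
This is a sketch and not the normative definition: REQUIRE is modelled with `assert`, the shifted constants are widened to 64 bits explicitly, and the right shift of a negative value is assumed to be arithmetic (guaranteed from C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

int32_t apply_scale_32(int32_t value, int32_t multiplier, int8_t shift,
                       bool double_round = false) {
    assert(multiplier >= 0);
    assert(2 <= shift && shift <= 62);
    const int64_t half_range = int64_t{1} << (shift - 1);
    assert(value >= -half_range && value < half_range);
    int64_t round = half_range;
    if (double_round && shift > 31) {
        // Second rounding stage: nudge away from zero by 2^30.
        round += (value >= 0) ? (int64_t{1} << 30) : -(int64_t{1} << 30);
    }
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result >>= shift;  // assumed arithmetic shift for negative results
    return static_cast<int32_t>(result);
}
```

With multiplier=1<<30 and shift=30 the function returns its input unchanged, and with multiplier=3<<29 and shift=31 it scales by 0.75, so 1000 becomes 750.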

In some functions, the multiplier and shift are combined into a scale_t structure:

[source,c++]
----
typedef struct {
    int32_t multiplier;
    int8_t shift;
} scale_t;
----

In places where a divide is required, we also use the function below to calculate an appropriate scaling value.

[source,c++]
----
scale_t reciprocal_scale(uint32_t value) {
    REQUIRE(value > 0);
    scale_t scale;
    int32_t k = 32 - count_leading_zeros(value - 1); // (1 << k) / 2 < value <= (1 << k)
    int64_t numerator = ((1 << 30) + 1) << k;
    scale.multiplier = numerator / value; // (1 << 30) <= multiplier < (1 << 31)
    scale.shift = 30 + k;
    return scale;
}
----
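
The way a reciprocal scale replaces a divide can be sketched in compilable C++ as follows.
This is an illustration under stated assumptions: the bit-loop `count_leading_zeros` and the `divide_rounded` wrapper are hypothetical helpers, REQUIRE is modelled with `assert`, and the multiply, round and shift steps follow the single-rounding form of apply_scale_32.

```cpp
#include <cassert>
#include <cstdint>

struct scale_t {
    int32_t multiplier;
    int8_t shift;
};

// Number of leading zero bits in a 32-bit value; returns 32 for zero.
int count_leading_zeros(uint32_t x) {
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) ++n;
    return n;
}

scale_t reciprocal_scale(uint32_t value) {
    assert(value > 0);
    int k = 32 - count_leading_zeros(value - 1);  // (1 << k) / 2 < value <= (1 << k)
    int64_t numerator = ((int64_t{1} << 30) + 1) << k;
    scale_t scale;
    scale.multiplier = static_cast<int32_t>(numerator / value);
    scale.shift = static_cast<int8_t>(30 + k);
    return scale;
}

// Rounded division expressed as multiply, add and shift.
int32_t divide_rounded(int32_t dividend, uint32_t divisor) {
    scale_t s = reciprocal_scale(divisor);
    const int64_t half_range = int64_t{1} << (s.shift - 1);
    assert(dividend >= -half_range && dividend < half_range);
    int64_t result = static_cast<int64_t>(dividend) * s.multiplier + half_range;
    return static_cast<int32_t>(result >> s.shift);
}
```

For a divisor of 3 this gives multiplier=1431655766 and shift=32, and dividing 11 by 3 through the scale yields 4, the nearest integer to 3.67.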

==== Integer Convolutions

For the convolution operators, the input is not required to be scaled.
The integer versions of the convolution operators subtract the zero point from the integer values as defined for each operator.
The convolution produces an accumulator output of type int32_t or int48_t.
This accumulator output is then scaled to the final output range using the RESCALE operator.
The scale applied in the RESCALE operator should be set to multiplier and shift values such that: multiplier * 2^-shift^ = (input_scale * weight_scale) / output_scale.
Here, input_scale, weight_scale and output_scale are the conversion factors from integer to floating-point for the input, weight and output tensor values respectively.
If per-channel scaling is needed then the per-channel option of the RESCALE operation should be used.
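
One possible way for a compiler to derive such a multiplier and shift from floating-point scales is sketched below.
The function `quantize_rescale_factor` and its normalisation of the multiplier to 31 bits are assumptions made for illustration; they are not defined by TOSA, and the permitted shift range should be taken from the RESCALE operator definition rather than from the bound asserted here.

[source,c++]
----
#include <cassert>
#include <cmath>
#include <cstdint>

struct scale_t {
    int32_t multiplier;
    int8_t shift;
};

// Hypothetical helper: find multiplier and shift such that
// multiplier * 2^-shift approximates (input_scale * weight_scale) / output_scale.
scale_t quantize_rescale_factor(double input_scale, double weight_scale, double output_scale) {
    double ratio = (input_scale * weight_scale) / output_scale;
    assert(ratio > 0.0);

    int exponent = 0;
    // ratio == mantissa * 2^exponent with 0.5 <= mantissa < 1
    double mantissa = std::frexp(ratio, &exponent);
    int64_t multiplier = std::llround(mantissa * 2147483648.0);  // mantissa * 2^31
    if (multiplier == (INT64_C(1) << 31)) {
        // Rounding reached 2^31: halve the multiplier and compensate in the exponent.
        multiplier >>= 1;
        exponent += 1;
    }

    int shift = 31 - exponent;
    assert(shift >= 0 && shift <= 62);  // illustrative bound only

    scale_t scale;
    scale.multiplier = static_cast<int32_t>(multiplier);
    scale.shift = static_cast<int8_t>(shift);
    return scale;
}

// Value actually represented by the pair: multiplier * 2^-shift.
double represented_ratio(scale_t scale) {
    return std::ldexp(static_cast<double>(scale.multiplier), -scale.shift);
}
----

For per-channel scaling, the same computation would be repeated with each channel's weight_scale to produce one multiplier and shift pair per channel.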

==== Integer Elementwise Operators

When two quantized tensors are used in an operation, they must represent the same numeric range for the result to be valid.
In this case, TOSA expects that RESCALE operators will be used as necessary to generate 32-bit integer values in a common range.
There are many valid choices for scale factors and options for the common range.
TOSA does not impose a requirement on which scale factors and range should be used.
Compilers generating TOSA sequences should choose a range that allows the operation to be computed without overflow, while allowing the highest possible accuracy of the output.
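
The sketch below shows one of the many valid choices, for an addition of two int8 tensors whose zero points are assumed to be 0.
It uses floating-point arithmetic as a stand-in for the RESCALE operator, and the helper names and the 20 bits of extra precision are assumptions chosen for illustration only.
With int8 inputs and 20 extra bits, each rescaled value has magnitude at most 2^27^, so the sum of two values fits in a 32-bit integer.

[source,c++]
----
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical choice of common scale: the larger input scale,
// refined by extra_bits additional bits of precision.
double choose_common_scale(double scale_a, double scale_b, int extra_bits) {
    return std::ldexp(std::max(scale_a, scale_b), -extra_bits);
}

// Floating-point stand-in for a RESCALE from int8 to the common int32 range.
// A real TOSA sequence would express scale / common_scale as a multiplier and shift.
int32_t to_common_range(int8_t q, double scale, double common_scale) {
    return static_cast<int32_t>(std::llround(q * (scale / common_scale)));
}
----

With input scales 0.5 and 0.25, the quantized values 100 and 40 represent 50.0 and 10.0; after conversion to the common range their integer sum, multiplied by the common scale, gives 60.0.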

==== General Unary Functions

General unary functions such as sigmoid(), tanh() and exp() for integer inputs are expressed using a lookup table and interpolation to enable efficient implementation.
This also allows for other operations with the addition of user-supplied tables (the TABLE operation).
All table lookups are based on the following reference lookup function that takes as input a table of 513 entries of 16 bits each.

[source,c++]
----
int32_t apply_lookup_s(int16_t *table, int32_t value)
{
    int16_t clipped_value = static_cast<int16_t>(apply_clip_s<int32_t>(value, -32768, +32767));
    int32_t index = (clipped_value + 32768) >> 7;  // top 9 bits select the table entry
    int32_t fraction = clipped_value & 0x7f;       // low 7 bits interpolate between entries
    int16_t base = table[index];
    int16_t next = table[index+1];
    int32_t slope = next - base;
    REQUIRE(slope >= minimum<int16_t> && slope <= maximum<int16_t>);
    int32_t return_value = (base << 7) + slope * fraction;
    return return_value; // return interpolated value of 16 + 7 = 23 bits
}
----

Note that although the table lookup defined here has 16-bit precision, for 8-bit-only operations an 8-bit table can be derived by applying the reference function to each of the possible 256 input values.
The following code constructs a 513-entry table based on a reference function.

[source,c++]
----
void generate_lookup_table(int16_t *table, int32_t (*reference)(int32_t))
{
    for (int i = -256; i <= 256; i++) {
        int32_t value = (*reference)(i);
        table[i + 256] = static_cast<int16_t>(apply_clip_s<int32_t>(value, -32768, +32767));
    }
}
----
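
To make the interaction between table generation and lookup concrete, here is a self-contained sketch that builds a table from a simple linear function and interpolates in it.
The names `clip_s32`, `build_table`, `lookup_s` and `times_100` are stand-ins introduced for this example; `assert` replaces REQUIRE, and `base * 128` is used in place of `base << 7` to avoid shifting a negative value.

[source,c++]
----
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for apply_clip_s<int32_t>.
static int32_t clip_s32(int32_t value, int32_t lo, int32_t hi) {
    return std::min(std::max(value, lo), hi);
}

// Fill 513 entries; entry i + 256 holds the reference output for table point i.
void build_table(int16_t *table, int32_t (*reference)(int32_t)) {
    for (int i = -256; i <= 256; i++) {
        table[i + 256] = static_cast<int16_t>(clip_s32(reference(i), -32768, 32767));
    }
}

// Same arithmetic as the reference lookup: 9-bit index, 7-bit fraction.
int32_t lookup_s(const int16_t *table, int32_t value) {
    int32_t clipped = clip_s32(value, -32768, 32767);
    int32_t index = (clipped + 32768) >> 7;  // 0..511
    int32_t fraction = clipped & 0x7f;       // 0..127
    int32_t base = table[index];
    int32_t next = table[index + 1];
    int32_t slope = next - base;
    assert(slope >= INT16_MIN && slope <= INT16_MAX);
    return base * 128 + slope * fraction;    // 16 + 7 = 23-bit result
}

// Hypothetical reference function: a straight line through the origin.
int32_t times_100(int32_t i) {
    return 100 * i;
}
----

Because the reference function is linear, interpolation is exact here: with a table built from `times_100`, `lookup_s(table, 64)` returns 6400 and `lookup_s(table, -1)` returns -100, that is, 100 times the input, which is the table output carrying 7 additional fractional bits.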

=== Other publications

The following publications are referred to in this specification, or provide more information:

. [[IEEE-754]]IEEE Std 754-2008, _IEEE Standard for Floating-point Arithmetic_, August 2008.
. [[OCP-OFP8]]Open Compute Project OCP 8-bit Floating Point Specification (OFP8) Revision 1.0.