//
// This confidential and proprietary software may be used only as
// authorised by a licensing agreement from ARM Limited
// (C) COPYRIGHT 2020-2024 ARM Limited
// ALL RIGHTS RESERVED
// The entire notice above must be reproduced on all authorised
// copies and copies may only be made to the extent permitted
// by a licensing agreement from ARM Limited.

== Introduction

=== Overview

Tensor Operator Set Architecture (TOSA) provides a set of whole-tensor
operations commonly employed by Deep Neural Networks. The intent is to enable a
variety of implementations running on a diverse range of processors, with the
results at the TOSA level consistent across those implementations. Applications
or frameworks which target TOSA can therefore be deployed on a wide range of
different processors, such as SIMD CPUs, GPUs and custom hardware such as
NPUs/TPUs, with defined accuracy and compatibility constraints. Most operators
from the common ML frameworks (TensorFlow, PyTorch, etc.) should be expressible
in TOSA. It is expected that there will be tools to lower from ML frameworks
into TOSA.

=== Goals

The goals of TOSA include the following:

* A minimal and stable set of tensor-level operators to which machine learning
framework operators can be reduced.

* Full support for both quantized integer and floating-point content.

* Precise functional description of the behavior of every operator, including
the treatment of their numerical behavior in the case of precision, saturation,
scaling, and range as required by quantized datatypes.

* Agnostic to any single high-level framework, compiler backend stack or
particular target.

* The detailed functional and numerical description enables precise code
construction for a diverse range of targets – SIMD CPUs, GPUs and custom
hardware such as NPUs/TPUs.

=== Specification

The TOSA Specification is written as AsciiDoc mark-up and developed in its raw
mark-up form, managed through a git repository here:
https://git.mlplatform.org/tosa/specification.git/.
The specification is developed and versioned much like software.
While the mark-up is legible and can be read fairly easily in its raw form, it is recommended to build or “render” the mark-up into PDF or HTML.
To do this, please follow the instructions in the README.md in the root of the specification repository.

=== Operator Selection Principles

TOSA defines a set of primitive operators to which higher level operators can be lowered in a consistent way.
To remain effective and efficient to implement, the set of operators must be constrained to a reasonably small set of primitive operations out of which others can be constructed.
The following principles govern the selection of operators within TOSA.

.Principles
[cols="1,5,5"]
|===
|ID|Principle|Reason for this

|P0
|An operator shall be a primitive operation or building block that cannot be decomposed into simpler whole tensor operations.
|If the operator can be broken down, then we should look at the component operators.

|P1
|An operator shall be usable as a component out of which more complex operations can be constructed.
|Single use operators have a high architectural cost and a more reusable version should be considered instead.

|P2
|Precision should be appropriate for the input and output data types.
|Precision higher than that needed to calculate the result leads to extra implementation cost.

|P3
|Numerical definition of common sub-operations should be consistent between operators (for example: value scaling).
|Consistent sub-operation definition reduces the operator implementation cost.

|P4
|The valid input and output ranges for all arguments shall be specified.
|Ranges are required to make consistent (numerically agreeing) implementations possible.

|P5
|Integer operators shall be implementable in a bit-exact form with good efficiency on CPU, GPU and hardware targets.
|Reduces implementation cost and gives consistent inference results.
|===

=== Versioning

TOSA follows a semantic versioning policy with a major.minor.patch.draft scheme.
See below for the TOSA definition of backward compatibility.

* Major version changes may break backwards compatibility.
* Minor numbers may add functionality in a backwards compatible way.
* Patch versions are for bug fixes, clarifications, or trivial changes.
* The draft flag notes whether the version referenced is finalized.

Major, minor, and patch numbers are limited to eight bits.
Draft is a single bit flag.
If stored in a 32-bit value, the remaining bits are reserved for future use.
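
The field widths above can be illustrated with a small packing sketch.
The text fixes the widths (three 8-bit numbers and a 1-bit draft flag) but not the bit positions, so the layout used here is an assumption made only for this example.
The names `version_sketch_t`, `pack_version` and `unpack_version` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version record: three 8-bit numbers and a 1-bit draft flag.
struct version_sketch_t {
    uint8_t major_version;
    uint8_t minor_version;
    uint8_t patch_version;
    bool draft;
};

// ASSUMED layout (not defined by the text above):
//   bits 31..24 major, 23..16 minor, 15..8 patch, bit 0 draft,
//   bits 7..1 reserved and written as zero.
inline uint32_t pack_version(const version_sketch_t& v) {
    return (static_cast<uint32_t>(v.major_version) << 24) |
           (static_cast<uint32_t>(v.minor_version) << 16) |
           (static_cast<uint32_t>(v.patch_version) << 8) |
           (v.draft ? 1u : 0u);
}

inline version_sketch_t unpack_version(uint32_t packed) {
    // Reserved bits are expected to be zero in this sketch.
    assert((packed & 0xFEu) == 0);
    return version_sketch_t{
        static_cast<uint8_t>(packed >> 24),
        static_cast<uint8_t>((packed >> 16) & 0xFFu),
        static_cast<uint8_t>((packed >> 8) & 0xFFu),
        (packed & 1u) != 0};
}
```

With this assumed layout, 25 of the 32 bits carry information and the remaining 7 bits stay reserved.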

==== Backwards Compatibility

TOSA graphs created with previous minor versions within a major version must continue to work.
The following portions of the specification and implementation will not change within a major version:

* Operator Names
* Arguments including ordering, input/attribute/output, name, rank
* ERROR_IF statements
* Functionality of the pseudocode for each operator
* Level definitions and checks
* Supported Data Type tables
* Conformance test definitions
* Enumerated types and values

Changes to the following do not break compatibility:

* Order of operations within the XML
* Operator section names
* Descriptive text that does not affect functionality
* Non-functional changes to pseudocode (for example: cleanup, local variable name changes)

Minor versions are allowed to add new operators or other functionality as long as the above guarantees hold.

In addition, new extensions may be added to the specification between TOSA releases.
They may not change anything that would break backward compatibility according to the above definitions.

=== Profiles

TOSA profiles enable efficient implementation on different classes of device.
Each profile is an independent set of operations and data type combinations.

TOSA profile extensions define optional operation and data type combinations.

Each operator's Supported Data Types table will define which profile or extension an operator and data type is in.
An operator / data type combination may be part of multiple profiles or extensions.
If so, each profile and extension will be listed in the Supported Data Types table.
In addition, a table listing all operations for each profile can be found in Appendix B.

The following are required for compliant TOSA implementations:

* A TOSA implementation must implement at least one profile.
* A TOSA implementation may choose to implement any extensions.
* If a TOSA implementation chooses to implement an extension, it must implement the complete extension.
* If an operator / data type combination requires multiple extensions, the combination is only required to be implemented if all extensions are implemented.
** For example, a CAST from bf16 to fp8 is only required if both extensions are implemented.
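
The last rule above can be read as a simple subset test.
The following is a minimal sketch of that reading; the helper name `combination_required` and the extension name strings are placeholders for illustration, not identifiers defined by this specification.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: an operator / data type combination that depends on
// several extensions is only mandatory when every one of them is implemented.
inline bool combination_required(const std::set<std::string>& needed_extensions,
                                 const std::set<std::string>& implemented_extensions) {
    assert(!needed_extensions.empty());
    for (const std::string& ext : needed_extensions) {
        if (implemented_extensions.count(ext) == 0) {
            return false;  // at least one needed extension is missing
        }
    }
    return true;  // every needed extension is implemented
}
```

For the CAST example, an implementation with only the bf16 extension is not required to support the bf16 to fp8 combination, while one with both extensions is.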

.Profiles
include::{generated}/profiles.adoc[]

.Profile Extensions
include::{generated}/profile_extensions.adoc[]

=== Levels

A TOSA level defines operator argument ranges that an implementation shall support.
This is distinct from a profile, which defines the operations and data types supported.
One level must apply to all profiles and extensions supported by an implementation.

This version of the specification defines two TOSA levels:

* No level : allows the full range of arguments specified by the operations according to the operation data types.
* Level 8K : ranges are expected to be sufficient for applications with frame sizes up to 8K.

Later versions of the specification may define additional levels.
The following table defines the value ranges for each level.
These ranges are checked using the LEVEL_CHECK() function with the operator descriptions.

.Level maximums
include::{generated}/levels.adoc[]

=== Status

The TOSA specification is a work in progress.

* The Base Inference profile should be considered to be near release quality, with conformance tests available.
* The Main Inference profile has most of the expected operators in place, but is still subject to change.
* The reference model and conformance tests do not yet support all of the floating point types that have been defined.
* There is not currently a conformance test suite available for Main Inference.

=== Compliance

This section defines when a TOSA implementation is compliant with a given TOSA specification profile and level.
To be compliant, an implementation must achieve the results and accuracy defined by this specification.
TOSA also defines a set of conformance tests.
A compliant implementation must pass the conformance tests.
The conformance tests are not exhaustive, so an implementation that passes the conformance tests may not be compliant if there is a non-compliance that is undetected by the tests.

==== Base Inference Profile Compliance

The <<Operator Graphs>> section of this specification defines a TOSA graph and the behavior defined for a TOSA graph.
This behavior is captured in the pseudo-code function tosa_execute_graph().
For a given input graph (with attributes) and input tensors there are three possible tosa_graph_result values after executing the graph:

* tosa_unpredictable: The result of the graph on the given inputs cannot be relied upon.
* tosa_error: The graph does not meet the specification and is recognised as an illegal graph.
* tosa_valid: The result is defined and predictable and the list of output tensors defines the result.

An implementation is compliant with the TOSA Base Inference Profile if it matches the above results as follows:

* For tosa_unpredictable, the implementation can return whatever result it chooses (including error).
* For tosa_error, the implementation must return an error result (and there is no requirement on how much of the graph is executed, if any).
* For tosa_valid, the implementation must execute the entire graph without error and return the result defined by this specification.

In terms of pseudo-code, if *graph* is a TOSA graph consisting of Base Inference Profile operators and *input_list* is a list of input tensors then the following test must pass.

[source,c++]
----
bool tosa_test_compliance(tosa_graph_t graph, tosa_list_t input_list, tosa_level_t level) {
    shape_list_t output_list_spec = tosa_allocate_list(tosa_output_shape(graph));
    shape_list_t output_list_test = tosa_allocate_list(tosa_output_shape(graph));
    tosa_graph_result = tosa_valid;    // result starts as valid
    tosa_nesting_depth = 0;            // if/while nesting level
    tosa_execute_graph(graph, input_list, output_list_spec, level);
    if (tosa_graph_result == tosa_unpredictable) {
        return true;    // No requirement to match an unpredictable result
    }
    result_test = execute_implementation_under_test(graph, input_list, output_list_test);
    if (tosa_graph_result == tosa_error) {
        return result_test == tosa_error;    // result must be an error
    }
    if (exact_tensor_match(output_list_spec, output_list_test)) {
        // Predictable bit-exact value match required
        return true;
    }
    return false;
}
----

==== Main Inference Profile Compliance

A Main Inference compliant implementation must satisfy the following:

* The implementation must meet <<Base Inference Profile Compliance>> for all Base Inference compliant graphs
* The implementation must support all Main Inference operations using the datatype fp32_t
** The operations must meet the precision requirements of <<Main Inference precision requirements>>
* The implementation must support all Main Inference operations using the datatype fp16_t
** The operations must meet the precision requirements of <<Main Inference precision requirements>>
** Note: These requirements allow fp16_t operations to be implemented using the fp32_t datatype
* The implementation must support all Main Inference operations using the datatype bf16_t
** The operations must meet the precision requirements of <<Main Inference precision requirements>>
** Note: These requirements allow bf16_t operations to be implemented using the fp32_t datatype

As with <<Base Inference Profile Compliance>>, the pseudo-code function tosa_execute_graph() can return one of three possible results.
A compliant implementation must satisfy the following:

* For a graph returning tosa_error the implementation must also return an error
* For a graph returning tosa_valid the implementation must execute the entire graph without error
* For a graph returning tosa_valid and consisting only of integer operators the results must match exactly

===== Main Inference precision requirements

In a compliant implementation, individual floating-point operations within the graph must meet the accuracy bounds listed in the following table.
In the table _ulp_ means unit of the last place.
The function tosa_reference_check_fp() defines the error range permitted by a given number of units of last place in this specification.
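
To make the notion of an ulp bound concrete, the following sketch checks an fp32 result against a double-precision reference.
It is not the specification's tosa_reference_check_fp(); the helper names are hypothetical, and the ulp definition used (the gap from the reference, rounded to fp32, up to the next larger fp32 value) is an assumption of this example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Size of one unit in the last place of fp32 at the magnitude of `reference`.
// ASSUMPTION: ulp is the gap between |reference| rounded to fp32 and the next
// larger fp32 value; the specification's own definition may differ.
inline double ulp_fp32(double reference) {
    float r = std::fabs(static_cast<float>(reference));
    assert(std::isfinite(r));
    float next = std::nextafter(r, std::numeric_limits<float>::infinity());
    return static_cast<double>(next) - static_cast<double>(r);
}

// Returns true when `result` lies within `n_ulp` units of last place of `reference`.
inline bool within_ulp_fp32(float result, double reference, double n_ulp) {
    double err = std::fabs(static_cast<double>(result) - reference);
    return err <= n_ulp * ulp_fp32(reference);
}
```

Under this definition, the fp32 value immediately above 1.0 is 1 ulp away from a reference of 1.0, so it passes a 1 ulp bound but fails a 0.5 ulp bound.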

NOTE: The error criteria in this section are at an early draft stage and are likely to change during conformance test development.

The following criteria apply to all operations:

* If any input is a NaN and the result is floating-point then the result must be a NaN
* If any input is a NaN and the operation is a comparison (greater, greater-equal, equal) then the result must be false
* If any input is a NaN and the operation is conversion to an integer or boolean then the result is unpredictable

[cols="1,3"]
|===
| Operation | Accuracy bound

| <<ARGMAX>>, <<MAX_POOL2D>>, <<CLAMP>>, <<MAXIMUM>>, <<MINIMUM>>, <<ABS>>, <<NEGATE>>, <<SELECT>>, <<REDUCE_MAX>>, <<REDUCE_MIN>>, <<CONST>>, <<IDENTITY>>
| Non-NaN results must be exact.

| <<EQUAL>>, <<GREATER>>, <<GREATER_EQUAL>>
| The result must be exact with: +
(1) The sign of the zero is ignored +
(2) Infinities of the same sign compare as equal

| <<CONV2D>>, <<CONV3D>>, <<DEPTHWISE_CONV2D>>, <<FULLY_CONNECTED>>, <<MATMUL>>, <<TRANSPOSE_CONV2D>>
| Each output can be expressed as a dot product of two input vectors. +
The dot product must meet the <<Dot product accuracy requirements>>

| <<FFT2D>>, <<RFFT2D>>
| Each output can be expressed as a dot product of an input vector with a constant coefficient vector. +
The dot product must meet the <<Dot product accuracy requirements>>

| <<ADD>>, <<MUL>>, <<SUB>>, <<CEIL>>, <<FLOOR>>
| Floating-point result overflows must be set to infinity of the correct sign. +
Floating-point result underflows must be set to zero of the correct sign. +
Addition of infinities of different signs must produce a NaN. +
Subtraction of infinities of the same sign must produce a NaN. +
Multiplication of an infinity by a zero must produce a NaN. +
Otherwise the result must be within 0.5 ulp of the mathematical result.

| <<CAST>>
| Result overflows when converting between fp32_t, bf16_t and fp16_t must be set to infinity of the correct sign. +
fp8e4m3_t and fp8e5m2_t must use the non-saturating mode defined in <<OCP-OFP8,OCP-OFP8>> when converting from the wider floating-point types. +
If saturation of the fp8 types is desired, a <<CLAMP>> operation with the appropriate parameters should be used before the cast. +
Floating-point result underflows must be set to zero of the correct sign. +
Cast from floating-point to integer result overflows must be saturated. +
Cast from floating-point to integer must be rounded using round to nearest, ties to even, rounding mode. +
Otherwise cast to floating-point must be within 0.5 ulp of the mathematical result.

| <<RECIPROCAL>>
| If the input is a zero or the result overflows the output must be an infinity of the same sign. +
If the input is an infinity or the result underflows the output must be a zero of the same sign. +
Otherwise the result must be within 1 ulp of the mathematical result.

| <<RSQRT>>
| If the input is less than zero the result must be a NaN. +
Otherwise if the input is a zero the output must be an infinity of the same sign. +
Otherwise the result must be within 2 ulp of the mathematical result.

| <<LOG>>, <<ERF>>
| If the input to LOG is less than zero then the result must be a NaN. +
If the result overflows the output must be an infinity of the correct sign. +
If the result underflows the output must be a zero of the correct sign. +
Otherwise the result must be within 5 ulp of the mathematical result.

| <<EXP>>
| Let `x` be an input element and `out_imp` the implementation output of `exp(x)`. +
Let `out_ref` be the result of the fp64_t reference implementation of `exp(x)`. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, (1+abs(x)), 0, 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<POW>>
| Let `x`, `y` be input elements from `input1` and `input2` respectively. +
Let `out_imp` be the implementation output of `pow(x,y)`. +
If `x` is less than zero and `y` is non-integral then the result must be a NaN. +
Let `out_ref` be the result of the fp64_t reference implementation of `pow(x,y)`. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 2 * (1+abs(log(abs(x))*y)), 0, 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<SIGMOID>>
| Let `x` be an input element and `out_imp` the implementation output. +
Let `out_ref` be the result of the fp64_t reference implementation. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 2 * (1+abs(x)), 0, 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<TANH>>
| Let `x` be an input element and `out_imp` the implementation output. +
Let `out_ref` be the result of the fp64_t reference implementation. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 4 * (1+abs(x)), 0.5, 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<REDUCE_SUM>>
| Each output can be expressed as a dot product of an input vector with a vector of ones. +
This dot product must meet the <<Dot product accuracy requirements>>

| <<AVG_POOL2D>>
| Each output can be expressed as a dot product of an input vector with a vector with elements 1/KS where KS is the kernel size. +
This dot product must meet the <<Dot product accuracy requirements>>

| <<REDUCE_PRODUCT>>
| Result overflows must be set to an infinity of the correct sign. +
Result underflows must be set to a zero of the correct sign. +
Let `n` be the number of elements in the product, `out_imp` the implementation result, and `out_ref` the result of the fp64_t reference implementation. +
Let `err_bnd = abs(out_ref) * (pow(1 + pow(2, -normal_frac<in_out_t> - 1), n) - 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<COS>>
| Let `x` be an input element and `out_imp` the implementation output of `cos(x)`. +
Let `out_ref` be the result of the fp64_t reference implementation of `cos(x)`. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(x, 1+abs(x), 0, 2)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<SIN>>
| Let `x` be an input element and `out_imp` the implementation output of `sin(x)`. +
Let `out_ref` be the result of the fp64_t reference implementation of `sin(x)`. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(x, abs(x), 0, 2)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<RESIZE>>
| The result corresponds to a sequence of floating-point calculations. +
The allowable error bound for the result of a resize is based on the maximum value of an element in the input tensor. +
Let `out_imp` be the implementation output. +
Let `out_ref` be the result of the fp64_t reference implementation. +
Let `err_bnd = max(abs(input)) * 0.006`. +
Then `tosa_reference_check_fp_bnd<out_t>(out_imp, out_ref, err_bnd)` must be true.

|===
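
As a worked illustration of one row of the table, the REDUCE_PRODUCT error bound can be evaluated directly in double precision.
This is only a sketch: the function name is hypothetical, and `normal_frac` is passed as a plain integer, with its interpretation as the number of fraction bits of the data type (for example 23 for fp32) being an assumption of this example.

```cpp
#include <cassert>
#include <cmath>

// Error bound from the REDUCE_PRODUCT row:
//   err_bnd = abs(out_ref) * (pow(1 + pow(2, -normal_frac - 1), n) - 1)
// ASSUMPTION: normal_frac is the number of fraction bits of the data type.
inline double reduce_product_err_bnd(double out_ref, int n, int normal_frac) {
    assert(n >= 1 && normal_frac >= 0);
    double rel_step = std::pow(2.0, -normal_frac - 1);  // relative error allowed per factor
    return std::fabs(out_ref) * (std::pow(1.0 + rel_step, n) - 1.0);
}
```

Because `(1 + e)^n - 1` is approximately `n * e` for small `e`, the bound grows roughly linearly with the number of elements in the product.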

===== Operator sequence precision requirement

Precision criteria are specified for a single operator.

An implementation M of a sequence of n TOSA operators, A[0] to A[n-1], is said to
be compliant if M gives the same result as a sequence of implementations
M[0] to M[n-1] such that:

* Each M[k] implements A[k] with the same or higher precision datatypes
* Each M[k] meets the accuracy defined in this specification for A[k] where the M[k] output is converted to A[k] output precision using round to nearest

===== Dot product accuracy requirements

This section assumes an operation acting on tensors named 'input', 'weight' and optionally 'bias'.
Each output tensor element can be expressed as a dot product of elements between the 'input' and 'weight' tensors with optional bias addition.
The dot product has length KS, the kernel size.
If the operation does not specify a bias then 'bias' is taken to be zero in this section.
Note: KS is defined for each relevant operator in the appendix section <<Main Inference operator test data>>.

In other words, each output element `out` can be expressed as a dot product between input elements `in[k]`, weight elements `w[k]`, bias `b`:

`out = in[0] * w[0] + in[1] * w[1] + ... + in[KS-1] * w[KS-1] + b`

The positions of `in[k]`, `w[k]`, `b` in the input, weight and bias tensors depend on the operation being performed.
This may be, for example, a convolution.
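
The expression for `out` can be sketched for a single output element as follows.
The function names are hypothetical, and the inputs are assumed to have already been gathered into flat vectors of length KS.
The second function evaluates the same dot product on element-wise absolute values, which corresponds to the per-element magnitude used by the accuracy check in this section when only local values are considered.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// out = in[0]*w[0] + ... + in[KS-1]*w[KS-1] + b, accumulated in double (fp64).
inline double dot_product_fp64(const std::vector<double>& in,
                               const std::vector<double>& w, double b) {
    assert(in.size() == w.size());  // both vectors have length KS
    double acc = 0.0;
    for (std::size_t k = 0; k < in.size(); ++k) {
        acc += in[k] * w[k];
    }
    return acc + b;
}

// Same dot product on absolute values: |in[0]|*|w[0]| + ... + |b|.
inline double dot_product_abs_fp64(const std::vector<double>& in,
                                   const std::vector<double>& w, double b) {
    assert(in.size() == w.size());
    double acc = 0.0;
    for (std::size_t k = 0; k < in.size(); ++k) {
        acc += std::fabs(in[k]) * std::fabs(w[k]);
    }
    return acc + std::fabs(b);
}
```

The absolute-value form is never smaller in magnitude than the signed result, which is why it is a natural scale for an error bound when terms of opposite sign cancel.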

This section defines the accuracy required for these operations.
In this section:

* "fp64 arithmetic" refers to double-precision floating-point arithmetic defined by <<IEEE-754,IEEE-754>>
* `operation_fp64()` is an fp64 reference implementation of the operation
* `operation_imp()` is the implementation under test
* `local_bound` is defined as follows:
** For operations with a local_bound attribute it is the value of the optional attribute, with default value of false
** For operations that do not have a local_bound attribute the value is true

The checks described in the following code must pass for the following data sets:

* Data sets defined for the operation in Appendix A <<Main Inference operator test data>>.
* Data sets that have at least MIN_DOT_PRODUCT different output values. For these data sets we take S=-1.

[source,c++]
----
output_ref = operation_fp64(input, weight, bias);
output_imp = operation_imp (input, weight, bias);
input_abs  = abs(input);   // Element-wise absolute
weight_abs = abs(weight);  // Element-wise absolute
bias_abs   = abs(bias);    // Element-wise absolute
if (!local_bound) {
    input_abs_max = max_value(input_abs);  // maximum over all elements
    for_each(index in shape(input_abs)) {
        input_abs[index] = input_abs_max;  // set all entries to global maximum
    }
}
output_bnd = operation_fp64(input_abs, weight_abs, bias_abs);

size_t T = tensor_size(output_shape);  // number of dot product results
size_t ksb = ceil(KS / exp2(normal_frac<acc_t>() - normal_frac<out_t>())) + ((max_value(bias_abs) > 0) ? 1 : 0);
fp64_t out_err_sum = 0.0;
fp64_t out_err_sumsq = 0.0;
for_each(index in output_shape) {
    fp64_t out_bnd = tensor_read<fp64_t>(output_bnd, output_shape, index);
    fp64_t out_ref = tensor_read<fp64_t>(output_ref, output_shape, index);
    acc_t  out_imp = tensor_read<acc_t> (output_imp, output_shape, index);
    fp64_t out_err;
    if ((acc_t)out_bnd == infinity) {
        // dot product can overflow and there is no accuracy limit
        out_err = 0.0;
    } else if (out_bnd == 0.0) {
        REQUIRE(out_ref == 0.0 && out_imp == 0.0);
        out_err = 0.0;
    } else {  // 0.0 < out_bnd < infinity
        fp64_t out_err_bnd = max(out_bnd * exp2(-1-normal_frac<out_t>()), normal_min<out_t>());
        out_err = (static_cast<fp64_t>(out_imp) - out_ref) / out_err_bnd;
        REQUIRE(abs(out_err) <= ksb);
    }
    out_err_sum   += out_err;
    out_err_sumsq += out_err * out_err;
}
if (input and weights are data set S with 3 <= S <= 5) {
    // check output error bias magnitude for data sets S which are not positive biased
    REQUIRE(abs(out_err_sum) <= 2*sqrt(ksb*T));
}
// check output error variance magnitude
REQUIRE(out_err_sumsq <= 0.4*ksb*T);
----
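
Two of the quantities in the check above, `ksb` and the normalized per-element error, can be computed in isolation as in the following sketch.
The function names are hypothetical, and the fraction-bit counts and minimum normal value are passed in as plain parameters standing in for `normal_frac<T>()` and `normal_min<out_t>()`; the concrete values mentioned afterwards (23 fraction bits for fp32, 10 for fp16) are assumptions of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// ksb = ceil(KS / exp2(acc_frac - out_frac)) + (non-zero bias ? 1 : 0)
inline std::size_t compute_ksb(std::size_t ks, int acc_frac, int out_frac,
                               bool has_nonzero_bias) {
    assert(ks > 0);
    double scaled = static_cast<double>(ks) / std::exp2(acc_frac - out_frac);
    return static_cast<std::size_t>(std::ceil(scaled)) + (has_nonzero_bias ? 1u : 0u);
}

// Normalized error of one output element for the case 0 < out_bnd < infinity.
// `out_normal_min` stands in for normal_min<out_t>().
inline double normalized_error(double out_imp, double out_ref, double out_bnd,
                               int out_frac, double out_normal_min) {
    double out_err_bnd = std::fmax(out_bnd * std::exp2(-1 - out_frac), out_normal_min);
    return (out_imp - out_ref) / out_err_bnd;
}
```

With an fp32 accumulator and fp32 output, `ksb` is simply KS plus one when a non-zero bias is present; with an fp32 accumulator and fp16 output, the extra 13 accumulator bits divide KS by 8192 before rounding up.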
Dominic Symesca2a8542021-03-19 13:56:27 +0000469
=== Tensor Definitions

==== Tensors

Tensors are multidimensional arrays of data.
Tensors have metadata associated with them that describe characteristics of the tensor, including:

* Data Type
* Shape

The number of dimensions in a shape is called the rank.
A tensor with rank equal to zero is permitted.
In that case, the tensor has a single entry and is also known as a scalar.
A tensor shape is an array of integers of size equal to the rank of the tensor.
Each element in the tensor shape describes the number of elements in the corresponding dimension.
The tensor shape in each dimension must be greater than or equal to 1.
For tensor access information, see <<Tensor Access Helpers>>.
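
As an illustration of the rank and shape rules above, the following minimal sketch computes the number of entries in a tensor from its shape. The helper name `tensor_num_elements` and the use of `std::vector` to hold a shape are assumptions made for this example and are not defined by this specification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of entries in a tensor with the given shape.
// A rank 0 shape (empty vector) yields 1, which is the scalar case.
std::size_t tensor_num_elements(const std::vector<std::size_t>& shape) {
    std::size_t count = 1;
    for (std::size_t dim : shape) {
        assert(dim >= 1);  // each dimension must be greater than or equal to 1
        count *= dim;
    }
    return count;
}
```

With this sketch, an empty shape gives 1 entry and the shape [2, 3, 4] gives 24 entries.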

The shape of a tensor of non-zero rank has the special type shape_t.
shape_t is a one-dimensional list with size equal to the rank of the original tensor.
The components of a shape_t are of type size_t.

In this version of the specification, shape_t values must be resolvable to constants at backend compile time.

==== Tensor size limit

The tensor overall size is limited by the data type size_t.
This type must be able to hold integers in the range 0 to (1 << (MAX_LOG2_SIZE + 1)) - 1 where MAX_LOG2_SIZE is defined in <<Levels>>.
For each tensor, the number of tensor elements multiplied by the element size in bytes (which is taken to be 1 for elements smaller than 8 bits) must be less than or equal to (1 << (MAX_LOG2_SIZE + 1)) - 1.

The size of tensors along each of their dimensions is limited by the data type size_t.

This means that the maximum size of a tensor along each dimension is (1 << MAX_LOG2_SIZE) - 1 and therefore the maximum coordinate value is (1 << MAX_LOG2_SIZE) - 2.
Indices used to access tensors must be non-negative.
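
The limits above can be checked along the following lines. This is a sketch only: the function name `tensor_size_ok` is hypothetical, the parameter `max_log2_size` stands in for MAX_LOG2_SIZE, and the code assumes a value of at most 62 so that all shifts fit in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical check of the per-dimension and overall size limits.
// element_bytes is taken to be 1 for elements smaller than 8 bits.
bool tensor_size_ok(const std::vector<uint64_t>& shape, uint64_t element_bytes, int max_log2_size) {
    assert(element_bytes >= 1);
    assert(max_log2_size >= 1 && max_log2_size <= 62);  // assumption of this sketch
    const uint64_t max_dim = (uint64_t{1} << max_log2_size) - 1;
    const uint64_t max_total = (uint64_t{1} << (max_log2_size + 1)) - 1;
    uint64_t total = element_bytes;
    for (uint64_t dim : shape) {
        if (dim < 1 || dim > max_dim) return false;
        if (total > max_total / dim) return false;  // total * dim would exceed max_total
        total *= dim;
    }
    return total <= max_total;
}
```

The division-based test avoids overflowing the running product before the comparison with the limit is made.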

==== Data Layouts

The following data layouts are supported in TOSA.
TOSA operations are defined in terms of a linear packed tensor layout.
In a linear packed layout a rank r tensor has elements of dimension (r-1) consecutive.
The next to increment is dimension (r-2) and so on.
For a specification of this layout see the tensor read and write functions in section <<Tensor Access Helpers>>.

An implementation of TOSA can choose a different tensor memory layout provided that the operation behavior is maintained.

.Data Layouts
[cols="1,4,4"]
|===
|Name|Description of dimensions|Usage

|NHWC|Batch, Height, Width, Channels|Feature maps
|NDHWC|Batch, Depth, Height, Width, Channels|Feature maps for 3D convolution
|OHWI|Output channels, Filter Height, Filter Width, Input channels|Weights
|HWIM|Filter Height, Filter Width, Input channels, Channel Multiplier|Weights for depthwise convolutions
|DOHWI|Depth, Output Channels, Filter Height, Filter Width, Input Channels|Weights for 3D convolution
|===
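
To make the linear packed layout concrete, the sketch below computes the element offset of an index within a tensor, with the last dimension varying fastest. The function name `linear_offset` is an assumption for this example; the normative definition remains the tensor read and write functions in <<Tensor Access Helpers>>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: offset in elements of `index` within a linear packed
// tensor of the given shape. Dimension (rank - 1) is consecutive in memory.
std::size_t linear_offset(const std::vector<std::size_t>& shape,
                          const std::vector<std::size_t>& index) {
    assert(shape.size() == index.size());
    std::size_t offset = 0;
    for (std::size_t d = 0; d < shape.size(); d++) {
        assert(index[d] < shape[d]);
        offset = offset * shape[d] + index[d];
    }
    return offset;
}
```

For an NHWC feature map of shape [1, 2, 3, 4], stepping the channel index moves by one element and stepping the width index moves by four elements.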

==== Broadcasting

In operations where broadcasting is supported, an input shape dimension can be broadcast to an output shape dimension if the input shape dimension is 1.
TOSA broadcast requires the rank of both tensors to be the same.
A RESHAPE can be done to create a compatible tensor with appropriate dimensions of size 1.
To map indexes in an output tensor to that of an input tensor, see <<Broadcast Helper>>.
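
The broadcast rule can be sketched as follows. The names `shape_vec`, `broadcast_shape` and `broadcast_index` are hypothetical and are used only for this illustration; they are not the helper defined in <<Broadcast Helper>>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using shape_vec = std::vector<std::size_t>;

// Output shape for two inputs of equal rank, where a dimension of size 1
// can be broadcast to the size of the other input.
shape_vec broadcast_shape(const shape_vec& a, const shape_vec& b) {
    assert(a.size() == b.size());  // ranks must match
    shape_vec out(a.size());
    for (std::size_t d = 0; d < a.size(); d++) {
        assert(a[d] == b[d] || a[d] == 1 || b[d] == 1);
        out[d] = (a[d] == 1) ? b[d] : a[d];
    }
    return out;
}

// Map an index in the output tensor to the matching index in one input.
// Dimensions of size 1 in the input always read entry 0.
shape_vec broadcast_index(const shape_vec& in_shape, const shape_vec& out_index) {
    assert(in_shape.size() == out_index.size());
    shape_vec in_index(out_index);
    for (std::size_t d = 0; d < in_shape.size(); d++) {
        if (in_shape[d] == 1) in_index[d] = 0;
    }
    return in_index;
}
```

For example, inputs of shape [1, 3, 1] and [2, 3, 4] give an output of shape [2, 3, 4], and the output index [1, 2, 3] reads index [0, 2, 0] of the first input.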

==== Supported Number Formats

The following number formats are defined in TOSA.
The number formats supported by a given operator are listed in its table of supported types.
A TOSA implementation of a given profile must support the number formats listed in the supported data types for the operators contained in that profile.
Number formats not required for any operators in a profile do not need to be implemented.

.Number formats
[cols="1,1,1,5"]
|===
|Format|Minimum|Maximum|Description

|bool_t
| -
| -
|Boolean value that is either `true` or `false`. Size implementation defined. The TOSA reference model implements this as int8_t with 0 for `false` and 1 for `true`. All non-zero values are accepted on input as `true`.

|i4_t
| -
| -
|Signless 4-bit integer type. Will be interpreted as int4_t by all operators.

|int4_t
| -7
| +7
|Signed 4-bit two's-complement value. Excludes -8 to maintain a range that is symmetric about zero for weights.

|i8_t
| -
| -
|Signless 8-bit integer value. Will be interpreted as int8_t unless otherwise specified by an operator.

|int8_t
| -128
| +127
|Signed 8-bit two's-complement value.

|uint8_t
| 0
| 255
|Unsigned 8-bit integer value.

|i16_t
| -
| -
|Signless 16-bit integer type. Will be interpreted as int16_t unless otherwise specified by an operator.

|int16_t
| -32768
| +32767
|Signed 16-bit two's-complement value.

|uint16_t
| 0
| 65535
|Unsigned 16-bit integer value.

|i32_t
| -
| -
|Signless 32-bit integer value. Will be interpreted as int32_t by all operators.

|int32_t
| -(1<<31)
| (1<<31)-1
|Signed 32-bit two's-complement value.

|i48_t
| -
| -
|Signless 48-bit integer value. Will be interpreted as int48_t by all operators.

|int48_t
| -(1<<47)
| (1<<47)-1
|Signed 48-bit two's-complement value.

|fp8e4m3_t
| -448
| 448
| 8-bit floating-point defined by <<OCP-OFP8,OCP-OFP8>> with four bits of exponent and three bits of mantissa. +
Normal values must be supported. +
Denormal values must be supported. +
The NaN encoding must be supported. +
Signed zero must be supported.

|fp8e5m2_t
| -infinity
| +infinity
| 8-bit floating-point defined by <<OCP-OFP8,OCP-OFP8>> with five bits of exponent and two bits of mantissa. +
Normal values must be supported. +
Denormal values must be supported. +
Positive and negative infinity must be supported. +
NaN encodings must be supported. +
Signed zero must be supported.

|fp16_t
| -infinity
| +infinity
| 16-bit half-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|bf16_t
| -infinity
| +infinity
| 16-bit brain floating-point defined as bits [31:16] of the fp32_t format. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp32_t
| -infinity
| +infinity
| 32-bit single-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp64_t
| -infinity
| +infinity
| 64-bit double-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.
|===

Note: In this specification minimum<type> and maximum<type> will denote the minimum and maximum values of the data as stored in memory (ignoring the zero point).
The minimum and maximum values for each type are given in the preceding table.

Note: Integer number formats smaller than 8 bits may be used provided that the numerical result is the same as using a sequence of 8-bit TOSA operations.
For example, the result of a convolution with low precision data must equal that of running the convolution at 8 bits and then clipping the result to the permitted output range.
This ensures that a Base Inference profile TOSA implementation can calculate the same result.

=== Integer Behavior

TOSA integer inputs and outputs are specified by signless values with the given number of bits.
Unless otherwise specified, these values will be interpreted as signed two's-complement.
The pseudocode will use int*_t to indicate use as a signed value and uint*_t to indicate use as an unsigned value.
If overflow occurs during an integer calculation, the result is unpredictable, as indicated by the REQUIRE checks in the pseudocode for the operators.

Unsigned 8-bit and 16-bit values are only allowed in the RESCALE operation, to allow for compatibility with networks which expect unsigned 8-bit or 16-bit tensors for input and output.

==== Quantization

Machine Learning frameworks may represent tensors with a quantized implementation, using integer values to represent the original floating-point numbers.
TOSA integer operations do not perform any implicit scaling to represent quantized values.
Required zero point values are passed to the operator as necessary, and will be processed according to the pseudocode for each operator.

To convert a network containing quantized tensors to TOSA, generate explicit RESCALE operators for any change of quantization scaling.
This reduces quantized operations to purely integer operations.

As an example, an ADD between two quantized tensors requires the integer values represent the same range.
The scale arguments for RESCALE can be calculated to ensure that the resulting tensors represent the same range.
Then the ADD is performed, and a RESCALE can be used to ensure that the result is scaled properly.

RESCALE provides support for per-tensor and per-channel scaling values to ensure compatibility with a range of possible quantization implementations.

==== Precision scaling

TOSA uses the RESCALE operation to scale between values with differing precision.
The RESCALE operator is defined using an integer multiply, add, and shift.
This guarantees that all TOSA implementations will return the same result for a RESCALE, including those with no support for floating-point numbers.

This TOSA specification supports two precisions of multiplier: 16-bit and 32-bit.
The 32-bit multiplier version supports two rounding modes to enable simpler lowering of existing frameworks that use two stage rounding.
All arithmetic is designed so that it does not overflow a 64-bit accumulator and that the final result fits in 32 bits.
In particular a 48-bit value can only be scaled with the 16-bit multiplier.

The apply_scale functions provide a scaling of approximately (multiplier * 2^-shift^).
The shift and value range is limited to allow a variety of implementations.
The limit of 62 on shift allows the shift to be decomposed as two right shifts of 31.
The limit on value allows implementations that left shift the value before the multiply in the case of shifts of 32 or less.
For example, in the case shift=30 an implementation of the form ((value\<<2) * multiplier + round)>>32 can be used.
A scaling range of 2^+12^ down to 2^-32^ is supported for both functions with a normalized multiplier.

For example, in typical usage a scaling of m*2^-n^ where m is a fraction in the
range 1.0 \<= m < 2.0 can be represented using multiplier=(1<<30)*m, shift=(30+n) for
apply_scale_32() and multiplier=(1<<14)*m, shift=(14+n) for apply_scale_16().
The values to achieve a scaling of 1.0 are shift=30, multiplier=1<<30 for apply_scale_32 and shift=14, multiplier=1<<14 for apply_scale_16.

[source,c++]
----
int32_t apply_scale_32(int32_t value, int32_t multiplier, int8_t shift, bool_t double_round=false) {
    REQUIRE(multiplier >= 0);
    REQUIRE(2 <= shift && shift <= 62);
    // shift - 1 can be as large as 61, so the shifted constants are formed in 64 bits
    REQUIRE(value >= -(static_cast<int64_t>(1) << (shift - 1)) && value < (static_cast<int64_t>(1) << (shift - 1)));
    int64_t round = static_cast<int64_t>(1) << (shift - 1);
    if (double_round) {
        if (shift > 31 && value >= 0) round += 1<<30;
        if (shift > 31 && value < 0)  round -= 1<<30;
    }
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result = result >> shift;
    // result will fit a 32-bit range due to the REQUIRE on value
    return static_cast<int32_t>(result);
}

int32_t apply_scale_16(int48_t value, int16_t multiplier, int8_t shift) {
    REQUIRE(multiplier >= 0);
    REQUIRE(2 <= shift && shift <= 62);
    int64_t round = static_cast<int64_t>(1) << (shift - 1);
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result = result >> shift;
    REQUIRE(result >= minimum<int32_t> && result <= maximum<int32_t>);
    return static_cast<int32_t>(result);
}
----
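
The typical usage described above, where a scaling of m*2^-n^ is represented as multiplier=(1<<30)*m and shift=(30+n), can be sketched as a conversion from a real-valued scale. The names `scale32` and `quantize_scale_32` are hypothetical, and the use of floating-point here is only a convenience for a compiler generating TOSA; the scale must lie in a range that yields a shift between 2 and 62.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct scale32 {
    int32_t multiplier;
    int8_t shift;
};

// Hypothetical helper: express a positive real scale as multiplier * 2^-shift
// with the multiplier normalized to the range [1 << 30, 1 << 31).
scale32 quantize_scale_32(double scale) {
    assert(scale > 0.0);
    int exponent = 0;
    // scale = mantissa * 2^exponent with mantissa in [0.5, 1.0)
    double mantissa = std::frexp(scale, &exponent);
    // m = 2 * mantissa lies in [1.0, 2.0) and scale = m * 2^(exponent - 1)
    int64_t multiplier = std::llround(mantissa * 2.0 * (1 << 30));
    int shift = 30 - (exponent - 1);
    if (multiplier == (int64_t{1} << 31)) {  // rounding pushed m up to 2.0
        multiplier >>= 1;
        shift -= 1;
    }
    assert(2 <= shift && shift <= 62);
    return {static_cast<int32_t>(multiplier), static_cast<int8_t>(shift)};
}
```

A scale of 1.0 gives multiplier=1<<30 and shift=30, in agreement with the values stated above. A scale of 0.375, which is 1.5*2^-2^, gives multiplier=1610612736 and shift=32.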

In some functions, the multiplier and shift are combined into a scale_t structure:

[source,c++]
----
typedef struct {
    int32_t multiplier;
    int8_t shift;
} scale_t;
----

In places where a divide is required, we also use the function below to calculate an appropriate scaling value.

[source,c++]
----
scale_t reciprocal_scale(uint32_t value) {
    REQUIRE(value > 0);
    scale_t scale;
    int32_t k = 32 - count_leading_zeros(value - 1); // (1 << k) / 2 < value <= (1 << k)
    // k can be as large as 32, so the numerator is formed in 64 bits
    int64_t numerator = ((static_cast<int64_t>(1) << 30) + 1) << k;
    scale.multiplier = static_cast<int32_t>(numerator / value); // (1 << 30) <= multiplier < (1 << 31)
    scale.shift = 30 + k;
    return scale;
}
----
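
As a worked case of the function above, the following self-contained sketch repeats reciprocal_scale with a portable stand-in for count_leading_zeros and uses the result to approximate a divide. The helper `divide_by_scale` is hypothetical and mirrors only the single-rounding multiply and shift; it assumes that right shifts of negative 64-bit values are arithmetic.

```cpp
#include <cassert>
#include <cstdint>

struct scale_t {
    int32_t multiplier;
    int8_t shift;
};

// Portable stand-in for count_leading_zeros on 32-bit values (32 for zero).
int32_t count_leading_zeros(uint32_t value) {
    int32_t count = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (value & mask) == 0; mask >>= 1) {
        count++;
    }
    return count;
}

scale_t reciprocal_scale(uint32_t value) {
    assert(value > 0);
    int32_t k = 32 - count_leading_zeros(value - 1);
    int64_t numerator = ((int64_t{1} << 30) + 1) << k;
    scale_t scale;
    scale.multiplier = static_cast<int32_t>(numerator / value);
    scale.shift = static_cast<int8_t>(30 + k);
    return scale;
}

// Hypothetical helper: approximate x / value using the multiplier and shift.
int32_t divide_by_scale(int32_t x, scale_t scale) {
    int64_t round = int64_t{1} << (scale.shift - 1);
    int64_t result = static_cast<int64_t>(x) * scale.multiplier + round;
    return static_cast<int32_t>(result >> scale.shift);
}
```

For value=3 this gives k=2, multiplier=1431655766 and shift=32, so that multiplier * 2^-32^ is approximately one third.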

==== Integer Convolutions

For the convolution operators, the input is not required to be scaled.
The integer versions of the convolution operators will subtract the zero point from the integer values as defined for each operator.
The convolution produces an accumulator output of type int32_t or int48_t.
This accumulator output is then scaled to the final output range using the RESCALE operator.
The scale applied in the RESCALE operator should be set to multiplier and shift values such that: multiplier * 2^-shift^ = (input_scale * weight_scale) / output_scale.
Here, input_scale, weight_scale and output_scale are the conversion factors from integer to floating-point for the input, weight and output tensor values respectively.
If per-channel scaling is needed then the per-channel option of the RESCALE operation should be used.

==== Integer Elementwise Operators

When two quantized tensors are used in an operation, they must represent the same numeric range for the result to be valid.
In this case, TOSA expects that RESCALE operators will be used as necessary to generate 32-bit integer values in a common range.
There are many valid choices for scale factors and options for the common range.
TOSA does not impose a requirement on which scale factors and range should be used.
Compilers generating TOSA sequences should choose a range that allows the operation to be computed without overflow, while allowing the highest possible accuracy of the output.
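
One possible choice of common range is sketched below for an ADD of two quantized values. The scales used (2^-1^ and 2^-2^ for the inputs, 2^-3^ for the output and 2^-12^ for the common range) are assumptions chosen so that every ratio is a power of two; `scale_32` is a local single-rounding copy of the 32-bit scaling step and `quantized_add` is a hypothetical name. Zero points and the final clip to the output type are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the 32-bit multiply, add and shift (single rounding only).
int32_t scale_32(int32_t value, int32_t multiplier, int shift) {
    assert(multiplier >= 0);
    assert(2 <= shift && shift <= 62);
    int64_t half = int64_t{1} << (shift - 1);
    assert(value >= -half && value < half);
    int64_t result = static_cast<int64_t>(value) * multiplier + half;
    return static_cast<int32_t>(result >> shift);
}

// Hypothetical quantized ADD: a has scale 2^-1, b has scale 2^-2,
// the common 32-bit range has scale 2^-12 and the output has scale 2^-3.
int32_t quantized_add(int8_t a, int8_t b) {
    const int32_t unit = 1 << 30;               // multiplier for m = 1.0
    int32_t a32 = scale_32(a, unit, 30 - 11);   // a * 2^11
    int32_t b32 = scale_32(b, unit, 30 - 10);   // b * 2^10
    int32_t sum = a32 + b32;                    // both operands share scale 2^-12
    return scale_32(sum, unit, 30 + 9);         // sum * 2^-9
}
```

With a=10 (real value 5.0) and b=20 (real value 5.0) the result is 80, which represents 10.0 at the output scale of 2^-3^.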

==== General Unary Functions

General unary functions such as sigmoid(), tanh(), exp() for integer inputs are expressed using a lookup table and interpolation to enable efficient implementation.
This also allows for other operations with the addition of user-supplied tables (the TABLE operation).
All table lookups are based on the following reference lookup function that takes as input a table of 513 entries of 16 bits each.

[source,c++]
----
int32_t apply_lookup_s(int16_t *table, int32_t value)
{
    int16_t clipped_value = static_cast<int16_t>(apply_clip_s<int32_t>(value, -32768, +32767));
    int32_t index = (clipped_value + 32768) >> 7;
    int32_t fraction = clipped_value & 0x7f;
    int16_t base = table[index];
    int16_t next = table[index+1];
    int32_t slope = next - base;
    REQUIRE(slope >= minimum<int16_t> && slope <= maximum<int16_t>);
    int32_t return_value = (base << 7) + slope * fraction;
    return return_value; // return interpolated value of 16 + 7 = 23 bits
}
----
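
The interpolation can be followed with a table that approximates the identity function. In the sketch below, `lookup_s` is a local copy of the lookup with the clip written out and the multiply by 128 used in place of the left shift by 7, and `fill_identity_table` is a hypothetical helper; neither name is part of this specification.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the reference lookup for this sketch.
int32_t lookup_s(const int16_t* table, int32_t value) {
    int32_t clipped = value < -32768 ? -32768 : (value > 32767 ? 32767 : value);
    int32_t index = (clipped + 32768) >> 7;  // segment number, 0 to 511
    int32_t fraction = clipped & 0x7f;       // position in the segment, 0 to 127
    int32_t base = table[index];
    int32_t next = table[index + 1];
    int32_t slope = next - base;
    assert(slope >= -32768 && slope <= 32767);
    return base * 128 + slope * fraction;    // 23-bit interpolated result
}

// Hypothetical helper: entry i holds the input value at the start of
// segment i, clipped to the int16_t range for the final entry.
void fill_identity_table(int16_t* table) {
    for (int32_t i = 0; i <= 512; i++) {
        int32_t v = (i - 256) * 128;
        table[i] = static_cast<int16_t>(v > 32767 ? 32767 : v);
    }
}
```

With this table every segment except the last has a slope of 128, so an input of 1000 returns 1000 * 128 = 128000, that is, the input with 7 extra fractional bits.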

Note that although the table lookup defined here has 16-bit precision, for 8-bit only operations an 8-bit table can be derived by applying the reference function to each of the possible 256 input values.
The following code constructs a 513-entry table based on a reference function.

[source,c++]
----
void generate_lookup_table(int16_t *table, int32_t (*reference)(int32_t))
{
    for (int i = -256; i <= 256; i++) {
        int32_t value = (*reference)(i);
        table[i + 256] = static_cast<int16_t>(apply_clip_s<int32_t>(value, -32768, +32767));
    }
}
----
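
The 8-bit derivation mentioned above might look like the following sketch. It assumes that the reference function is evaluated directly at each of the 256 signed 8-bit inputs and that the result is clipped to the int8_t range; the names `generate_lookup_table_8` and `double_it` are hypothetical and serve only as an illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build a 256-entry table by evaluating a reference
// function at every int8_t input and clipping the result to int8_t.
void generate_lookup_table_8(int8_t* table, int32_t (*reference)(int32_t)) {
    for (int32_t i = -128; i <= 127; i++) {
        int32_t value = (*reference)(i);
        if (value < -128) value = -128;
        if (value > 127) value = 127;
        table[i + 128] = static_cast<int8_t>(value);
    }
}

// Example reference function, assumed for illustration: doubles its input.
int32_t double_it(int32_t x) {
    return 2 * x;
}
```

An input x is then looked up as table[x + 128], so no interpolation is needed for 8-bit only operations.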

=== Other publications

The following publications are referred to in this specification, or provide more information:

. [[IEEE-754]]IEEE Std 754-2008, _IEEE Standard for Floating-point Arithmetic_, August 2008.
. [[OCP-OFP8]]Open Compute Project OCP 8-bit Floating Point Specification (OFP8) Revision 1.0