//
// This confidential and proprietary software may be used only as
// authorised by a licensing agreement from ARM Limited
// (C) COPYRIGHT 2020-2024 ARM Limited
// ALL RIGHTS RESERVED
// The entire notice above must be reproduced on all authorised
// copies and copies may only be made to the extent permitted
// by a licensing agreement from ARM Limited.

== Introduction

=== Overview

Tensor Operator Set Architecture (TOSA) provides a set of whole-tensor
operations commonly employed by Deep Neural Networks. The intent is to enable a
variety of implementations running on a diverse range of processors, with the
results at the TOSA level consistent across those implementations. Applications
or frameworks which target TOSA can therefore be deployed on a wide range of
different processors, such as SIMD CPUs, GPUs and custom hardware such as
NPUs/TPUs, with defined accuracy and compatibility constraints. Most operators
from the common ML frameworks (TensorFlow, PyTorch, etc.) should be expressible
in TOSA. It is expected that there will be tools to lower from ML frameworks
into TOSA.

=== Goals

The goals of TOSA include the following:

* A minimal and stable set of tensor-level operators to which machine learning
framework operators can be reduced.

* Full support for both quantized integer and floating-point content.

* Precise functional description of the behavior of every operator, including
the treatment of their numerical behavior in the case of precision, saturation,
scaling, and range as required by quantized datatypes.

* Agnostic to any single high-level framework, compiler backend stack or
particular target.

* The detailed functional and numerical description enables precise code
construction for a diverse range of targets – SIMD CPUs, GPUs and custom
hardware such as NPUs/TPUs.

=== Specification

The TOSA Specification is written as AsciiDoc mark-up and developed in its raw
mark-up form, managed through a git repository here:
https://git.mlplatform.org/tosa/specification.git/.
The specification is developed and versioned much like software.
While the mark-up is legible and can be read fairly easily in its raw form, it is recommended to build or “render” the mark-up into PDF or HTML.
To do this, please follow the instructions in the README.md in the root of the specification repository.

=== Operator Selection Principles

TOSA defines a set of primitive operators to which higher-level operators can be lowered in a consistent way.
To remain effective and efficient to implement, the set of operators must be constrained to a reasonably small set of primitive operations out of which others can be constructed.
The following principles govern the selection of operators within TOSA.

.Principles
[cols="1,5,5"]
|===
|ID|Principle|Reason for this

|P0
|An operator shall be a primitive operation or building block that cannot be decomposed into simpler whole-tensor operations.
|If the operator can be broken down, then we should look at the component operators.

|P1
|An operator shall be usable as a component out of which more complex operations can be constructed.
|Single-use operators have a high architectural cost and a more reusable version should be considered instead.

|P2
|Precision should be appropriate for the input and output data types.
|Precision higher than that needed to calculate the result leads to extra implementation cost.

|P3
|Numerical definition of common sub-operations should be consistent between operators (for example: value scaling).
|Consistent sub-operation definition reduces the operator implementation cost.

|P4
|The valid input and output ranges for all arguments shall be specified.
|Ranges are required to make consistent (numerically agreeing) implementations possible.

|P5
|Integer operators shall be implementable in a bit-exact form with good efficiency on CPU, GPU and hardware targets.
|Reduces implementation cost and gives consistent inference results.
|===

=== Versioning

TOSA follows a semantic versioning policy with a major.minor.patch.draft scheme.
See below for the TOSA definition of backwards compatibility.

* Major version changes may break backwards compatibility.
* Minor version changes may add functionality in a backwards-compatible way.
* Patch versions are for bug fixes, clarifications, or trivial changes.
* The draft flag indicates whether the referenced version is finalized.

Major, minor, and patch numbers are limited to eight bits.
Draft is a single bit flag.
If stored in a 32-bit value, the remaining bits are reserved for future use.
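
The specification fixes the field widths but not their positions within a 32-bit word. The following sketch packs a version under an assumed layout (major in bits 31:24, minor in 23:16, patch in 15:8, draft in bit 0, bits 7:1 reserved and written as zero); that layout and the names `TosaVersion`, `pack_version` and `unpack_version` are hypothetical, not part of TOSA.

[source,c++]
----
#include <cstdint>

// Hypothetical in-memory form of a TOSA version (field widths from the spec).
struct TosaVersion {
    uint8_t major;  // 8 bits
    uint8_t minor;  // 8 bits
    uint8_t patch;  // 8 bits
    bool draft;     // single-bit flag
};

// Assumed layout: major[31:24] minor[23:16] patch[15:8] reserved[7:1] draft[0].
uint32_t pack_version(const TosaVersion& v) {
    return (static_cast<uint32_t>(v.major) << 24) |
           (static_cast<uint32_t>(v.minor) << 16) |
           (static_cast<uint32_t>(v.patch) << 8) |
           (v.draft ? 1u : 0u);  // reserved bits 7:1 stay zero
}

TosaVersion unpack_version(uint32_t bits) {
    return TosaVersion{
        static_cast<uint8_t>(bits >> 24),
        static_cast<uint8_t>((bits >> 16) & 0xFFu),
        static_cast<uint8_t>((bits >> 8) & 0xFFu),
        (bits & 1u) != 0,  // reserved bits 7:1 are ignored
    };
}
----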

==== Backwards Compatibility

TOSA graphs created with previous minor versions within a major version must continue to work.
The following portions of the specification and implementation will not change within a major version:

* Operator Names
* Arguments, including ordering, input/attribute/output, name, and rank
* ERROR_IF statements
* Functionality of the pseudocode for each operator
* Level definitions and checks
* Supported Data Type tables
* Conformance test definitions
* Enumerated types and values

Changes to the following do not break compatibility:

* Order of operations within the XML
* Operator section names
* Descriptive text that does not affect functionality
* Non-functional changes to pseudocode (for example: cleanup, local variable name changes)

Minor versions are allowed to add new operators or other functionality as long as the above guarantees hold.

In addition, new extensions may be added to the specification between TOSA releases.
They must not change anything that would break backwards compatibility according to the above definitions.
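
One way to read the guarantee above as a check: a graph is accepted when its major version equals the implementation's and its minor version is not newer. The sketch below encodes that reading; `Version` and `graph_is_compatible` are hypothetical, patch and draft are ignored, and rejecting graphs with a newer minor version is a conservative assumption (such a graph may use operators the implementation does not know).

[source,c++]
----
#include <cstdint>

// Hypothetical version triple (draft flag omitted for this check).
struct Version {
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
};

// Can an implementation of version `impl` run a graph created for `graph`?
bool graph_is_compatible(const Version& graph, const Version& impl) {
    if (graph.major != impl.major) {
        return false;  // major versions may break backwards compatibility
    }
    // Earlier or equal minor versions must keep working; patch is not compared
    // because patch releases only carry fixes and clarifications.
    return graph.minor <= impl.minor;
}
----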

=== Profiles

TOSA profiles enable efficient implementation on different classes of device.
Each profile is an independent set of operations and data type combinations.

TOSA profile extensions define optional operation and data type combinations.

Each operator's Supported Data Types table defines which profile or extension each operator and data type combination belongs to.
An operator / data type combination may be part of multiple profiles or extensions.
If so, each profile and extension is listed in the Supported Data Types table.
In addition, a table listing all operations for each profile can be found in Appendix B.

The following are required for compliant TOSA implementations:

* A TOSA implementation must implement at least one profile.
* A TOSA implementation may choose to implement any extensions.
* If a TOSA implementation chooses to implement an extension, it must implement the complete extension.
* If an operator / data type combination requires multiple extensions, the combination is only required to be implemented if all of those extensions are implemented.
** For example, a CAST from bf16 to fp8 is only required if both extensions are implemented.
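
The multi-extension rule is a subset test: a combination is required only when every extension it depends on is implemented. In the sketch below the bit assignments, the `Extension` enumeration and `combination_required` are hypothetical; the normative list of extensions is the Profile Extensions table.

[source,c++]
----
#include <cstdint>

// Hypothetical bit assignments for a few extensions.
enum Extension : uint32_t {
    EXT_BF16 = 1u << 0,
    EXT_FP8E4M3 = 1u << 1,
    EXT_FP8E5M2 = 1u << 2,
};

// A combination needing the extensions in `needed` is only required
// when all of them are present in `implemented`.
bool combination_required(uint32_t needed, uint32_t implemented) {
    return (needed & implemented) == needed;
}
----

For the CAST example, `needed` would be `EXT_BF16 | EXT_FP8E4M3`: an implementation offering only `EXT_BF16` is not required to support it.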

.Profiles
include::{generated}/profiles.adoc[]

.Profile Extensions
include::{generated}/profile_extensions.adoc[]

=== Levels

A TOSA level defines operator argument ranges that an implementation shall support.
This is distinct from a profile, which defines the operations and data types supported.
One level must apply to all profiles and extensions supported by an implementation.

This version of the specification defines two TOSA levels:

* No level: allows the full range of arguments specified by the operations according to the operation data types.
* Level 8K: ranges are expected to be sufficient for applications with frame sizes up to 8K.

Later versions of the specification may define additional levels.
The following table defines the value ranges for each level.
These ranges are checked using the LEVEL_CHECK() function within the operator descriptions.

.Level maximums
include::{generated}/levels.adoc[]
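
The numeric limits live in the generated table and are not repeated here. The sketch only shows the shape of a level test under stated assumptions: a hypothetical `LevelLimits` record with a single `max_kernel` field, and "No level" modelled as `std::nullopt`. It is not the specification's LEVEL_CHECK() definition and does not say what happens when a check fails.

[source,c++]
----
#include <cstdint>
#include <optional>

// Hypothetical subset of the per-level maximums; real values come from
// the "Level maximums" table and are not hard-coded here.
struct LevelLimits {
    int64_t max_kernel;
};

// std::nullopt models "No level": only the data-type ranges apply.
bool within_level(const std::optional<LevelLimits>& level, int64_t kernel_size) {
    if (!level) {
        return true;  // no level selected: nothing extra to check
    }
    return kernel_size <= level->max_kernel;
}
----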

=== Status

The TOSA specification is a work in progress.

* The Base Inference profile should be considered to be near release quality, with conformance tests available.
* The Main Inference profile has most of the expected operators in place, but is still subject to change.
* The reference model and conformance tests do not yet support all of the floating point types that have been defined.
* There is not currently a conformance test suite available for Main Inference.

=== Compliance

This section defines when a TOSA implementation is compliant to a given TOSA specification profile and level.
To be compliant, an implementation must achieve the results and accuracy defined by this specification.
TOSA also defines a set of conformance tests.
A compliant implementation must pass the conformance tests.
The conformance tests are not exhaustive, so an implementation that passes the conformance tests may not be compliant if there is a non-compliance that is undetected by the tests.

==== Base Inference Profile Compliance

The <<Operator Graphs>> section of this specification defines a TOSA graph and the behavior defined for a TOSA graph.
This behavior is captured in the pseudo-code function tosa_execute_graph().
For a given input graph (with attributes) and input tensors there are three possible tosa_graph_result values after executing the graph:

* tosa_unpredictable: The result of the graph on the given inputs cannot be relied upon.
* tosa_error: The graph does not meet the specification and is recognised as an illegal graph.
* tosa_valid: The result is defined and predictable and the list of output tensors defines the result.

An implementation is compliant to the TOSA Base Inference Profile if it matches the above results as follows:

* For tosa_unpredictable, the implementation can return whatever result it chooses (including error).
* For tosa_error, the implementation must return an error result (and there is no requirement on how much of the graph is executed, if any).
* For tosa_valid, the implementation must execute the entire graph without error and return the result defined by this specification.

In terms of pseudo-code, if graph is a TOSA graph consisting of Base Inference Profile operators and input_list is a list of input tensors, then the following test must pass.

[source,c++]
----
bool tosa_test_compliance(tosa_graph_t graph, tosa_list_t input_list, tosa_level_t level) {
    shape_list_t output_list_spec = tosa_allocate_list(tosa_output_shape(graph));
    shape_list_t output_list_test = tosa_allocate_list(tosa_output_shape(graph));
    tosa_graph_result = tosa_valid; // result starts as valid
    tosa_nesting_depth = 0; // if/while nesting level
    tosa_execute_graph(graph, input_list, output_list_spec, level);
    if (tosa_graph_result == tosa_unpredictable) {
        return true; // No requirement to match an unpredictable result
    }
    result_test = execute_implementation_under_test(graph, input_list, output_list_test);
    if (tosa_graph_result == tosa_error) {
        return result_test == tosa_error; // result must be an error
    }
    if (exact_tensor_match(output_list_spec, output_list_test)) {
        // Predictable bit-exact value match required
        return true;
    }
    return false;
}
----
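
The pseudo-code above depends on functions defined elsewhere in the specification. As a self-contained sketch of the same decision, assume the reference outcome and both sets of outputs have already been computed, with integer tensors flattened into `std::vector<int32_t>` so that bit-exact comparison reduces to `==`. `GraphResult`, `TensorList` and `base_compliant` are hypothetical names; unlike the pseudo-code, the sketch also checks that the implementation reported success for a valid graph, as the bullet list above requires.

[source,c++]
----
#include <cstdint>
#include <vector>

enum class GraphResult { Unpredictable, Error, Valid };

using TensorList = std::vector<std::vector<int32_t>>;  // flattened integer tensors

// `ref` is the specification outcome, `imp` the implementation outcome.
bool base_compliant(GraphResult ref, const TensorList& ref_out,
                    GraphResult imp, const TensorList& imp_out) {
    if (ref == GraphResult::Unpredictable) {
        return true;  // any behaviour, including an error, is acceptable
    }
    if (ref == GraphResult::Error) {
        return imp == GraphResult::Error;  // an illegal graph must be rejected
    }
    // Valid graph: must run without error and match bit-exactly.
    return imp == GraphResult::Valid && ref_out == imp_out;
}
----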

==== Main Inference Profile Compliance

A Main Inference compliant implementation must satisfy the following:

* The implementation must meet <<Base Inference Profile Compliance>> for all Base Inference compliant graphs
* The implementation must support all Main Inference operations using the datatype fp32_t
** The operations must meet the precision requirements of <<Main Inference precision requirements>>
* The implementation must support all Main Inference operations using the datatype fp16_t
** The operations must meet the precision requirements of <<Main Inference precision requirements>>
** Note: These requirements allow fp16_t operations to be implemented using the fp32_t datatype
* The implementation must support all Main Inference operations using the datatype bf16_t
** The operations must meet the precision requirements of <<Main Inference precision requirements>>
** Note: These requirements allow bf16_t operations to be implemented using the fp32_t datatype

As with <<Base Inference Profile Compliance>>, the pseudo-code function tosa_execute_graph() can return one of three possible results.
A compliant implementation must satisfy the following:

* For a graph returning tosa_error, the implementation must also return an error
* For a graph returning tosa_valid, the implementation must execute the entire graph without error
* For a graph returning tosa_valid and consisting only of integer operators, the results must match exactly

===== Main Inference precision requirements

In a compliant implementation, individual floating-point operations within the graph must meet the accuracy bounds listed in the following table.
In the table, _ulp_ means unit of the last place.
The function tosa_reference_check_fp() defines the error range permitted by a given number of units of last place in this specification.
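
The normative error range is given by tosa_reference_check_fp(). As a simplified illustration only, the sketch below measures the distance between an fp32 result and an fp64 reference in units of the fp32 spacing at the reference value. `ulp_fp32` and `within_ulp_fp32` are hypothetical helpers and may differ from the specification's function at boundary cases such as the subnormal range and overflow.

[source,c++]
----
#include <cmath>

// Hypothetical helper: spacing of fp32 values (one ulp) at the magnitude of `ref`.
double ulp_fp32(double ref) {
    const double a = std::fabs(ref);
    const double min_normal = std::ldexp(1.0, -126);  // smallest normal fp32
    if (a < min_normal) {
        return std::ldexp(1.0, -149);  // fixed spacing in the subnormal range
    }
    const int e = std::ilogb(a);     // exponent with 2^e <= a < 2^(e+1)
    return std::ldexp(1.0, e - 23);  // fp32 has 23 explicit fraction bits
}

// True if the fp32 result `imp` lies within `n_ulp` ulps of the fp64 reference.
bool within_ulp_fp32(float imp, double ref, double n_ulp) {
    if (std::isnan(ref)) {
        return std::isnan(imp);  // a NaN reference requires a NaN result
    }
    if (std::isinf(ref)) {
        return static_cast<double>(imp) == ref;  // same-signed infinity
    }
    const double err = std::fabs(static_cast<double>(imp) - ref);
    return err <= n_ulp * ulp_fp32(ref);  // a NaN or infinite imp fails here
}
----

For example, the fp32 value next above 1.0 is exactly one ulp from a reference of 1.0, so it passes a 1 ulp bound but fails a 0.5 ulp bound.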

NOTE: The error criteria in this section are at an early draft stage and are likely to change during conformance test development.

The following criteria apply to all operations:

* If any input is a NaN and the result is floating-point, then the result must be a NaN
* If any input is a NaN and the operation is a comparison (greater, greater-equal, equal), then the result must be false
* If any input is a NaN and the operation is a conversion to an integer or boolean, then the result is unpredictable
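
With IEEE-754 fp32 arithmetic, C++'s built-in operators already behave as these criteria and the EQUAL, GREATER and GREATER_EQUAL row of the following table require: `==`, `>` and `>=` return false whenever an operand is a NaN, +0 and -0 compare equal, infinities of the same sign compare equal, and NaN inputs propagate through addition. The sketch assumes the compiler is not run with fast-math style options that relax these rules; all function names are hypothetical.

[source,c++]
----
#include <cmath>
#include <limits>

// Hypothetical fp32 elementwise kernels relying on IEEE-754 semantics.
bool equal_fp32(float a, float b) { return a == b; }          // false on NaN; +0 == -0
bool greater_fp32(float a, float b) { return a > b; }         // false on NaN
bool greater_equal_fp32(float a, float b) { return a >= b; }  // false on NaN

// NaN inputs propagate to the result; inf + (-inf) also yields NaN.
float add_fp32(float a, float b) { return a + b; }

// Convenience constants for exercising the rules above.
const float kNaN = std::numeric_limits<float>::quiet_NaN();
const float kInf = std::numeric_limits<float>::infinity();
----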

[cols="1,3"]
|===
| Operation | Accuracy bound

| <<ARGMAX>>, <<MAX_POOL2D>>, <<CLAMP>>, <<MAXIMUM>>, <<MINIMUM>>, <<ABS>>, <<NEGATE>>, <<SELECT>>, <<REDUCE_MAX>>, <<REDUCE_MIN>>, <<CONST>>, <<IDENTITY>>
| Non-NaN results must be exact.

| <<EQUAL>>, <<GREATER>>, <<GREATER_EQUAL>>
| The result must be exact with: +
(1) The sign of the zero is ignored +
(2) Infinities of the same sign compare as equal

| <<CONV2D>>, <<CONV3D>>, <<DEPTHWISE_CONV2D>>, <<FULLY_CONNECTED>>, <<MATMUL>>, <<TRANSPOSE_CONV2D>>
| Each output can be expressed as a dot product of two input vectors. +
The dot product must meet the <<Dot product accuracy requirements>>

| <<FFT2D>>, <<RFFT2D>>
| Each output can be expressed as a dot product of an input vector with a constant coefficient vector. +
The dot product must meet the <<Dot product accuracy requirements>>

| <<ADD>>, <<MUL>>, <<SUB>>, <<CEIL>>, <<FLOOR>>
| Floating-point result overflows must be set to infinity of the correct sign. +
Floating-point result underflows must be set to zero of the correct sign. +
Addition of infinities of different signs must produce a NaN. +
Subtraction of infinities of the same sign must produce a NaN. +
Multiplication of an infinity by a zero must produce a NaN. +
Otherwise the result must be within 0.5 ulp of the mathematical result.

| <<CAST>>
| Result overflows when converting between fp32_t, bf16_t and fp16_t must be set to infinity of the correct sign. +
fp8e4m3_t and fp8e5m2_t must use the non-saturating mode defined in <<OCP-OFP8,OCP-OFP8>> when converting from the wider floating-point types. +
If saturation of the fp8 types is desired, a <<CLAMP>> operation with the appropriate parameters should be used before the cast. +
Floating-point result underflows must be set to zero of the correct sign. +
When casting from floating-point to integer, result overflows must be saturated. +
Casts from floating-point to integer must use the round to nearest, ties to even, rounding mode. +
Otherwise a cast to floating-point must be within 0.5 ulp of the mathematical result.

| <<RECIPROCAL>>
| If the input is a zero or the result overflows, the output must be an infinity of the same sign. +
If the input is an infinity or the result underflows, the output must be a zero of the same sign. +
Otherwise the result must be within 1 ulp of the mathematical result.

| <<RSQRT>>
| If the input is less than zero the result must be a NaN. +
Otherwise if the input is a zero the output must be an infinity of the same sign. +
Otherwise the result must be within 2 ulp of the mathematical result.

| <<LOG>>, <<ERF>>
| If the input to LOG is less than zero then the result must be a NaN. +
If the result overflows the output must be an infinity of the correct sign. +
If the result underflows the output must be a zero of the correct sign. +
Otherwise the result must be within 5 ulp of the mathematical result.

| <<EXP>>
| Let `x` be an input element and `out_imp` the implementation output of `exp(x)`. +
Let `out_ref` be the result of the fp64_t reference implementation of `exp(x)`. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, (1+abs(x)), 0, 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<POW>>
| Let `x`, `y` be input elements from `input1` and `input2` respectively. +
Let `out_imp` be the implementation output of `pow(x,y)`. +
If `x` is less than zero and `y` is non-integral then the result must be a NaN. +
Let `out_ref` be the result of the fp64_t reference implementation of `pow(x,y)`. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 2 * (1+abs(log(abs(x))*y)), 0, 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<SIGMOID>>
| Let `x` be an input element and `out_imp` the implementation output. +
Let `out_ref` be the result of the fp64_t reference implementation. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 2 * (1+abs(x)), 0, 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<TANH>>
| Let `x` be an input element and `out_imp` the implementation output. +
Let `out_ref` be the result of the fp64_t reference implementation. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(out_ref, 4 * (1+abs(x)), 0.5, 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<REDUCE_SUM>>
| Each output can be expressed as a dot product of an input vector with a vector of ones. +
This dot product must meet the <<Dot product accuracy requirements>>

| <<AVG_POOL2D>>
| Each output can be expressed as a dot product of an input vector with a vector with elements 1/KS where KS is the kernel size. +
This dot product must meet the <<Dot product accuracy requirements>>

| <<REDUCE_PRODUCT>>
| Result overflows must be set to an infinity of the correct sign. +
Result underflows must be set to a zero of the correct sign. +
Let `n` be the number of elements in the product, `out_imp` the implementation result, and `out_ref` the result of the fp64_t reference implementation. +
Let `err_bnd = abs(out_ref) * (pow(1 + pow(2, -normal_frac<in_out_t> - 1), n) - 1)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<COS>>
| Let `x` be an input element and `out_imp` the implementation output of `cos(x)`. +
Let `out_ref` be the result of the fp64_t reference implementation of `cos(x)`. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(x, 1+abs(x), 0, 2)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<SIN>>
| Let `x` be an input element and `out_imp` the implementation output of `sin(x)`. +
Let `out_ref` be the result of the fp64_t reference implementation of `sin(x)`. +
Let `err_bnd = calcAbsErrorBound<in_out_t>(x, abs(x), 0, 2)` +
Then `tosa_reference_check_fp_bnd<in_out_t>(out_imp, out_ref, err_bnd)` must be true

| <<RESIZE>>
| The result corresponds to a sequence of floating-point calculations. +
The allowable error bound for the result of a resize is based on the maximum value of an element in the input tensor. +
Let `out_imp` be the implementation output. +
Let `out_ref` be the result of the fp64_t reference implementation. +
Let `err_bnd = max(abs(input)) * 0.006`. +
Then `tosa_reference_check_fp_bnd<out_t>(out_imp, out_ref, err_bnd)` must be true.

|===
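
The REDUCE_PRODUCT bound above compounds a relative error of 2^-(f+1), half an ulp for a format with f fraction bits, once per element. A direct transcription follows, passing the fraction width as a plain integer (23 for fp32_t, 10 for fp16_t, 7 for bf16_t) instead of the `normal_frac<in_out_t>` template. `reduce_product_err_bnd` is a hypothetical name, and `reduce_product_ok` is a simple absolute-difference stand-in for `tosa_reference_check_fp_bnd`, which may handle special values differently.

[source,c++]
----
#include <cmath>

// err_bnd = |out_ref| * ((1 + 2^-(frac+1))^n - 1), where `frac` is the number
// of fraction bits of the output type and `n` the number of elements multiplied.
double reduce_product_err_bnd(double out_ref, int frac, int n) {
    const double half_ulp_rel = std::ldexp(1.0, -frac - 1);  // 2^-(frac+1)
    return std::fabs(out_ref) * (std::pow(1.0 + half_ulp_rel, n) - 1.0);
}

// Hypothetical stand-in: accept if the absolute error is within the bound.
bool reduce_product_ok(double out_imp, double out_ref, int frac, int n) {
    return std::fabs(out_imp - out_ref) <= reduce_product_err_bnd(out_ref, frac, n);
}
----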

===== Operator sequence precision requirement

Precision criteria are specified for a single operator.

An implementation M of a sequence of n TOSA operators, A[0] to A[n-1], is said to
be compliant if M gives the same result as a sequence of implementations
M[0] to M[n-1] such that:

* Each M[k] implements A[k] with the same or higher-precision data types
* Each M[k] meets the accuracy defined in this specification for A[k], where the M[k] output is converted to A[k] output precision using round to nearest
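
A sketch of one admissible choice of M[0] and M[1] for the fp32 sequence MUL then ADD: each step is evaluated in fp64 and its output is rounded to fp32 (round to nearest under the default rounding mode) before the next operator consumes it. For example, with x = 1 + 2^-12 the sequence rounds x * x to 1 + 2^-11 (an exact tie, resolved to even) before adding -1, giving 2^-11, whereas a single exact fused multiply-add would give 2^-11 + 2^-24. The function names are hypothetical.

[source,c++]
----
// Stage M[0]: A[0] = MUL on fp32, evaluated in fp64 then rounded to fp32.
float m0_mul(float a, float b) {
    const double wide = static_cast<double>(a) * static_cast<double>(b);  // exact for fp32 inputs
    return static_cast<float>(wide);  // round to nearest (default rounding mode)
}

// Stage M[1]: A[1] = ADD on fp32, evaluated in fp64 then rounded to fp32.
float m1_add(float a, float b) {
    const double wide = static_cast<double>(a) + static_cast<double>(b);
    return static_cast<float>(wide);
}

// Each intermediate is rounded to the output precision of the TOSA operator
// that produced it before being consumed by the next operator.
float mul_then_add_fp32(float x, float y, float z) {
    return m1_add(m0_mul(x, y), z);
}
----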

===== Dot product accuracy requirements

This section assumes an operation acting on tensors named 'input', 'weight' and optionally 'bias'.
Each output tensor element can be expressed as a dot product of elements between the 'input' and 'weight' tensors with optional bias addition.
The dot product has length KS, the kernel size.
If the operation does not specify a bias then 'bias' is taken to be zero in this section.
Note: KS is defined for each relevant operator in the appendix section <<Main Inference operator test data>>.

In other words, each output element `out` can be expressed as a dot product between input elements `in[k]`, weight elements `w[k]`, and bias `b`:

`out = in[0] * w[0] + in[1] * w[1] + ... + in[KS-1] * w[KS-1] + b`

The positions of `in[k]`, `w[k]` and `b` in the input, weight and bias tensors depend on the operation being performed.
The operation may be, for example, a convolution.

This section defines the accuracy required for these operations.
In this section:

* "fp64 arithmetic" refers to double-precision floating-point arithmetic defined by <<IEEE-754,IEEE-754>>
* `operation_fp64()` is an fp64 reference implementation of the operation
* `operation_imp()` is the implementation under test
* `local_bound` is defined as follows:
** For operations with a local_bound attribute it is the value of the optional attribute, with default value of false
** For operations that do not have a local_bound attribute the value is true
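
For a single output element, the reference value and the magnitude used to scale the permitted error can be sketched as below. `dot_fp64` stands in for `operation_fp64()` on a plain dot product, and `dot_bound_fp64` evaluates the same product on absolute values; when `local_bound` is false every input magnitude is replaced by the largest one, as in the code that follows. Taking that maximum over this element's inputs only, rather than over the whole input tensor, is a simplifying assumption, and both function names are hypothetical.

[source,c++]
----
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// out = in[0]*w[0] + ... + in[KS-1]*w[KS-1] + b, evaluated in fp64.
double dot_fp64(const std::vector<double>& in, const std::vector<double>& w, double b) {
    double acc = b;
    for (std::size_t k = 0; k < in.size(); ++k) {
        acc += in[k] * w[k];
    }
    return acc;
}

// Same dot product on absolute values; with local_bound == false every
// |in[k]| is replaced by max_k |in[k]| (over this vector only).
double dot_bound_fp64(const std::vector<double>& in, const std::vector<double>& w,
                      double b, bool local_bound) {
    std::vector<double> in_abs(in.size());
    std::transform(in.begin(), in.end(), in_abs.begin(),
                   [](double v) { return std::fabs(v); });
    if (!local_bound && !in_abs.empty()) {
        const double m = *std::max_element(in_abs.begin(), in_abs.end());
        std::fill(in_abs.begin(), in_abs.end(), m);
    }
    std::vector<double> w_abs(w.size());
    std::transform(w.begin(), w.end(), w_abs.begin(),
                   [](double v) { return std::fabs(v); });
    return dot_fp64(in_abs, w_abs, std::fabs(b));
}
----

With inputs {1, -2, 0.5}, weights {3, 1, -4} and bias 1, the reference is 0, while the bound magnitude is 8 with a local bound and 17 with the global one, so the global bound permits a larger error.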
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	418
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	419	The checks described in the following code must pass for the following data sets:
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	420
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	421	* Data sets defined for the operation in Appendix A <<Main Inference operator test data>>.
				422	* Data sets that have at least MIN_DOT_PRODUCT different output values. For these data sets we take S=-1.
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	423
				424	[source,c++]
				425	----
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	426	output_ref = operation_fp64(input, weight, bias);
				427	output_imp = operation_imp (input, weight, bias);
				428	input_abs = abs(input); // Element-wise absolute
				429	weight_abs = abs(weight); // Element-wise absolute
				430	bias_abs = abs(bias); // Element-wise absolute
				431	if (!local_bound) {
				432	input_abs_max = max_value(input_abs); // maximum over all elements
				433	for_each(index in shape(input_abs) {
				434	input_abs[index] = input_abs_max; // set all entries to global maximum
				435	}
				436	}
				437	output_bnd = operation_fp64(input_abs, weight_abs, bias_abs);
				438
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	439	size_t T = tensor_size(output_shape) // number dot product results
Eric Kunze	0afe61f	2024-02-14 16:33:31 -0800	[diff] [blame]	440	size ksb = ceil(KS / exp2(normal_frac<acc_t>() - normal_frac<out_t>())) + ((max_value(bias_abs) > 0) ? 1 : 0);
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	441	fp64_t out_err_sum = 0.0;
				442	fp64_t out_err_sumsq = 0.0;
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	443	for_each(index in output_shape) {
				444	fp64_t out_bnd = tensor_read<fp64_t>(output_bnd, output_shape, index);
				445	fp64_t out_ref = tensor_read<fp64_t>(output_ref, output_shape, index);
				446	acc_t out_imp = tensor_read<acc_t> (output_imp, output_shape, index);
				447	fp64_t out_err;
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	448	if ((acc_t)out_bnd == infinity) {
				449	// dot product can overflow and there is no accuracy limit
				450	out_err = 0.0;
				451	} else if (out_bnd == 0.0) {
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	452	REQUIRE(out_ref == 0.0 && out_imp == 0.0);
				453	out_err = 0.0;
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	454	} else { // 0.0 < out_bnd < infinity
Eric Kunze	0afe61f	2024-02-14 16:33:31 -0800	[diff] [blame]	455	fp64_t out_err_bnd = max(out_bnd * exp2(-1-normal_frac<out_t>()), normal_min<out_t>());
Dominic Symes	b203512	2023-09-01 11:41:08 +0100	[diff] [blame]	456	out_err = (static_cast<fp64_t>(out_imp) - out_ref) / out_err_bnd;
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	457	REQUIRE(abs(out_err) <= ksb);
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	458	}
				459	out_err_sum += out_err;
				460	out_err_sumsq += out_err * out_err;
				461	}
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	462	if (input and weights are data set S with 3 <= S <= 5) {
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	463	// check output error bias magnitude for data sets S which are not positive biased
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	464	REQUIRE(abs(out_err_sum) <= 2sqrt(ksbT));
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	465	}
				466	// check output error variance magnitude
Dominic Symes	b5b0678	2023-07-27 11:50:57 +0100	[diff] [blame]	467	REQUIRE(out_err_sumsq <= 0.4ksbT)
Dominic Symes	5b936a3	2023-03-01 11:34:40 +0000	[diff] [blame]	468	----
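
When building a test harness, the `ksb` term and the aggregate checks at the end of the code above can be sketched in standard C++ as below. This is a sketch under stated assumptions: the per-element normalized errors `out_err` are assumed to be computed already, the fraction bit counts of `acc_t` and `out_t` are passed as plain integers (for example 23 for fp32_t and 10 for fp16_t), and the names `compute_ksb` and `aggregate_checks_pass` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// ksb = ceil(KS / 2^(acc_frac - out_frac)) + (nonzero bias ? 1 : 0)
std::size_t compute_ksb(std::size_t KS, int acc_frac, int out_frac, bool nonzero_bias) {
    double ratio = static_cast<double>(KS) / std::exp2(acc_frac - out_frac);
    return static_cast<std::size_t>(std::ceil(ratio)) + (nonzero_bias ? 1 : 0);
}

// Applies the per-element check and the two aggregate checks over T errors.
// check_bias should be true only for data sets S with 3 <= S <= 5.
bool aggregate_checks_pass(const std::vector<double>& out_err, std::size_t ksb, bool check_bias) {
    const double T = static_cast<double>(out_err.size());
    double sum = 0.0;
    double sumsq = 0.0;
    for (double e : out_err) {
        if (std::fabs(e) > static_cast<double>(ksb)) return false;  // per-element bound
        sum += e;
        sumsq += e * e;
    }
    if (check_bias && std::fabs(sum) > 2.0 * std::sqrt(ksb * T)) return false;  // bias check
    return sumsq <= 0.4 * ksb * T;  // variance check
}
```

For instance, an fp32_t accumulator with fp16_t output, KS=1024 and a non-zero bias gives ksb = ceil(1024 / 2^13) + 1 = 2.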

=== Tensor Definitions

==== Tensors

Tensors are multidimensional arrays of data.
Tensors have metadata associated with them that describe characteristics of the tensor, including:

* Data Type
* Shape

The number of dimensions in a shape is called the rank.
A tensor with rank equal to zero is permitted.
In that case, the tensor has a single entry and is also known as a scalar.
A tensor shape is an array of integers of size equal to the rank of the tensor.
Each element in the tensor shape describes the number of elements in the corresponding dimension.
Each dimension of the tensor shape must be greater than or equal to 1.
For tensor access information, see <<Tensor Access Helpers>>.

The shape of a tensor of non-zero rank is a special type shape_t.
shape_t is a one-dimensional list with the size equal to the rank of the original tensor.
The components of a shape_t are of type size_t.

In this version of the specification, shape_t values must be resolvable to constants at backend compile time.

==== Tensor size limit

The overall size of a tensor is limited by the data type size_t.
This type must be able to hold integers in the range 0 to (1 << (MAX_LOG2_SIZE + 1)) - 1, where MAX_LOG2_SIZE is defined in <<Levels>>.
For each tensor, the number of tensor elements multiplied by the element size in bytes (taken to be 1 for elements smaller than 8 bits) must be less than or equal to (1 << (MAX_LOG2_SIZE + 1)) - 1.

The size of tensors along each of their dimensions is also limited by the data type size_t.
The maximum size of a tensor along each dimension is (1 << MAX_LOG2_SIZE) - 1, and therefore the maximum coordinate value is (1 << MAX_LOG2_SIZE) - 2.
Indices used to access tensors must be non-negative.
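
A minimal sketch of these limits follows. It assumes MAX_LOG2_SIZE is passed in as a parameter (its real value depends on the level chosen in <<Levels>>) and that the element size in bits is known; `tensor_within_limits` is a hypothetical name, not a function defined by this specification.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical check of the tensor size limits described above.
// shape:         tensor shape (empty for a rank-0 scalar)
// element_bits:  size of one element in bits (e.g. 4 for int4_t, 32 for fp32_t)
// max_log2_size: MAX_LOG2_SIZE for the chosen level (assumed <= 62 here)
bool tensor_within_limits(const std::vector<uint64_t>& shape, unsigned element_bits,
                          unsigned max_log2_size) {
    const uint64_t max_dim = (uint64_t{1} << max_log2_size) - 1;         // per-dimension limit
    const uint64_t max_bytes = (uint64_t{1} << (max_log2_size + 1)) - 1; // overall limit
    // Elements smaller than 8 bits count as 1 byte each.
    const uint64_t element_bytes = element_bits < 8 ? 1 : element_bits / 8;
    uint64_t elements = 1;
    for (uint64_t dim : shape) {
        if (dim < 1 || dim > max_dim) return false;   // each dimension in [1, max_dim]
        if (elements > max_bytes / dim) return false; // product would exceed the limit
        elements *= dim;
    }
    return elements <= max_bytes / element_bytes;
}
```

With MAX_LOG2_SIZE=8, a 16x16 tensor of 8-bit values (256 bytes) fits, but the same shape with 16-bit values (512 bytes) does not.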

==== Data Layouts

The following data layouts are supported in TOSA.
TOSA operations are defined in terms of a linear packed tensor layout.
In a linear packed layout, a rank r tensor stores elements that are adjacent in dimension (r-1) consecutively in memory.
The next dimension to increment is dimension (r-2), and so on.
For a specification of this layout, see the tensor read and write functions in section <<Tensor Access Helpers>>.

An implementation of TOSA can choose a different tensor memory layout provided that the operation behavior is maintained.

.Data Layouts
[cols="1,4,4"]
|===
|Name|Description of dimensions|Usage

|NHWC|Batch, Height, Width, Channels|Feature maps
|NDHWC|Batch, Depth, Height, Width, Channels|Feature maps for 3D convolution
|OHWI|Output channels, Filter Height, Filter Width, Input channels|Weights
|HWIM|Filter Height, Filter Width, Input channels, Channel Multiplier|Weights for depthwise convolutions
|DOHWI|Depth, Output channels, Filter Height, Filter Width, Input channels|Weights for 3D convolution
|===
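
As a rough illustration of the linear packed layout, the memory offset of an element can be computed by treating dimension (r-1) as the fastest-varying dimension. The function name `linear_offset` is hypothetical; the normative definition remains the tensor access helpers referenced above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of `index` within a linear packed tensor of
// shape `shape`. Dimension (r-1) is contiguous, then (r-2), and so on.
std::size_t linear_offset(const std::vector<std::size_t>& shape,
                          const std::vector<std::size_t>& index) {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) {
        offset = offset * shape[d] + index[d];  // Horner-style accumulation
    }
    return offset;  // a rank-0 tensor (empty shape) has its single element at offset 0
}
```

For an NHWC tensor of shape {1, 4, 4, 3}, the element at N=0, H=1, W=2, C=1 is at offset ((0 * 4 + 1) * 4 + 2) * 3 + 1 = 19.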

==== Broadcasting

In operations where broadcasting is supported, an input shape dimension can be broadcast to an output shape dimension if the input shape dimension is 1.
TOSA broadcast requires the rank of both tensors to be the same.
A RESHAPE can be done to create a compatible tensor with appropriate dimensions of size 1.
To map indices in an output tensor to those of an input tensor, see <<Broadcast Helper>>.
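
A minimal sketch of that index mapping, assuming both shapes already have the same rank as TOSA requires: any input dimension of size 1 always reads coordinate 0. The name `broadcast_index` is hypothetical; the normative version is the <<Broadcast Helper>>.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: map an output index to the corresponding input index.
// Each input dimension is assumed to be either 1 or equal to the output dimension.
std::vector<std::size_t> broadcast_index(const std::vector<std::size_t>& in_shape,
                                         const std::vector<std::size_t>& out_index) {
    std::vector<std::size_t> in_index(out_index.size());
    for (std::size_t d = 0; d < out_index.size(); ++d) {
        // A broadcast dimension (size 1) always reads its only element.
        in_index[d] = (in_shape[d] == 1) ? 0 : out_index[d];
    }
    return in_index;
}
```

For example, an input of shape {1, 3} broadcast to an output of shape {2, 3} reads input index {0, 2} for output index {1, 2}.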

==== Supported Number Formats

The following number formats are defined in TOSA.
The number formats supported by a given operator are listed in its table of supported types.
A TOSA implementation must support the number formats listed in the supported data types of the operators contained in each profile it implements.
Number formats not required for any operators in a profile do not need to be implemented.

.Number formats
[cols="1,1,1,5"]
|===
|Format|Minimum|Maximum|Description

|bool_t
| -
| -
|Boolean value that is either `true` or `false`. Size is implementation defined. The TOSA reference model implements this as int8_t with 0 for `false` and 1 for `true`. All non-zero values are accepted on input as `true`.

|i4_t
| -
| -
|Signless 4-bit integer value. Will be interpreted as int4_t by all operators.

|int4_t
| -7
| +7
|Signed 4-bit two's-complement value. Excludes -8 to maintain a range symmetric about zero for weights.

|i8_t
| -
| -
|Signless 8-bit integer value. Will be interpreted as int8_t unless otherwise specified by an operator.

|int8_t
| -128
| +127
|Signed 8-bit two's-complement value.

|uint8_t
| 0
| 255
|Unsigned 8-bit integer value.

|i16_t
| -
| -
|Signless 16-bit integer value. Will be interpreted as int16_t unless otherwise specified by an operator.

|int16_t
| -32768
| +32767
|Signed 16-bit two's-complement value.

|uint16_t
| 0
| 65535
|Unsigned 16-bit integer value.

|i32_t
| -
| -
|Signless 32-bit integer value. Will be interpreted as int32_t by all operators.

|int32_t
| -(1<<31)
| (1<<31)-1
|Signed 32-bit two's-complement value.

|i48_t
| -
| -
|Signless 48-bit integer value. Will be interpreted as int48_t by all operators.

|int48_t
| -(1<<47)
| (1<<47)-1
|Signed 48-bit two's-complement value.

|fp8e4m3_t
| -448
| 448
| 8-bit floating-point defined by <<OCP-OFP8,OCP-OFP8>> with four bits of exponent and three bits of mantissa. +
Normal values must be supported. +
Denormal values must be supported. +
The NaN encoding must be supported. +
Signed zero must be supported.

|fp8e5m2_t
| -infinity
| +infinity
| 8-bit floating-point defined by <<OCP-OFP8,OCP-OFP8>> with five bits of exponent and two bits of mantissa. +
Normal values must be supported. +
Denormal values must be supported. +
Positive and negative infinity must be supported. +
NaN encodings must be supported. +
Signed zero must be supported.

|fp16_t
| -infinity
| +infinity
| 16-bit half-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|bf16_t
| -infinity
| +infinity
| 16-bit brain floating-point defined as bits [31:16] of the fp32_t format. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp32_t
| -infinity
| +infinity
| 32-bit single-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|fp64_t
| -infinity
| +infinity
| 64-bit double-precision floating-point defined by <<IEEE-754,IEEE-754>>. +
Normal values must be supported. +
Denormal values must either be supported or flushed to zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.
|===
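
Because bf16_t is defined as bits [31:16] of the fp32_t format, moving between the two formats can be sketched with plain bit manipulation. This sketch assumes `float` is the IEEE-754 binary32 format and simply truncates when narrowing (it drops the low 16 bits); it makes no claim about the rounding mode used by any TOSA operator, and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: take bits [31:16] of an fp32_t value (truncation).
uint16_t fp32_to_bf16_bits(float value) {
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret without aliasing issues
    return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bf16_t bits to fp32_t (exact; low 16 bits are zero).
float bf16_bits_to_fp32(uint16_t bf16) {
    uint32_t bits = static_cast<uint32_t>(bf16) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, 1.0 (fp32_t bits 0x3F800000) becomes the bf16_t pattern 0x3F80, and widening 0x3F80 gives back exactly 1.0.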

Note: In this specification, minimum<type> and maximum<type> denote the minimum and maximum values of the data as stored in memory (ignoring the zero point).
The minimum and maximum values for each type are given in the preceding table.

Note: Integer number formats smaller than 8 bits may be used provided that the numerical result is the same as using a sequence of 8-bit TOSA operations.
For example, a convolution with low-precision data must give the same result as running the convolution at 8 bits and then clipping the result to the permitted output range.
This ensures that a Base Inference profile TOSA implementation can calculate the same result.

=== Integer Behavior

TOSA integer inputs and outputs are specified by signless values with the given number of bits.
Unless otherwise specified, these values will be interpreted as signed two's-complement.
The pseudocode will use int_t to indicate use as a signed value and uint_t to indicate use as an unsigned value.
If overflow occurs during integer calculation, the result is unpredictable, as indicated by the REQUIRE checks in the pseudocode for the operators.

Unsigned 8-bit and 16-bit values are only allowed in the RESCALE operation, to allow for compatibility with networks which expect unsigned 8-bit or 16-bit tensors for input and output.
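
To illustrate what a signless value means in practice, the sketch below reads the same 8-bit pattern both as int8_t (the default, two's-complement interpretation) and as uint8_t (the interpretation RESCALE uses for an unsigned tensor). The helper names are hypothetical, and the signed conversion is written arithmetically so it does not rely on implementation-defined narrowing.

```cpp
#include <cstdint>

// Hypothetical helper: interpret a signless 8-bit pattern as int_t.
// Two's-complement: patterns 0x80..0xFF map to -128..-1.
int32_t i8_as_signed(uint8_t bits) {
    return bits >= 0x80 ? static_cast<int32_t>(bits) - 0x100 : static_cast<int32_t>(bits);
}

// Hypothetical helper: interpret the same pattern as uint_t (0..255).
int32_t i8_as_unsigned(uint8_t bits) {
    return static_cast<int32_t>(bits);
}
```

The pattern 0xFF is therefore -1 when used as int8_t and 255 when used as uint8_t.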

==== Quantization

Machine Learning frameworks may represent tensors with a quantized implementation, using integer values to represent the original floating-point numbers.
TOSA integer operations do not perform any implicit scaling to represent quantized values.
Required zero point values are passed to the operator as necessary, and will be processed according to the pseudocode for each operator.

To convert a network containing quantized tensors to TOSA, generate explicit RESCALE operators for any change of quantization scaling.
This reduces quantized operations to purely integer operations.

As an example, an ADD between two quantized tensors requires that the integer values represent the same range.
The scale arguments for RESCALE can be calculated to ensure that the resulting tensors represent the same range.
Then the ADD is performed, and a RESCALE can be used to ensure that the result is scaled properly.
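
The ADD sequence above can be sketched numerically. For clarity, this sketch performs each rescale in double precision with round-to-nearest, rather than with the integer multiply, add and shift that RESCALE actually uses (see <<Precision scaling>>). The helper names, scales and zero points are illustrative assumptions, not values any framework is required to use.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: re-express (q - zp_in) * scale_in in a new quantization,
// using real arithmetic in place of RESCALE's integer arithmetic.
int32_t rescale_real(int32_t q, int32_t zp_in, double scale_in,
                     int32_t zp_out, double scale_out) {
    double real_value = (q - zp_in) * scale_in;
    return static_cast<int32_t>(std::lround(real_value / scale_out)) + zp_out;
}

// Hypothetical quantized ADD: rescale both inputs to a common scale,
// add them as plain integers, then rescale the sum to the output quantization.
int32_t quantized_add(int32_t a, int32_t zp_a, double scale_a,
                      int32_t b, int32_t zp_b, double scale_b,
                      double common_scale, int32_t zp_out, double scale_out) {
    int32_t a_common = rescale_real(a, zp_a, scale_a, 0, common_scale);
    int32_t b_common = rescale_real(b, zp_b, scale_b, 0, common_scale);
    int32_t sum = a_common + b_common;  // integer ADD in the common range
    return rescale_real(sum, 0, common_scale, zp_out, scale_out);
}
```

For example, a=4 with scale 0.5 (real value 2.0) plus b=4 with scale 0.25 (real value 1.0), using a common scale of 0.25 and an output scale of 0.5, gives 6 (real value 3.0). Choosing a common scale finer than either input keeps more precision, within the overflow limits discussed in <<Integer Elementwise Operators>>.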

RESCALE provides support for per-tensor and per-channel scaling values to ensure compatibility with a range of possible quantization implementations.

==== Precision scaling

TOSA uses the RESCALE operation to scale between values with differing precision.
The RESCALE operator is defined using an integer multiply, add, and shift.
This guarantees that all TOSA implementations will return the same result for a RESCALE, including those with no support for floating-point numbers.

This TOSA specification supports two precisions of multiplier: 16-bit and 32-bit.
The 32-bit multiplier version supports two rounding modes to enable simpler lowering of existing frameworks that use two-stage rounding.
All arithmetic is designed so that it does not overflow a 64-bit accumulator and that the final result fits in 32 bits.
In particular, a 48-bit value can only be scaled with the 16-bit multiplier.

The apply_scale functions provide a scaling of approximately (multiplier * 2^-shift^).
The shift and value ranges are limited to allow a variety of implementations.
The limit of 62 on shift allows the shift to be decomposed as two right shifts of 31.
The limit on value allows implementations that left shift the value before the multiply in the case of shifts of 32 or less.
For example, in the case shift=30 an implementation of the form ((value\<<2) * multiplier + round)>>32 can be used.
A scaling range of 2^+12^ down to 2^-32^ is supported for both functions with a normalized multiplier.

For example, in typical usage a scaling of m*2^-n^ where m is a fraction in the
range 1.0 \<= m < 2.0 can be represented using multiplier=(1<<30)*m, shift=(30+n) for
apply_scale_32() and multiplier=(1<<14)*m, shift=(14+n) for apply_scale_16().
The values to achieve a scaling of 1.0 are shift=30, multiplier=1<<30 for apply_scale_32 and shift=14, multiplier=1<<14 for apply_scale_16.
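
The recipe above can be sketched for apply_scale_32() using std::frexp, which splits a positive scale into a fraction f in [0.5, 1) and an exponent e, so that m = 2f and n = 1 - e. The helper `scale_to_multiplier_32` and the struct `ScaleParams` are hypothetical. The sketch assumes a positive, finite scale whose resulting shift falls in the permitted range 2 to 62, and does not check that range itself.

```cpp
#include <cmath>
#include <cstdint>

struct ScaleParams {  // hypothetical; mirrors the fields of scale_t
    int32_t multiplier;
    int8_t shift;
};

// Represent a positive real scaling as multiplier * 2^-shift for apply_scale_32(),
// with the multiplier normalized to [1 << 30, 1 << 31).
ScaleParams scale_to_multiplier_32(double scale) {
    int e = 0;
    double f = std::frexp(scale, &e);                     // scale = f * 2^e, 0.5 <= f < 1
    int64_t multiplier = std::llround(f * 2147483648.0);  // f * 2^31 == (1 << 30) * m
    int shift = 31 - e;                                   // 30 + n, with n = 1 - e
    if (multiplier == (int64_t{1} << 31)) {               // rounding carried past 2^31
        multiplier = int64_t{1} << 30;
        shift -= 1;
    }
    return {static_cast<int32_t>(multiplier), static_cast<int8_t>(shift)};
}
```

For a scaling of 0.75 (m=1.5, n=1) this gives multiplier=1610612736 and shift=31, and for 1.0 it reproduces multiplier=1<<30, shift=30.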

[source,c++]
----
int32_t apply_scale_32(int32_t value, int32_t multiplier, int8_t shift, bool_t double_round=false) {
    REQUIRE(multiplier >= 0);
    REQUIRE(2 <= shift && shift <= 62);
    REQUIRE(value >= -(static_cast<int64_t>(1) << (shift - 1)) && value < (static_cast<int64_t>(1) << (shift - 1)));
    int64_t round = static_cast<int64_t>(1) << (shift - 1);
    if (double_round) {
        if (shift > 31 && value >= 0) round += 1 << 30;
        if (shift > 31 && value < 0) round -= 1 << 30;
    }
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result = result >> shift;
    // result will fit a 32-bit range due to the REQUIRE on value
    return static_cast<int32_t>(result);
}

int32_t apply_scale_16(int48_t value, int16_t multiplier, int8_t shift) {
    REQUIRE(multiplier >= 0);
    REQUIRE(2 <= shift && shift <= 62);
    int64_t round = static_cast<int64_t>(1) << (shift - 1);
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result = result >> shift;
    REQUIRE(result >= minimum<int32_t> && result <= maximum<int32_t>);
    return static_cast<int32_t>(result);
}
----
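
For experimentation, the 32-bit function above can be transcribed into standard C++ as below. This is a sketch, not a reference implementation: REQUIRE is modelled with assert, bool_t is replaced by bool, and right shifts of negative int64_t values are assumed to be arithmetic (guaranteed from C++20 and the behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

int32_t apply_scale_32(int32_t value, int32_t multiplier, int8_t shift,
                       bool double_round = false) {
    assert(multiplier >= 0);
    assert(2 <= shift && shift <= 62);
    const int64_t half = int64_t{1} << (shift - 1);
    assert(value >= -half && value < half);  // the REQUIRE on value
    int64_t round = half;                     // round half up
    if (double_round && shift > 31) {
        // second rounding stage, applied away from zero
        round += (value >= 0) ? (int64_t{1} << 30) : -(int64_t{1} << 30);
    }
    int64_t result = static_cast<int64_t>(value) * multiplier + round;
    result >>= shift;
    return static_cast<int32_t>(result);
}
```

With multiplier=1<<30 and shift=31 (a scaling of 0.5), an input of 5 gives 3 and an input of -7 gives -3, showing that ties round towards positive infinity. With shift=32 (a scaling of 0.25), an input of 5 gives 1 normally but 2 with double_round, because 1.25 is first rounded at bit 31.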

In some functions, the multiplier and shift are combined into a scale_t structure:

[source,c++]
----
typedef struct {
    int32_t multiplier;
    int8_t shift;
} scale_t;
----

In places where a divide is required, we also use the function below to calculate an appropriate scaling value.

[source,c++]
----
scale_t reciprocal_scale(uint32_t value) {
    REQUIRE(value > 0);
    scale_t scale;
    int32_t k = 32 - count_leading_zeros(value - 1); // (1 << k) / 2 < value <= (1 << k)
    int64_t numerator = ((static_cast<int64_t>(1) << 30) + 1) << k;
    scale.multiplier = numerator / value; // (1 << 30) <= multiplier < (1 << 31)
    scale.shift = 30 + k;
    return scale;
}
----
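
A standalone C++ sketch of reciprocal_scale() follows, so that the range comments above can be checked numerically. The helper `count_leading_zeros32` is a hypothetical stand-in for the specification's count_leading_zeros (returning 32 for an input of zero), written as a loop to avoid depending on C++20's `<bit>` header; REQUIRE is modelled with assert.

```cpp
#include <cassert>
#include <cstdint>

struct scale_t {
    int32_t multiplier;
    int8_t shift;
};

// Hypothetical stand-in for count_leading_zeros on a 32-bit value.
int32_t count_leading_zeros32(uint32_t x) {
    int32_t n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;  // 32 when x == 0
}

scale_t reciprocal_scale(uint32_t value) {
    assert(value > 0);
    int32_t k = 32 - count_leading_zeros32(value - 1);  // (1 << k) / 2 < value <= (1 << k)
    int64_t numerator = ((int64_t{1} << 30) + 1) << k;
    scale_t scale;
    scale.multiplier = static_cast<int32_t>(numerator / value);  // in [1 << 30, 1 << 31)
    scale.shift = static_cast<int8_t>(30 + k);
    return scale;
}
```

For value=9, k=4, so the result is multiplier=1908874355 and shift=34. Dividing 90 by 9 with these parameters gives (90 * 1908874355 + (1 << 33)) >> 34 = 10.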

==== Integer Convolutions

For the convolution operators, the input is not required to be scaled.
The integer versions of the convolution operators will subtract the zero point from the integer values as defined for each operator.
The convolution produces an accumulator output of type int32_t or int48_t.
This accumulator output is then scaled to the final output range using the RESCALE operator.
The scale applied in the RESCALE operator should be set to multiplier and shift values such that: multiplier * 2^-shift^ = (input_scale * weight_scale) / output_scale.
Here, input_scale, weight_scale and output_scale are the conversion factors from integer to floating-point for the input, weight and output tensor values respectively.
If per-channel scaling is needed, then the per-channel option of the RESCALE operation should be used.

==== Integer Elementwise Operators

When two quantized tensors are used in an operation, they must represent the same numeric range for the result to be valid.
In this case, TOSA expects that RESCALE operators will be used as necessary to generate 32-bit integer values in a common range.
There are many valid choices for scale factors and options for the common range.
TOSA does not impose a requirement on which scale factors and range should be used.
Compilers generating TOSA sequences should choose a range that allows the operation to be computed without overflow, while allowing the highest possible accuracy of the output.

==== General Unary Functions

General unary functions such as sigmoid(), tanh() and exp() for integer inputs are expressed using a lookup table and interpolation to enable efficient implementation.
This also allows for other operations with the addition of user-supplied tables (the TABLE operation).
All table lookups are based on the following reference lookup function that takes as input a table of 513 entries of 16 bits each.

[source,c++]
----
int32_t apply_lookup_s(int16_t *table, int32_t value)
{
    int16_t clipped_value = static_cast<int16_t>(apply_clip_s<int32_t>(value, -32768, +32767));
    int32_t index = (clipped_value + 32768) >> 7;
    int32_t fraction = clipped_value & 0x7f;
    int16_t base = table[index];
    int16_t next = table[index+1];
    int32_t slope = next - base;
    REQUIRE(slope >= minimum<int16_t> && slope <= maximum<int16_t>);
    int32_t return_value = (base << 7) + slope * fraction;
    return return_value; // return interpolated value of 16 + 7 = 23 bits
}
----

Note that although the table lookup defined here has 16-bit precision, for 8-bit-only operations an 8-bit table can be derived by applying the reference function to each of the possible 256 input values.
The following code constructs a 513-entry table based on a reference function.

[source,c++]
----
void generate_lookup_table(int16_t *table, int32_t (*reference)(int32_t))
{
    for (int i = -256; i <= 256; i++) {
        int32_t value = (*reference)(i);
        table[i + 256] = static_cast<int16_t>(apply_clip_s<int32_t>(value, -32768, +32767));
    }
}
----
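
The table generation and lookup functions above can be exercised together with the standalone sketch below. It substitutes a hypothetical `clip_to_int16_range` helper for apply_clip_s, multiplies by 128 instead of left-shifting a possibly negative base, and uses a hypothetical linear reference function (table entry for input i is 64 * i) so the interpolated results are easy to check by hand: every lookup should return 64 times the clipped input.

```cpp
#include <cstdint>

// Hypothetical stand-in for apply_clip_s<int32_t>(value, -32768, +32767).
int32_t clip_to_int16_range(int32_t value) {
    if (value < -32768) return -32768;
    if (value > 32767) return 32767;
    return value;
}

void generate_lookup_table(int16_t *table, int32_t (*reference)(int32_t)) {
    for (int32_t i = -256; i <= 256; i++) {
        table[i + 256] = static_cast<int16_t>(clip_to_int16_range(reference(i)));
    }
}

int32_t apply_lookup_s(const int16_t *table, int32_t value) {
    int16_t clipped_value = static_cast<int16_t>(clip_to_int16_range(value));
    int32_t index = (clipped_value + 32768) >> 7;  // 0..511
    int32_t fraction = clipped_value & 0x7f;        // low 7 bits
    int32_t base = table[index];
    int32_t slope = table[index + 1] - base;
    return base * 128 + slope * fraction;           // 16 + 7 = 23-bit result
}

// Hypothetical linear reference function.
int32_t linear_reference(int32_t i) { return 64 * i; }

// Usage helper: build the table from linear_reference and look up one value.
int32_t lookup_linear(int32_t value) {
    int16_t table[513];
    generate_lookup_table(table, linear_reference);
    return apply_lookup_s(table, value);
}
```

For example, looking up 100 returns 6400, and an out-of-range input of 40000 is first clipped to 32767 and returns 2097088.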

=== Other publications

The following publications are referred to in this specification, or provide more information:

. [[IEEE-754]]IEEE Std 754-2008, _IEEE Standard for Floating-point Arithmetic_, August 2008.
. [[OCP-OFP8]]Open Compute Project, _OCP 8-bit Floating Point Specification (OFP8)_, Revision 1.0.