HALF-PRECISION FLOATING POINT LIBRARY (Version 1.12.0)
------------------------------------------------------

This is a C++ header-only library to provide an IEEE 754 conformant 16-bit
half-precision floating point type along with corresponding arithmetic
operators, type conversions and common mathematical functions. It aims for both
efficiency and ease of use, trying to accurately mimic the behaviour of the
builtin floating point types at the best performance possible.


INSTALLATION AND REQUIREMENTS
-----------------------------

Conveniently, the library consists of just a single header file containing all
the functionality, which can be directly included by your projects, without
the necessity to build anything or link to anything.

Whereas this library is fully C++98-compatible, it can profit from certain
C++11 features. Support for those features is checked automatically at compile
(or rather preprocessing) time, but can be explicitly enabled or disabled by
defining the corresponding preprocessor symbols to either 1 or 0 yourself. This
is useful when the automatic detection fails (for more exotic implementations)
or when a feature should be explicitly disabled:

  - 'long long' integer type for mathematical functions returning 'long long'
    results (enabled for VC++ 2003 and newer, gcc and clang, overridable with
    'HALF_ENABLE_CPP11_LONG_LONG').

  - Static assertions for extended compile-time checks (enabled for VC++ 2010,
    gcc 4.3, clang 2.9 and newer, overridable with
    'HALF_ENABLE_CPP11_STATIC_ASSERT').

  - Generalized constant expressions (enabled for VC++ 2015, gcc 4.6, clang 3.1
    and newer, overridable with 'HALF_ENABLE_CPP11_CONSTEXPR').

  - noexcept exception specifications (enabled for VC++ 2015, gcc 4.6, clang 3.0
    and newer, overridable with 'HALF_ENABLE_CPP11_NOEXCEPT').

  - User-defined literals for half-precision literals to work (enabled for
    VC++ 2015, gcc 4.7, clang 3.1 and newer, overridable with
    'HALF_ENABLE_CPP11_USER_LITERALS').

  - Type traits and template meta-programming features from <type_traits>
    (enabled for VC++ 2010, libstdc++ 4.3, libc++ and newer, overridable with
    'HALF_ENABLE_CPP11_TYPE_TRAITS').

  - Special integer types from <cstdint> (enabled for VC++ 2010, libstdc++ 4.3,
    libc++ and newer, overridable with 'HALF_ENABLE_CPP11_CSTDINT').

  - Certain C++11 single-precision mathematical functions from <cmath> for
    an improved implementation of their half-precision counterparts to work
    (enabled for VC++ 2013, libstdc++ 4.3, libc++ and newer, overridable with
    'HALF_ENABLE_CPP11_CMATH').

  - Hash functor 'std::hash' from <functional> (enabled for VC++ 2010,
    libstdc++ 4.3, libc++ and newer, overridable with 'HALF_ENABLE_CPP11_HASH').

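As a minimal sketch, overriding the automatic detection could look like the
following. The choice of the two symbols is arbitrary, and it is assumed here
that the definitions have to be visible before the header is included, as is
documented for 'HALF_ROUND_STYLE' in the section on conversions and rounding:

```cpp
// Assumption: overrides must appear before the first inclusion of half.hpp.
#define HALF_ENABLE_CPP11_USER_LITERALS 0   // explicitly disable user-defined literals
#define HALF_ENABLE_CPP11_CSTDINT 1         // explicitly enable the <cstdint> types
#include <half.hpp>
```
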
The library has been tested successfully with Visual C++ 2005-2015, gcc 4.4-4.8
and clang 3.1. Please contact me if you have any problems, suggestions or even
just success testing it on other platforms.


DOCUMENTATION
-------------

Here follow some general words about the usage of the library and its
implementation. For a complete documentation of its interface look at the
corresponding website http://half.sourceforge.net. You may also generate the
complete developer documentation from the Doxygen comments in the library's
only include file, but this is more relevant to developers than to mere users
(for reasons described below).

BASIC USAGE

To make use of the library just include its only header file half.hpp, which
defines all half-precision functionality inside the 'half_float' namespace. The
actual 16-bit half-precision data type is represented by the 'half' type. This
type behaves like the builtin floating point types as much as possible,
supporting the usual arithmetic, comparison and streaming operators, which
makes its use pretty straightforward:

    #include <iostream>
    #include <half.hpp>

    using half_float::half;
    half a(3.4), b(5);
    half c = a * b;
    c += 3;
    if(c > a)
        std::cout << c << std::endl;

Additionally the 'half_float' namespace also defines half-precision versions
for all mathematical functions of the C++ standard library, which can be used
directly through ADL:

    half a(-3.14159);
    half s = sin(abs(a));
    long l = lround(s);

You may also specify explicit half-precision literals, since the library
provides a user-defined literal inside the 'half_float::literal' namespace,
which you just need to import (assuming support for C++11 user-defined literals):

    using namespace half_float::literal;
    half x = 1.0_h;

Furthermore the library provides proper specializations for
'std::numeric_limits', defining various implementation properties, and
'std::hash' for hashing half-precision numbers (assuming support for C++11
'std::hash'). Similar to the corresponding preprocessor symbols from <cmath>
the library also defines the 'HUGE_VALH' constant and possibly the
'FP_FAST_FMAH' symbol.

CONVERSIONS AND ROUNDING

The half is explicitly constructible/convertible from a single-precision float
argument. Thus it is also explicitly constructible/convertible from any type
implicitly convertible to float, but constructing it from types like double or
int will involve the usual warnings arising when implicitly converting those to
float because of the lost precision. On the one hand those warnings are
intentional, because converting those types to half necessarily also reduces
precision. But on the other hand they are raised for explicit conversions from
those types, when the user knows what they are doing. So if those warnings keep
bugging you, you will have to either convert to float explicitly before
converting to half, or use the 'half_cast' described below. In addition you can
also directly assign float values to halfs.

In contrast to the float-to-half conversion, which reduces precision, the
conversion from half to float (and thus to any other type implicitly
convertible from float) is implicit, because all values representable with
half-precision are also representable with single-precision. This way the
half-to-float conversion behaves similarly to the builtin float-to-double
conversion and all arithmetic expressions involving both half-precision and
single-precision arguments will be of single-precision type. This way you can
also directly use the mathematical functions of the C++ standard library,
though in this case you will invoke the single-precision versions which will
also return single-precision values, which is (even if maybe performing the
exact same computation, see below) not as conceptually clean when working in a
half-precision environment.

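The conversion rules described above can be illustrated without the library
itself. The following is only a sketch using a hypothetical stand-in type
'toy_half', which wraps a float instead of 16 bits and is not the library's
'half'. It merely shows how an explicit constructor combined with an implicit
conversion operator yields the described behaviour:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for illustration only, NOT half_float::half.
// It stores a float internally to keep the sketch short.
class toy_half {
public:
    toy_half() : value_(0.0f) {}
    explicit toy_half(float f) : value_(f) {}                   // narrowing must be requested explicitly
    toy_half& operator=(float f) { value_ = f; return *this; }  // direct assignment from float
    operator float() const { return value_; }                   // widening happens implicitly
private:
    float value_;
};

// float -> toy_half is not an implicit conversion, but explicit construction works.
static_assert(!std::is_convertible<float, toy_half>::value, "no implicit narrowing");
static_assert(std::is_constructible<toy_half, float>::value, "explicit construction");
// toy_half -> float is implicit, so mixed expressions are of single-precision type.
static_assert(std::is_convertible<toy_half, float>::value, "implicit widening");
static_assert(std::is_same<decltype(toy_half(1.5f) + 2.0f), float>::value, "mixed arithmetic is float");
```

With such a type, 'toy_half h(1.5f); h = 2.5f; float f = h;' compiles, whereas
'toy_half h = 1.5f;' does not.
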
The default rounding mode for conversions from float to half uses truncation
(round toward zero, but mapping overflows to infinity) for rounding values not
representable exactly in half-precision. This is the fastest rounding possible
and is usually sufficient. But by redefining the 'HALF_ROUND_STYLE'
preprocessor symbol (before including half.hpp) this default can be overridden
with one of the other standard rounding modes using their respective constants
or the equivalent values of 'std::float_round_style' (it can even be
synchronized with the underlying single-precision implementation by defining it
to 'std::numeric_limits<float>::round_style'):

  - 'std::round_indeterminate' or -1 for the fastest rounding (default).

  - 'std::round_toward_zero' or 0 for rounding toward zero.

  - 'std::round_to_nearest' or 1 for rounding to the nearest value.

  - 'std::round_toward_infinity' or 2 for rounding toward positive infinity.

  - 'std::round_toward_neg_infinity' or 3 for rounding toward negative infinity.

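For instance, selecting round to nearest as the default could be sketched as
follows, using the numeric value from the list above:

```cpp
#define HALF_ROUND_STYLE 1   // same as std::round_to_nearest
#include <half.hpp>
```
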
In addition to changing the overall default rounding mode one can also use the
'half_cast'. This converts between half and any built-in arithmetic type using
a configurable rounding mode (or the default rounding mode if none is
specified). In addition to a configurable rounding mode, 'half_cast' has
another big difference to a mere 'static_cast': Any conversions are performed
directly using the given rounding mode, without any intermediate conversion
to/from 'float'. This is especially relevant for conversions to integer types,
which don't necessarily truncate anymore. But also for conversions from
'double' or 'long double' this may produce more precise results than a
pre-conversion to 'float' using the single-precision implementation's current
rounding mode would.

    half a = half_cast<half>(4.2);
    half b = half_cast<half,std::numeric_limits<float>::round_style>(4.2f);
    assert( half_cast<int, std::round_to_nearest>( 0.7_h ) == 1 );
    assert( half_cast<half,std::round_toward_zero>( 4097 ) == 4096.0_h );
    assert( half_cast<half,std::round_toward_infinity>( 4097 ) == 4100.0_h );
    assert( half_cast<half,std::round_toward_infinity>( std::numeric_limits<double>::min() ) > 0.0_h );

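The results for 4097 above follow from the spacing of half-precision values:
with 11 significant bits, neighbouring values between 4096 and 8192 are 4
apart. The following sketch reproduces this using only the standard library.
The function 'round_to_half_grid' is a hypothetical helper and not part of the
library, and it only handles positive values within the normal range of
half-precision:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustration only: rounds a positive value within half's normal range to a
// neighbouring half-precision value. This is not the library's conversion code.
double round_to_half_grid(double value, std::float_round_style style) {
    int exponent = 0;
    std::frexp(value, &exponent);   // value = m * 2^exponent with m in [0.5, 1)
    // 11 significant bits mean neighbours are 2^(exponent - 11) apart.
    const double spacing = std::ldexp(1.0, exponent - 11);
    const double scaled = value / spacing;
    double steps;
    switch (style) {
        case std::round_toward_infinity: steps = std::ceil(scaled);  break;
        case std::round_to_nearest:      steps = std::round(scaled); break; // ties away from zero
        default:                         steps = std::floor(scaled); break; // truncation for positive input
    }
    return steps * spacing;
}
```

Calling it with 4097 gives 4096 for 'std::round_toward_zero' and 4100 for
'std::round_toward_infinity', matching the assertions above.
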
When using round to nearest (either as default or through 'half_cast') ties are
by default resolved by rounding them away from zero (and thus equal to the
behaviour of the 'round' function). But by redefining the
'HALF_ROUND_TIES_TO_EVEN' preprocessor symbol to 1 (before including half.hpp)
this default can be changed to the slightly slower but less biased and more
IEEE-conformant behaviour of rounding half-way cases to the nearest even value.

    #define HALF_ROUND_TIES_TO_EVEN 1
    #include <half.hpp>
    ...
    assert( half_cast<int,std::round_to_nearest>(3.5_h)
         == half_cast<int,std::round_to_nearest>(4.5_h) );

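The difference between the two tie-breaking rules can also be seen with the
standard library alone. This is only an analogy on double values, with
hypothetical helper names, and it assumes the floating point environment is
still in its default round-to-nearest mode:

```cpp
#include <cassert>
#include <cmath>

// Ties away from zero, as the 'round' family of functions does.
long ties_away(double x) { return std::lround(x); }

// Ties to even: std::nearbyint follows the current rounding mode, which is
// assumed to be the default (round to nearest, ties to even) here.
long ties_to_even(double x) { return static_cast<long>(std::nearbyint(x)); }
```

Here 'ties_away' maps 3.5 to 4 and 4.5 to 5, whereas 'ties_to_even' maps both
to 4, which is why the two sides of the assertion above compare equal.
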
IMPLEMENTATION

For performance reasons (and ease of implementation) many of the mathematical
functions provided by the library as well as all arithmetic operations are
actually carried out in single-precision under the hood, calling to the C++
standard library implementations of those functions whenever appropriate,
meaning the arguments are converted to floats and the result back to half. But
to reduce the conversion overhead as much as possible any temporary values
inside of lengthy expressions are kept in single-precision as long as possible,
while still maintaining a strong half-precision type to the outside world. Only
when finally assigning the value to a half or calling a function that works
directly on halfs is the actual conversion done (or never, when further
converting the result to float).

This approach has two implications. First of all you have to treat the
library's documentation at http://half.sourceforge.net as a simplified version,
describing the behaviour of the library as if implemented this way. The actual
argument and return types of functions and operators may involve other internal
types (feel free to generate the exact developer documentation from the Doxygen
comments in the library's header file if you really need to). But nevertheless
the behaviour is exactly like specified in the documentation. The other
implication is that in the presence of rounding errors or over-/underflows
arithmetic expressions may produce different results when compared to
converting to half-precision after each individual operation:

    half a = std::numeric_limits<half>::max() * 2.0_h / 2.0_h;       // a = MAX
    half b = half(std::numeric_limits<half>::max() * 2.0_h) / 2.0_h; // b = INF
    assert( a != b );

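The same effect exists one level up between float and double, which allows a
sketch that does not need the library at all. The two function names are
hypothetical, and IEEE 754 single-precision arithmetic is assumed:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 floats");

// Keeps the intermediate product in the wider type and narrows only at the
// end, analogous to the library keeping temporaries in single-precision.
float scale_wide(float x) {
    const double wide = static_cast<double>(x) * 2.0 / 2.0;
    return static_cast<float>(wide);
}

// Narrows after every operation, so the intermediate product can overflow.
float scale_narrow(float x) {
    volatile float doubled = x * 2.0f;   // volatile forces a real float store
    return doubled / 2.0f;
}
```

For the largest finite float, 'scale_wide' returns the input unchanged while
'scale_narrow' returns infinity.
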
But this should only be a problem in very few cases. One last word has to be
said when talking about performance. Even with its efforts in reducing
conversion overhead as much as possible, the software half-precision
implementation can most probably not beat the direct use of single-precision
computations. Usually using actual float values for all computations and
temporaries and using halfs only for storage is the recommended way. On the
one hand this somewhat makes the provided mathematical functions obsolete
(especially in light of the implicit conversion from half to float), but
nevertheless the goal of this library was to provide a complete and
conceptually clean half-precision implementation, to which the standard
mathematical functions belong, even if usually not needed.

IEEE CONFORMANCE

The half type uses the standard IEEE representation with 1 sign bit, 5 exponent
bits and 10 mantissa bits (11 when counting the hidden bit). It supports all
types of special values, like subnormal values, infinity and NaNs. But there
are some limitations to the complete conformance to the IEEE 754 standard:

  - The implementation does not differentiate between signalling and quiet
    NaNs, this means operations on halfs are not specified to trap on
    signalling NaNs (though they may, see last point).

  - Though arithmetic operations are internally rounded to single-precision
    using the underlying single-precision implementation's current rounding
    mode, those values are then converted to half-precision using the default
    half-precision rounding mode (changed by defining 'HALF_ROUND_STYLE'
    accordingly). This mixture of rounding modes is also the reason why
    'std::numeric_limits<half>::round_style' may actually return
    'std::round_indeterminate' when half- and single-precision rounding modes
    don't match.

  - Because of internal truncation it may also be that certain single-precision
    NaNs will be wrongly converted to half-precision infinity, though this is
    very unlikely to happen, since most single-precision implementations don't
    tend to only set the lowest bits of a NaN mantissa.

  - The implementation does not provide any floating point exceptions, thus
    arithmetic operations or mathematical functions are not specified to invoke
    proper floating point exceptions. But due to many functions implemented in
    single-precision, those may still invoke floating point exceptions of the
    underlying single-precision implementation.

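To make the bit layout and the NaN limitation concrete, here is a small
sketch using only the standard library. Both functions are hypothetical
helpers written for this illustration and are not the library's conversion
routines. 'truncate_special' only handles single-precision bit patterns whose
exponent bits are all set (infinity and NaN) and simply drops the 13 lowest
mantissa bits:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes a 16-bit pattern: 1 sign bit, 5 exponent bits, 10 mantissa bits.
float decode_half_bits(std::uint16_t bits) {
    const int sign = (bits >> 15) & 0x1;
    const int exponent = (bits >> 10) & 0x1F;
    const int mantissa = bits & 0x3FF;
    float magnitude;
    if (exponent == 0)         // zero or subnormal: no hidden bit
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    else if (exponent == 31)   // all ones: infinity or NaN
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    else                       // normal: hidden bit set, exponent bias of 15
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    return sign ? -magnitude : magnitude;
}

// Naive truncation of a single-precision infinity/NaN pattern to 16 bits.
std::uint16_t truncate_special(std::uint32_t float_bits) {
    const std::uint32_t sign = (float_bits >> 16) & 0x8000u;
    const std::uint32_t mantissa = (float_bits & 0x7FFFFFu) >> 13;  // drops the 13 low bits
    return static_cast<std::uint16_t>(sign | 0x7C00u | mantissa);
}
```

The pattern 0x3C00 decodes to 1 and 0x7BFF to 65504. A single-precision NaN
with only its lowest mantissa bit set (0x7F800001) truncates to 0x7C00, which
is infinity, while the common quiet NaN 0x7FC00000 stays a NaN.
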
Some of those points could have been circumvented by controlling the floating
point environment using <cfenv> or implementing a similar exception mechanism.
But this would have required excessive runtime checks giving too high an impact
on performance for something that is rarely ever needed. If you really need to
rely on proper floating point exceptions, it is recommended to explicitly
perform computations using the built-in floating point types to be on the safe
side. In the same way, if you really need to rely on a particular rounding
behaviour, it is recommended to either use single-precision computations and
explicitly convert the result to half-precision using 'half_cast' and
specifying the desired rounding mode, or synchronize the default half-precision
rounding mode to the rounding mode of the single-precision implementation (most
likely 'HALF_ROUND_STYLE=1', 'HALF_ROUND_TIES_TO_EVEN=1'). But this is really
considered an expert-scenario that should be used only when necessary, since
actually working with half-precision usually comes with a certain
tolerance/ignorance of exactness considerations and proper rounding comes with
a certain performance cost.


CREDITS AND CONTACT
-------------------

This library is developed by CHRISTIAN RAU and released under the MIT License
(see LICENSE.txt). If you have any questions or problems with it, feel free to
contact me at rauy@users.sourceforge.net.

				285
				286	Additional credit goes to JEROEN VAN DER ZIJP for his paper on "Fast Half Float
				287	Conversions", whose algorithms have been used in the library for converting
				288	between half-precision and single-precision values.