namespace arm_compute
{
namespace test
{
/**
@page tests Validation and benchmark tests

@tableofcontents

@section tests_overview Overview

Benchmark and validation tests are based on the same framework to set up and
run the tests. In addition to running simple, self-contained test functions the
framework supports fixtures and data test cases. The former make it possible to
share common setup routines between various backends, thus reducing the amount
of duplicated code. The latter can be used to parameterize tests or fixtures
with different inputs, e.g. different tensor shapes. One limitation is that
tests/fixtures cannot be parameterized based on the data type if static type
information is needed within the test (e.g. to validate the results).

@subsection tests_overview_structure Directory structure

    .
    |-- computer_vision <- Legacy tests. No new test must be added. <!-- FIXME: Remove before release -->
    `-- tests <- Top level test directory. All files in here are shared among validation and benchmark.
        |-- framework <- Underlying test framework.
        |-- CL \
        |-- NEON -> Backend specific files with helper functions etc.
        |-- VX / <!-- FIXME: Remove VX -->
        |-- benchmark <- Top level directory for the benchmarking files.
        |   |-- fixtures <- Fixtures for benchmark tests.
        |   |-- CL <- OpenCL backend test cases on a function level.
        |   |   `-- SYSTEM <- OpenCL system tests, e.g. whole networks
        |   `-- NEON <- Same for NEON
        |       `-- SYSTEM
        |-- datasets <- Datasets for benchmark and validation tests.
        |-- main.cpp <- Main entry point for the tests. Currently shared between validation and benchmarking.
        |-- networks <- Network classes for system level tests.
        |-- validation_old <- Old validation framework. No new tests must be added! <!-- FIXME: Remove before release -->
        |   |-- dataset <- Old datasets for boost. Not to be used for new tests! <!-- FIXME: Remove before release -->
        |   |-- model_objects <- Old helper files for system level validation. Not to be used for new tests! <!-- FIXME: Remove before release -->
        |   |-- CL \
        |   |-- DEMO \
        |   |-- NEON --> Backend specific test cases
        |   |-- UNIT /
        |   |-- VX / <!-- FIXME: Remove VX -->
        |   `-- system_tests -> System level tests
        |       |-- CL
        |       `-- NEON
        `-- validation -> Top level directory for validation files.
            |-- CPP -> C++ reference code
            |-- CL \
            |-- NEON -> Backend specific test cases
            |-- VX / <!-- FIXME: Remove VX -->
            `-- fixtures -> Fixtures shared among all backends. Used to setup target function and tensors.

@subsection tests_overview_fixtures Fixtures

Fixtures can be used to share common setup, teardown or even run tasks among
multiple test cases. For that purpose a fixture can define `setup`, `teardown`
and `run` methods. Additionally, the constructor and destructor can also be
customized.

An instance of the fixture is created immediately before the actual test is
executed. After construction the @ref framework::Fixture::setup method is called. Then the test
function or the fixture's `run` method is invoked. After test execution the
@ref framework::Fixture::teardown method is called and finally the fixture is destroyed.

@subsubsection tests_overview_fixtures_fixture Fixture

Fixtures for non-parameterized tests are straightforward. The custom fixture
class has to inherit from @ref framework::Fixture and can implement any of the
`setup`, `teardown` or `run` methods. None of the methods takes any arguments
or returns anything.

    class CustomFixture : public framework::Fixture
    {
        void setup()
        {
            _ptr = malloc(4000);
        }

        void run()
        {
            ARM_COMPUTE_ASSERT(_ptr != nullptr);
        }

        void teardown()
        {
            free(_ptr);
        }

        void *_ptr;
    };

@subsubsection tests_overview_fixtures_data_fixture Data fixture

The advantage of a parameterized fixture is that arguments can be passed to the
setup method at runtime. To make this possible the setup method has to be a
template with a type parameter for every argument (though the template
parameter doesn't have to be used). All other methods remain the same.

    class CustomFixture : public framework::Fixture
    {
    #ifdef ALTERNATIVE_DECLARATION
        template <typename ...>
        void setup(size_t size)
        {
            _ptr = malloc(size);
        }
    #else
        template <typename T>
        void setup(T size)
        {
            _ptr = malloc(size);
        }
    #endif

        void run()
        {
            ARM_COMPUTE_ASSERT(_ptr != nullptr);
        }

        void teardown()
        {
            free(_ptr);
        }

        void *_ptr;
    };

@subsection tests_overview_test_cases Test cases

All of the following macros can optionally be prefixed with `EXPECTED_FAILURE_`
or `DISABLED_`.
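
For example, a disabled variant of a plain test case could be written like this
(just a sketch; the arguments are the same as for the unprefixed macros
described below):

    DISABLED_TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT)
    {
        // Registered with the framework but skipped during normal runs.
        ARM_COMPUTE_ASSERT_EQUAL(1 + 1, 2);
    }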

@subsubsection tests_overview_test_cases_test_case Test case

A simple test case function taking no inputs and having no (shared) state.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the dataset mode in which the test will be active.


    TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT)
    {
        ARM_COMPUTE_ASSERT_EQUAL(1 + 1, 2);
    }

@subsubsection tests_overview_test_cases_fixture_fixture_test_case Fixture test case

A simple test case function taking no inputs that inherits from a fixture. The
test case will have access to all public and protected members of the fixture.
Only the setup and teardown methods of the fixture will be used. The body of
this function will be used as the test function.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.


    class FixtureName : public framework::Fixture
    {
    public:
        void setup() override
        {
            _one = 1;
        }

    protected:
        int _one;
    };

    FIXTURE_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT)
    {
        ARM_COMPUTE_ASSERT_EQUAL(_one + 1, 2);
    }

@subsubsection tests_overview_test_cases_fixture_register_fixture_test_case Registering a fixture as test case

Allows a fixture to be used directly as a test case. Instead of defining a new
test function, the `run` method of the fixture is executed.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.


    class FixtureName : public framework::Fixture
    {
    public:
        void setup() override
        {
            _one = 1;
        }

        void run() override
        {
            ARM_COMPUTE_ASSERT_EQUAL(_one + 1, 2);
        }

    protected:
        int _one;
    };

    REGISTER_FIXTURE_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT);


@subsubsection tests_overview_test_cases_data_test_case Data test case

A parameterized test case function that has no (shared) state. The dataset will
be used to generate versions of the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the dataset mode in which the test will be active.
- Third argument is the dataset.
- Further arguments specify names of the arguments to the test function. The
  number must match the arity of the dataset.


    DATA_TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT, framework::dataset::make("Numbers", {1, 2, 3}), num)
    {
        ARM_COMPUTE_ASSERT(num < 4);
    }

@subsubsection tests_overview_test_cases_fixture_data_test_case Fixture data test case

A parameterized test case that inherits from a fixture. The test case will have
access to all public and protected members of the fixture. Only the setup and
teardown methods of the fixture will be used. The setup method of the fixture
needs to be a template and has to accept inputs from the dataset as arguments.
The body of this function will be used as the test function. The dataset will
be used to generate versions of the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.
- Fourth argument is the dataset.


    class FixtureName : public framework::Fixture
    {
    public:
        template <typename T>
        void setup(T num)
        {
            _num = num;
        }

    protected:
        int _num;
    };

    FIXTURE_DATA_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT, framework::dataset::make("Numbers", {1, 2, 3}))
    {
        ARM_COMPUTE_ASSERT(_num < 4);
    }

@subsubsection tests_overview_test_cases_register_fixture_data_test_case Registering a fixture as data test case

Allows a fixture to be used directly as a parameterized test case. Instead of
defining a new test function, the `run` method of the fixture is executed. The
setup method of the fixture needs to be a template and has to accept inputs
from the dataset as arguments. The dataset will be used to generate versions of
the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.
- Fourth argument is the dataset.


    class FixtureName : public framework::Fixture
    {
    public:
        template <typename T>
        void setup(T num)
        {
            _num = num;
        }

        void run() override
        {
            ARM_COMPUTE_ASSERT(_num < 4);
        }

    protected:
        int _num;
    };

    REGISTER_FIXTURE_DATA_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT, framework::dataset::make("Numbers", {1, 2, 3}));

@section writing_tests Writing validation tests

Before starting a new test case have a look at the existing ones. They should
provide a good overview of how test cases are structured.

- The C++ reference needs to be added to `tests/validation/CPP/` (a sketch of
  a typical reference function is shown after this list). The reference
  function is typically a template parameterized by the underlying value type
  of the `SimpleTensor`. This makes it easy to specialise for different data
  types.
- If all backends have a common interface it makes sense to share the setup
  code. This can be done by adding a fixture in
  `tests/validation/fixtures/`. Inside the `setup` method of a fixture the
  tensors can be created and initialised and the function can be configured
  and run. The actual test will only have to validate the results. To be shared
  among multiple backends the fixture class is usually a template that accepts
  the specific types (data, tensor class, function class etc.) as parameters.
- The actual test cases need to be added for each backend individually.
  Typically there will be multiple tests for different data types and for
  different execution modes, e.g. precommit and nightly.
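
As an illustration, a reference function in `tests/validation/CPP/` usually
looks similar to the sketch below. The name `example_operation` and the
element-wise computation are made up; a real reference implements the operator
under test.

    template <typename T>
    SimpleTensor<T> example_operation(const SimpleTensor<T> &src)
    {
        // Create an output tensor with the same shape and data type as the input.
        SimpleTensor<T> dst{ src.shape(), src.data_type() };

        for(int i = 0; i < src.num_elements(); ++i)
        {
            dst[i] = src[i] + T(1); // element-wise computation of the operator under test
        }

        return dst;
    }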

<!-- FIXME: Remove before release -->
@section building_test_dependencies Building dependencies

@note Only required when tests from the old validation framework need to be run.

The tests currently make use of Boost (Test and Program options) for
validation. Below are instructions on how to build these third-party
libraries.

@note By default the build of the validation and benchmark tests is disabled; to enable it use `validation_tests=1` and `benchmark_tests=1`.
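
For example, a build that includes both test suites might be configured as
follows (the remaining options are only illustrative and depend on your target
platform):

    scons os=linux arch=arm64-v8a neon=1 opencl=1 validation_tests=1 benchmark_tests=1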

@subsection building_boost Building Boost

First follow the instructions from the Boost library on how to set up the Boost
build system
(http://www.boost.org/doc/libs/1_64_0/more/getting_started/index.html).
Afterwards the required libraries can be built with:

    ./b2 --with-program_options --with-test link=static \
        define=BOOST_TEST_ALTERNATIVE_INIT_API

Additionally, depending on your environment, it might be necessary to specify
the `toolset=` option to choose the right compiler. Moreover,
`address-model=32` can be used to force building for 32-bit and
`target-os=android` must be specified to build for Android.

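For example, a 32-bit Android build might look like the following sketch. It
assumes that a cross-compiler has been registered as the `gcc-arm` toolset in
your `user-config.jam`; adjust the toolset name to your setup.

    ./b2 --with-program_options --with-test link=static \
        define=BOOST_TEST_ALTERNATIVE_INIT_API \
        toolset=gcc-arm target-os=android address-model=32
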
After executing the build command the libraries `libboost_program_options.a`
and `libboost_unit_test_framework.a` can be found in `./stage/lib`.
<!-- FIXME: end remove -->

@section tests_running_tests Running tests
@subsection tests_running_tests_benchmarking Benchmarking
@subsubsection tests_running_tests_benchmarking_filter Filter tests
All tests can be run by invoking

    ./arm_compute_benchmark ./data

where `./data` contains the assets needed by the tests.

If only a subset of the tests has to be executed, the `--filter` option takes a
regular expression to select matching tests.

    ./arm_compute_benchmark --filter='NEON/.*AlexNet' ./data

Additionally, each test has a test id which can be used as a filter, too.
However, the test id is not guaranteed to be stable when new tests are added;
a test only keeps its id within a specific build.

    ./arm_compute_benchmark --filter-id=10 ./data

All available tests can be displayed with the `--list-tests` switch.

    ./arm_compute_benchmark --list-tests

More options can be found in the `--help` message.

@subsubsection tests_running_tests_benchmarking_runtime Runtime
By default every test is run once on a single thread. The number of iterations
can be controlled via the `--iterations` option and the number of threads via
`--threads`.
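
For example, to run every benchmark for 10 iterations using 4 threads (the
values are only illustrative):

    ./arm_compute_benchmark --iterations=10 --threads=4 ./data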

@subsubsection tests_running_tests_benchmarking_output Output
By default the benchmarking results are printed in a human-readable format on
the command line. The colored output can be disabled via `--no-color-output`.
As an alternative output format JSON is supported and can be selected via
`--log-format=json`. To write the output to a file instead of stdout the
`--log-file` option can be used.
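
For example, to write JSON formatted results to a file (the file name is just
an example):

    ./arm_compute_benchmark --log-format=json --log-file=results.json ./data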

@subsubsection tests_running_tests_benchmarking_mode Mode
Tests contain different datasets of different sizes, some of which will take several hours to run.
You can select which datasets to use by using the `--mode` option; we recommend you use `--mode=precommit` to start with.
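
For example, to run only the precommit datasets:

    ./arm_compute_benchmark --mode=precommit ./data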

@subsubsection tests_running_tests_benchmarking_instruments Instruments
You can use the `--instruments` option to select one or more instruments to measure the execution time of the benchmark tests.

`PMU` will try to read the CPU PMU events from the kernel (they need to be enabled on your platform).

`MALI` will try to collect Mali hardware performance counters (you need to have a recent enough Mali driver).

`WALL_CLOCK` will measure time using `gettimeofday`: this should work on all platforms.

You can pass a combination of these instruments: `--instruments=PMU,MALI,WALL_CLOCK`

@note You need to make sure the instruments have been selected at compile time using the `pmu=1` or `mali=1` scons options.

<!-- FIXME: Remove before release and change above to benchmark and validation -->
@subsection tests_running_tests_validation Validation

@note The new validation tests have the same interface as the benchmarking tests.

@subsubsection tests_running_tests_validation_filter Filter tests
All tests can be run by invoking

    ./arm_compute_validation -- ./data

where `./data` contains the assets needed by the tests.

As running all tests can take a lot of time, the suite is split into "precommit" and "nightly" tests. The precommit tests will be fast to execute but still cover the most important features. In contrast the nightly tests offer more extensive coverage but take longer. The different subsets can be selected from the command line as follows:

    ./arm_compute_validation -t @precommit -- ./data
    ./arm_compute_validation -t @nightly -- ./data

Additionally it is possible to select specific suites or tests:

    ./arm_compute_validation -t CL -- ./data
    ./arm_compute_validation -t NEON/BitwiseAnd/RunSmall/_0 -- ./data

All available tests can be displayed with the `--list_content` switch.

    ./arm_compute_validation --list_content -- ./data

For a complete list of possible selectors please see: http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/runtime_config/test_unit_filtering.html

@subsubsection tests_running_tests_validation_verbosity Verbosity
There are two separate flags to control the verbosity of the test output. `--report_level` controls the verbosity of the summary produced after all tests have been executed. `--log_level` controls the verbosity of the information generated during the execution of tests. All available settings can be found in the Boost documentation for [--report_level](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/utf_reference/rt_param_reference/report_level.html) and [--log_level](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/utf_reference/rt_param_reference/log_level.html), respectively.
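
For example, one possible combination of these settings (the values are taken
from the Boost documentation linked above) is:

    ./arm_compute_validation --report_level=detailed --log_level=all -- ./data
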
<!-- FIXME: end remove -->
*/
} // namespace test
} // namespace arm_compute