namespace arm_compute
{
namespace test
{
/**
@page tests Validation and benchmark tests

@tableofcontents

@section tests_overview Overview

Benchmark and validation tests are based on the same framework to set up and
run the tests. In addition to running simple, self-contained test functions the
framework supports fixtures and data test cases. The former make it possible to
share common setup routines between various backends, thus reducing the amount
of duplicated code. The latter can be used to parameterize tests or fixtures
with different inputs, e.g. different tensor shapes. One limitation is that
tests/fixtures cannot be parameterized based on the data type if static type
information is needed within the test (e.g. to validate the results).

@subsection tests_overview_structure Directory structure

    .
    |-- computer_vision <- Legacy tests. No new test must be added. <!-- FIXME: Remove before release -->
    `-- tests <- Top level test directory. All files in here are shared among validation and benchmark.
        |-- framework <- Underlying test framework.
        |-- CL \
        |-- NEON -> Backend specific files with helper functions etc.
        |-- VX / <!-- FIXME: Remove VX -->
        |-- benchmark <- Top level directory for the benchmarking files.
        |   |-- fixtures <- Fixtures for benchmark tests.
        |   |-- CL <- OpenCL backend test cases on a function level.
        |   |   `-- SYSTEM <- OpenCL system tests, e.g. whole networks
        |   `-- NEON <- Same for NEON
        |       `-- SYSTEM
        |-- datasets <- Datasets for benchmark and validation tests.
        |-- main.cpp <- Main entry point for the tests. Currently shared between validation and benchmarking.
        |-- networks <- Network classes for system level tests.
        |-- validation_old <- Old validation framework. No new tests must be added! <!-- FIXME: Remove before release -->
        |   |-- dataset <- Old datasets for boost. Not to be used for new tests! <!-- FIXME: Remove before release -->
        |   |-- model_objects <- Old helper files for system level validation. Not to be used for new tests! <!-- FIXME: Remove before release -->
        |   |-- CL \
        |   |-- DEMO \
        |   |-- NEON --> Backend specific test cases
        |   |-- UNIT /
        |   |-- VX / <!-- FIXME: Remove VX -->
        |   `-- system_tests -> System level tests
        |       |-- CL
        |       `-- NEON
        `-- validation -> Top level directory for validation files.
            |-- CPP -> C++ reference code
            |-- CL \
            |-- NEON -> Backend specific test cases
            |-- VX / <!-- FIXME: Remove VX -->
            `-- fixtures -> Fixtures shared among all backends. Used to setup target function and tensors.

@subsection tests_overview_fixtures Fixtures

Fixtures can be used to share common setup, teardown or even run tasks among
multiple test cases. For that purpose a fixture can define `setup`, `teardown`
and `run` methods. Additionally, the constructor and destructor can also be
customized.

An instance of the fixture is created immediately before the actual test is
executed. After construction the @ref framework::Fixture::setup method is called. Then the test
function or the fixture's `run` method is invoked. After test execution the
@ref framework::Fixture::teardown method is called and finally the fixture is destroyed.

@subsubsection tests_overview_fixtures_fixture Fixture

Fixtures for non-parameterized tests are straightforward. The custom fixture
class has to inherit from @ref framework::Fixture and can implement any of the
`setup`, `teardown` or `run` methods. None of the methods takes any arguments
or returns anything.

    class CustomFixture : public framework::Fixture
    {
        void setup()
        {
            _ptr = malloc(4000);
        }

        void run()
        {
            ARM_COMPUTE_ASSERT(_ptr != nullptr);
        }

        void teardown()
        {
            free(_ptr);
        }

        void *_ptr;
    };

@subsubsection tests_overview_fixtures_data_fixture Data fixture

The advantage of a parameterized fixture is that arguments can be passed to the
setup method at runtime. To make this possible the setup method has to be a
template with a type parameter for every argument (though the template
parameter doesn't have to be used). All other methods remain the same.

    class CustomFixture : public framework::Fixture
    {
    #ifdef ALTERNATIVE_DECLARATION
        template <typename ...>
        void setup(size_t size)
        {
            _ptr = malloc(size);
        }
    #else
        template <typename T>
        void setup(T size)
        {
            _ptr = malloc(size);
        }
    #endif

        void run()
        {
            ARM_COMPUTE_ASSERT(_ptr != nullptr);
        }

        void teardown()
        {
            free(_ptr);
        }

        void *_ptr;
    };

@subsection tests_overview_test_cases Test cases

All of the following macros can optionally be prefixed with `EXPECTED_FAILURE_`
or `DISABLED_`.
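
For example, a disabled variant of a plain test case could be written like this
(just a sketch; the arguments are the same as for the unprefixed macros
described below):

    DISABLED_TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT)
    {
        // Registered with the framework but skipped during normal runs.
        ARM_COMPUTE_ASSERT_EQUAL(1 + 1, 2);
    }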

@subsubsection tests_overview_test_cases_test_case Test case

A simple test case function taking no inputs and having no (shared) state.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the dataset mode in which the test will be active.


    TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT)
    {
        ARM_COMPUTE_ASSERT_EQUAL(1 + 1, 2);
    }

@subsubsection tests_overview_test_cases_fixture_fixture_test_case Fixture test case

A simple test case function taking no inputs that inherits from a fixture. The
test case will have access to all public and protected members of the fixture.
Only the setup and teardown methods of the fixture will be used. The body of
this function will be used as the test function.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.


    class FixtureName : public framework::Fixture
    {
    public:
        void setup() override
        {
            _one = 1;
        }

    protected:
        int _one;
    };

    FIXTURE_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT)
    {
        ARM_COMPUTE_ASSERT_EQUAL(_one + 1, 2);
    }

@subsubsection tests_overview_test_cases_fixture_register_fixture_test_case Registering a fixture as test case

Allows a fixture to be used directly as a test case. Instead of defining a new
test function, the `run` method of the fixture is executed.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.


    class FixtureName : public framework::Fixture
    {
    public:
        void setup() override
        {
            _one = 1;
        }

        void run() override
        {
            ARM_COMPUTE_ASSERT_EQUAL(_one + 1, 2);
        }

    protected:
        int _one;
    };

    REGISTER_FIXTURE_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT);


@subsubsection tests_overview_test_cases_data_test_case Data test case

A parameterized test case function that has no (shared) state. The dataset will
be used to generate versions of the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the dataset mode in which the test will be active.
- Third argument is the dataset.
- Further arguments specify names of the arguments to the test function. The
  number must match the arity of the dataset.


    DATA_TEST_CASE(TestCaseName, DatasetMode::PRECOMMIT, framework::dataset::make("Numbers", {1, 2, 3}), num)
    {
        ARM_COMPUTE_ASSERT(num < 4);
    }

@subsubsection tests_overview_test_cases_fixture_data_test_case Fixture data test case

A parameterized test case that inherits from a fixture. The test case will have
access to all public and protected members of the fixture. Only the setup and
teardown methods of the fixture will be used. The setup method of the fixture
needs to be a template and has to accept inputs from the dataset as arguments.
The body of this function will be used as the test function. The dataset will
be used to generate versions of the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.
- Fourth argument is the dataset.


    class FixtureName : public framework::Fixture
    {
    public:
        template <typename T>
        void setup(T num)
        {
            _num = num;
        }

    protected:
        int _num;
    };

    FIXTURE_DATA_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT, framework::dataset::make("Numbers", {1, 2, 3}))
    {
        ARM_COMPUTE_ASSERT(_num < 4);
    }

@subsubsection tests_overview_test_cases_register_fixture_data_test_case Registering a fixture as data test case

Allows a fixture to be used directly as a parameterized test case. Instead of
defining a new test function, the `run` method of the fixture is executed. The
setup method of the fixture needs to be a template and has to accept inputs
from the dataset as arguments. The dataset will be used to generate versions of
the test case with different inputs.

- First argument is the name of the test case (has to be unique within the
  enclosing test suite).
- Second argument is the class name of the fixture.
- Third argument is the dataset mode in which the test will be active.
- Fourth argument is the dataset.


    class FixtureName : public framework::Fixture
    {
    public:
        template <typename T>
        void setup(T num)
        {
            _num = num;
        }

        void run() override
        {
            ARM_COMPUTE_ASSERT(_num < 4);
        }

    protected:
        int _num;
    };

    REGISTER_FIXTURE_DATA_TEST_CASE(TestCaseName, FixtureName, DatasetMode::PRECOMMIT, framework::dataset::make("Numbers", {1, 2, 3}));

@section writing_tests Writing validation tests

Before starting a new test case have a look at the existing ones. They should
provide a good overview of how test cases are structured.

- The C++ reference needs to be added to `tests/validation/CPP/` (a sketch of
  a typical reference function is shown after this list). The reference
  function is typically a template parameterized by the underlying value type
  of the `SimpleTensor`. This makes it easy to specialise for different data
  types.
- If all backends have a common interface it makes sense to share the setup
  code. This can be done by adding a fixture in
  `tests/validation/fixtures/`. Inside the `setup` method of a fixture the
  tensors can be created and initialised and the function can be configured
  and run. The actual test will only have to validate the results. To be shared
  among multiple backends the fixture class is usually a template that accepts
  the specific types (data, tensor class, function class etc.) as parameters.
- The actual test cases need to be added for each backend individually.
  Typically there will be multiple tests for different data types and for
  different execution modes, e.g. precommit and nightly.
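
As an illustration, a reference function in `tests/validation/CPP/` usually
looks similar to the sketch below. The name `example_operation` and the
element-wise computation are made up; a real reference implements the operator
under test.

    template <typename T>
    SimpleTensor<T> example_operation(const SimpleTensor<T> &src)
    {
        // Create an output tensor with the same shape and data type as the input.
        SimpleTensor<T> dst{ src.shape(), src.data_type() };

        for(int i = 0; i < src.num_elements(); ++i)
        {
            dst[i] = src[i] + T(1); // element-wise computation of the operator under test
        }

        return dst;
    }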

<!-- FIXME: Remove before release -->
@section building_test_dependencies Building dependencies

@note Only required when tests from the old validation framework need to be run.

The tests currently make use of Boost (Test and Program options) for
validation. Below are instructions on how to build these third-party
libraries.

@note By default the build of the validation and benchmark tests is disabled; to enable it use `validation_tests=1` and `benchmark_tests=1`.
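
For example, a build that includes both test suites might be configured as
follows (the remaining options are only illustrative and depend on your target
platform):

    scons os=linux arch=arm64-v8a neon=1 opencl=1 validation_tests=1 benchmark_tests=1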

@subsection building_boost Building Boost

First follow the instructions from the Boost library on how to set up the Boost
build system
(http://www.boost.org/doc/libs/1_64_0/more/getting_started/index.html).
Afterwards the required libraries can be built with:

    ./b2 --with-program_options --with-test link=static \
        define=BOOST_TEST_ALTERNATIVE_INIT_API

Additionally, depending on your environment, it might be necessary to specify
the `toolset=` option to choose the right compiler. Moreover,
`address-model=32` can be used to force building for 32-bit and
`target-os=android` must be specified to build for Android.

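For example, a 32-bit Android build might look like the following sketch. It
assumes that a cross-compiler has been registered as the `gcc-arm` toolset in
your `user-config.jam`; adjust the toolset name to your setup.

    ./b2 --with-program_options --with-test link=static \
        define=BOOST_TEST_ALTERNATIVE_INIT_API \
        toolset=gcc-arm target-os=android address-model=32
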
After executing the build command the libraries `libboost_program_options.a`
and `libboost_unit_test_framework.a` can be found in `./stage/lib`.
<!-- FIXME: end remove -->

@section tests_running_tests Running tests
@subsection tests_running_tests_benchmarking Benchmarking
@subsubsection tests_running_tests_benchmarking_filter Filter tests
All tests can be run by invoking

    ./arm_compute_benchmark ./data

where `./data` contains the assets needed by the tests.

If only a subset of the tests has to be executed, the `--filter` option takes a
regular expression to select matching tests.

    ./arm_compute_benchmark --filter='NEON/.*AlexNet' ./data

Additionally, each test has a test id which can be used as a filter, too.
However, the test id is not guaranteed to be stable when new tests are added;
a test only keeps its id within a specific build.

    ./arm_compute_benchmark --filter-id=10 ./data

All available tests can be displayed with the `--list-tests` switch.

    ./arm_compute_benchmark --list-tests

More options can be found in the `--help` message.

@subsubsection tests_running_tests_benchmarking_runtime Runtime
By default every test is run once on a single thread. The number of iterations
can be controlled via the `--iterations` option and the number of threads via
`--threads`.
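
For example, to run every benchmark for 10 iterations using 4 threads (the
values are only illustrative):

    ./arm_compute_benchmark --iterations=10 --threads=4 ./data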

@subsubsection tests_running_tests_benchmarking_output Output
By default the benchmarking results are printed in a human-readable format on
the command line. The colored output can be disabled via `--no-color-output`.
As an alternative output format JSON is supported and can be selected via
`--log-format=json`. To write the output to a file instead of stdout the
`--log-file` option can be used.
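
For example, to write JSON formatted results to a file (the file name is just
an example):

    ./arm_compute_benchmark --log-format=json --log-file=results.json ./data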

@subsubsection tests_running_tests_benchmarking_mode Mode
Tests contain different datasets of different sizes, some of which will take several hours to run.
You can select which datasets to use by using the `--mode` option; we recommend you use `--mode=precommit` to start with.
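
For example, to run only the precommit datasets:

    ./arm_compute_benchmark --mode=precommit ./data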

@subsubsection tests_running_tests_benchmarking_instruments Instruments
You can use the `--instruments` option to select one or more instruments to measure the execution time of the benchmark tests.

`PMU` will try to read the CPU PMU events from the kernel (they need to be enabled on your platform).

`MALI` will try to collect Mali hardware performance counters (you need to have a recent enough Mali driver).

`WALL_CLOCK` will measure time using `gettimeofday`: this should work on all platforms.

You can pass a combination of these instruments: `--instruments=PMU,MALI,WALL_CLOCK`

@note You need to make sure the instruments have been selected at compile time using the `pmu=1` or `mali=1` scons options.

<!-- FIXME: Remove before release and change above to benchmark and validation -->
@subsection tests_running_tests_validation Validation

@note The new validation tests have the same interface as the benchmarking tests.

@subsubsection tests_running_tests_validation_filter Filter tests
All tests can be run by invoking

    ./arm_compute_validation -- ./data

where `./data` contains the assets needed by the tests.

As running all tests can take a lot of time, the suite is split into "precommit" and "nightly" tests. The precommit tests will be fast to execute but still cover the most important features. In contrast the nightly tests offer more extensive coverage but take longer. The different subsets can be selected from the command line as follows:

    ./arm_compute_validation -t @precommit -- ./data
    ./arm_compute_validation -t @nightly -- ./data

Additionally it is possible to select specific suites or tests:

    ./arm_compute_validation -t CL -- ./data
    ./arm_compute_validation -t NEON/BitwiseAnd/RunSmall/_0 -- ./data

All available tests can be displayed with the `--list_content` switch.

    ./arm_compute_validation --list_content -- ./data

For a complete list of possible selectors please see: http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/runtime_config/test_unit_filtering.html

@subsubsection tests_running_tests_validation_verbosity Verbosity
There are two separate flags to control the verbosity of the test output. `--report_level` controls the verbosity of the summary produced after all tests have been executed. `--log_level` controls the verbosity of the information generated during the execution of tests. All available settings can be found in the Boost documentation for [--report_level](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/utf_reference/rt_param_reference/report_level.html) and [--log_level](http://www.boost.org/doc/libs/1_64_0/libs/test/doc/html/boost_test/utf_reference/rt_param_reference/log_level.html), respectively.
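
For example, one possible combination of these settings (the values are taken
from the Boost documentation linked above) is:

    ./arm_compute_validation --report_level=detailed --log_level=all -- ./data
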
<!-- FIXME: end remove -->
*/
} // namespace test
} // namespace arm_compute