///
/// Copyright (c) 2017-2021 Arm Limited.
///
/// SPDX-License-Identifier: MIT
///
/// Permission is hereby granted, free of charge, to any person obtaining a copy
/// of this software and associated documentation files (the "Software"), to
/// deal in the Software without restriction, including without limitation the
/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
/// sell copies of the Software, and to permit persons to whom the Software is
/// furnished to do so, subject to the following conditions:
///
/// The above copyright notice and this permission notice shall be included in all
/// copies or substantial portions of the Software.
///
/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
/// SOFTWARE.
///
namespace arm_compute
{
/** @page advanced Advanced

@tableofcontents

@section S1_8_cl_tuner OpenCL Tuner

The OpenCL tuner, a.k.a. CLTuner, is a module of Arm Compute Library that can improve the performance of the OpenCL kernels by tuning the Local-Workgroup-Size (LWS).
The optimal LWS for each unique OpenCL kernel configuration is stored in a table. This table can be either imported from or exported to a file.
The OpenCL tuner runs the same OpenCL kernel for a range of local workgroup sizes and keeps the local workgroup size of the fastest run to use in subsequent calls to the kernel. It supports three tuning modes with different trade-offs between the time taken to tune and the kernel execution time achieved using the best LWS found:

 - Exhaustive mode searches all the supported LWS values. It takes the longest time to tune and is the most likely to find the optimal LWS.
 - Normal mode searches a subset of the LWS values to yield a good approximation of the optimal LWS. It takes less time to tune than Exhaustive mode.
 - Rapid mode takes the shortest time to tune and finds an LWS value that is at least as good as, or better than, the default LWS value.

The mode affects only the search for the optimal LWS and has no effect when the LWS value is imported from a file.
In order for the performance numbers to be meaningful you must disable GPU power management and set the GPU to a fixed frequency for the entire duration of the tuning phase.

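As a minimal sketch of selecting the tuning mode programmatically, assuming the CLTunerMode enumeration and CLTuner::set_tuner_mode() provided by recent versions of the library:

@code{.cpp}
CLTuner tuner;

// Pick the trade-off between tuning time and quality of the result:
// CLTunerMode::EXHAUSTIVE, CLTunerMode::NORMAL or CLTunerMode::RAPID
tuner.set_tuner_mode(CLTunerMode::RAPID);

// Attach the tuner to the scheduler before configuring any function
CLScheduler::get().default_init(&tuner);
@endcode
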
If you wish to know more about LWS and its important role in improving GPU cache utilization, we suggest having a look at the presentation "Even Faster CNNs: Exploring the New Class of Winograd Algorithms", available at the following link:

https://www.embedded-vision.com/platinum-members/arm/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-iodice

Tuning a network from scratch can take a long time and considerably increases the execution time of the first run of your network. For this reason, it is recommended to store the CLTuner's results in a file, so that this cost is amortized whenever you re-use the same network or functions with the same configurations. The tuning is performed only once for each OpenCL kernel.

CLTuner looks for the optimal LWS for each unique OpenCL kernel configuration. Since a function (i.e. Convolution Layer, Pooling Layer, Fully Connected Layer ...) can be called multiple times but with different parameters, we associate an "id" (called "config_id") with each kernel to distinguish the unique configurations.

    #Example: 2 unique Matrix Multiply configurations
@code{.cpp}
    TensorShape a0 = TensorShape(32,32);
    TensorShape b0 = TensorShape(32,32);
    TensorShape c0 = TensorShape(32,32);
    TensorShape a1 = TensorShape(64,64);
    TensorShape b1 = TensorShape(64,64);
    TensorShape c1 = TensorShape(64,64);

    CLTensor a0_tensor;
    CLTensor b0_tensor;
    CLTensor c0_tensor;
    CLTensor a1_tensor;
    CLTensor b1_tensor;
    CLTensor c1_tensor;

    a0_tensor.allocator()->init(TensorInfo(a0, 1, DataType::F32));
    b0_tensor.allocator()->init(TensorInfo(b0, 1, DataType::F32));
    c0_tensor.allocator()->init(TensorInfo(c0, 1, DataType::F32));
    a1_tensor.allocator()->init(TensorInfo(a1, 1, DataType::F32));
    b1_tensor.allocator()->init(TensorInfo(b1, 1, DataType::F32));
    c1_tensor.allocator()->init(TensorInfo(c1, 1, DataType::F32));

    CLGEMM gemm0;
    CLGEMM gemm1;

    // Configuration 0: 32x32 matrices -> one config_id
    gemm0.configure(&a0_tensor, &b0_tensor, nullptr, &c0_tensor, 1.0f, 0.0f);

    // Configuration 1: 64x64 matrices -> a different config_id
    gemm1.configure(&a1_tensor, &b1_tensor, nullptr, &c1_tensor, 1.0f, 0.0f);
@endcode

@subsection S1_8_1_cl_tuner_how_to How to use it

All the graph examples in the Compute Library's folder "examples" and the arm_compute_benchmark accept an argument to enable the OpenCL tuner and an argument to export/import the LWS values to/from a file:

    #Enable CL tuner
    ./graph_mobilenet --enable-tuner --target=CL
    ./arm_compute_benchmark --enable-tuner

    #Export/Import to/from a file
    ./graph_mobilenet --enable-tuner --target=CL --tuner-file=acl_tuner.csv
    ./arm_compute_benchmark --enable-tuner --tuner-file=acl_tuner.csv

If you are importing the CLTuner's results from a file, the new tuned LWS values will be appended to it.

Whether you are benchmarking the graph examples or the test cases in the arm_compute_benchmark, remember to:

 -# Disable the power management
 -# Keep the GPU frequency constant (a sketch of one way to do this follows the list)
 -# Run the network multiple times (e.g. 10).

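One illustrative way to keep the frequency constant on a Linux device whose GPU is exposed through the devfreq framework is to force the minimum and maximum frequencies to the same value. The sysfs path and the supported frequencies are platform specific, so treat the following as a hypothetical sketch that requires root privileges:

    #List the frequencies supported by the (hypothetical) GPU devfreq device
    cat /sys/class/devfreq/<gpu-device>/available_frequencies
    #Pin the GPU to one of the supported frequencies
    echo <freq> > /sys/class/devfreq/<gpu-device>/min_freq
    echo <freq> > /sys/class/devfreq/<gpu-device>/max_freq
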
If you are not using the graph API or the benchmark infrastructure, you will need to manually pass a CLTuner object to CLScheduler before configuring any function.

@code{.cpp}
CLTuner tuner;

// Setup Scheduler
CLScheduler::get().default_init(&tuner);
@endcode

After the first run, the CLTuner's results can be exported to a file using the method "save_to_file()".
- tuner.save_to_file("results.csv");

This file can also be imported using the method "load_from_file()".
- tuner.load_from_file("results.csv");

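Putting these calls together, a typical tuning workflow might look like the following sketch, which only combines the methods shown above (note that importing a results file that does not exist will fail):

@code{.cpp}
CLTuner tuner;

// Import previously tuned LWS values (the file must already exist)
tuner.load_from_file("results.csv");

// Attach the tuner to the scheduler before configuring any function
CLScheduler::get().default_init(&tuner);

// ... configure and run the network ...

// Export the results, including any newly tuned kernels
tuner.save_to_file("results.csv");
@endcode
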
@section security_concerns Security Concerns
Here are some security concerns that may affect Compute Library.

@subsection uid_memory_access A process running under the same uid could read another process' memory

Processes running under the same user ID (UID) may be able to read each other's memory and running state. This can
lead to information disclosure: sensitive data, such as the weights of the model currently executing, can be leaked.
This mainly affects Linux systems, and it is the responsibility of the system owner to secure processes against this
vulnerability. The YAMA security kernel module can be used to detect and stop such an attack; it can be selected at
kernel compile time with CONFIG_SECURITY_YAMA and configured at runtime by changing ptrace_scope in
/proc/sys/kernel/yama.

Please refer to https://www.kernel.org/doc/html/v4.15/admin-guide/LSM/Yama.html for more information in this regard.

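For example, on a kernel built with Yama enabled, the ptrace scope can be tightened at runtime (root privileges required; the appropriate value depends on your security policy):

    #Restrict ptrace to descendants of the tracing process
    sysctl -w kernel.yama.ptrace_scope=1
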
@subsection file_tampering Malicious users could alter Compute Library related files

Extra care must be taken in order to reduce the possibility of a user altering sensitive files. CLTuner files
should be protected against arbitrary writes, since tampering with them can cause Compute Library to crash or
waste all of the system's resources.

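For instance, a simple mitigation is to make the tuner file readable and writable only by its owner:

    #Allow only the owning user to read and write the tuner file
    chmod 600 acl_tuner.csv
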
@subsection various_concerns Various concerns

Sensitive applications that use Compute Library should consider possible attack vectors such as shared library hooking,
information leakage from the underlying OpenCL driver or from a previous execution, and running arbitrary networks that
consume all the available resources on the system, leading to denial of service.

*/
} // namespace arm_compute