///
/// Copyright (c) 2017-2024 Arm Limited.
///
/// SPDX-License-Identifier: MIT
///
/// Permission is hereby granted, free of charge, to any person obtaining a copy
/// of this software and associated documentation files (the "Software"), to
/// deal in the Software without restriction, including without limitation the
/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
/// sell copies of the Software, and to permit persons to whom the Software is
/// furnished to do so, subject to the following conditions:
///
/// The above copyright notice and this permission notice shall be included in all
/// copies or substantial portions of the Software.
///
/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
/// SOFTWARE.
///
namespace arm_compute
{
/** @page versions_changelogs Release Versions and Changelog

@tableofcontents

@section S2_1_versions Release versions

All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number.
If there is more than one release in a month, an extra sequential number is appended at the end:

    v17.03 (First release of March 2017)
    v17.03.1 (Second release of March 2017)
    v17.04 (First release of April 2017)
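
As a rough illustration of the numbering scheme above, the following sketch parses a release tag of the form vYY.MM or vYY.MM.N. The `ReleaseVersion` type and the `parse_release_version` function are hypothetical names used only for this example; they are not part of Compute Library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type for this example only (not a Compute Library type).
struct ReleaseVersion
{
    int year;     // YY: last two digits of the year
    int month;    // MM: month number, 1 to 12
    int sequence; // 0 for the first release of the month, 1 for the second, ...
};

// Parses "vYY.MM" or "vYY.MM.N". Returns false and leaves 'out' untouched
// when the tag does not follow that pattern.
bool parse_release_version(const std::string &tag, ReleaseVersion &out)
{
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };

    if (tag.size() < 6 || tag[0] != 'v' || !is_digit(tag[1]) || !is_digit(tag[2]) ||
        tag[3] != '.' || !is_digit(tag[4]) || !is_digit(tag[5]))
    {
        return false;
    }

    ReleaseVersion v{(tag[1] - '0') * 10 + (tag[2] - '0'),
                     (tag[4] - '0') * 10 + (tag[5] - '0'),
                     0};
    if (v.month < 1 || v.month > 12)
    {
        return false;
    }

    if (tag.size() > 6)
    {
        // Optional suffix: a dot followed by one to four digits.
        if (tag[6] != '.' || tag.size() == 7 || tag.size() > 11)
        {
            return false;
        }
        for (std::size_t i = 7; i < tag.size(); ++i)
        {
            if (!is_digit(tag[i]))
            {
                return false;
            }
            v.sequence = v.sequence * 10 + (tag[i] - '0');
        }
    }

    out = v;
    return true;
}
```

With this sketch, "v17.03.1" yields year 17, month 3 and sequence 1 (the second release of March 2017), while "v17.04" yields sequence 0.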

@note We're aiming at releasing one major public release with new features per quarter. All releases in between will only contain bug fixes.
@note Starting from release 22.05, the 'master' branch is no longer used; it has been replaced by 'main'. Please update your clone jobs accordingly.

@section S2_2_changelog Changelog

v24.07 Public major release
 - Add support for mixed sign quantized convolution.
 - Add support for mixed sign dequantized GEMM.
 - Add SME FP16 GEMV kernel.
 - Change SME vector length function to use RDSVL instead of static variable.
 - Remove unused "get_default_activation_values" functions.
 - Add SVE fixed format interleaved BF16 DOT kernel.
 - Updates and optimizations to assembly kernels.
 - Expose CpuGemm functionality using the experimental operators API.
 - Optimize CPU operator memory management.

v24.06 Public minor release
 - Enable FP16 in multiple Neon™ kernels for multi_isa + v8a
 - Fix OpenMP® thread scheduling for large machines
 - Optimize CPU activation functions using LUT-based implementation:
   - Tanh function for FP16.

v24.05 Public major release
 - Add @ref CLScatter operator for FP32/16, S32/16/8, U32/16/8 data types
 - Various fixes to enable FP16 kernels in armv8a multi_isa builds.
 - Updated logic in the OpenMP scheduler to exclude LITTLE cores.

v24.04 Public major release
 - Add Bfloat16 data type support for @ref NEMatMul.
 - Add support for SoftMax in SME2 for FP32, FP16, QASYMM8 and QASYMM8_SIGNED.
 - Add support for in place accumulation to CPU GEMM kernels.
 - Add low-precision Int8 * Int8 -> FP32 CPU GEMM which dequantizes after multiplication
 - Add is_dynamic flag to QuantizationInfo to signal to operators that it may change after configuration
 - Performance optimizations:
   - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
   - Optimize @ref NEConvolutionLayer for input tensor size > 1e7 bytes and weight tensor height > 7
   - Optimize @ref NESoftmaxLayer for axis != 0 by natively supporting higher axes up to axis 3.
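
To illustrate what "dequantizes after multiplication" means for the Int8 * Int8 -> FP32 GEMM entry above, here is a naive reference sketch. It is not the library's kernel: the function name, the row-major layout and the per-tensor affine parameters (real = scale * (quantized - offset)) are assumptions made for this example. The products are accumulated in 32-bit integers and converted to FP32 only once per output element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive reference, row-major: a is M x K, b is K x N, the result is M x N.
// scale_* and offset_* are hypothetical per-tensor quantization parameters,
// with real_value = scale * (quantized_value - offset).
std::vector<float> gemm_s8_dequant_f32(const std::vector<std::int8_t> &a,
                                       const std::vector<std::int8_t> &b,
                                       std::size_t M, std::size_t N, std::size_t K,
                                       float scale_a, std::int32_t offset_a,
                                       float scale_b, std::int32_t offset_b)
{
    assert(a.size() == M * K && b.size() == K * N);

    // A single combined scale is applied after the integer accumulation.
    const float scale = scale_a * scale_b;

    std::vector<float> dst(M * N);
    for (std::size_t m = 0; m < M; ++m)
    {
        for (std::size_t n = 0; n < N; ++n)
        {
            std::int32_t acc = 0; // integer accumulator, no rounding here
            for (std::size_t k = 0; k < K; ++k)
            {
                const std::int32_t va = static_cast<std::int32_t>(a[m * K + k]) - offset_a;
                const std::int32_t vb = static_cast<std::int32_t>(b[k * N + n]) - offset_b;
                acc += va * vb;
            }
            dst[m * N + n] = scale * static_cast<float>(acc); // dequantize once
        }
    }
    return dst;
}
```

For a = [1 2; 3 4], b = [5 6; 7 8], scales 0.5 and 0.25 and zero offsets, the integer products are [19 22; 43 50], which become [2.375 2.75; 5.375 6.25] after the single dequantization step.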

v24.02.1 Public patch release
 - Fix performance regression in fixed-format kernels
 - Fix compile and runtime errors in arm_compute_validation for Windows on Arm (WoA)

v24.02 Public major release
 - Replace template writer with compute kernel writer in dynamic fusion.
 - Performance optimizations:
   - Parallelize @ref NEDepthwiseConvolutionLayer over batches if there is only 1 row

v24.01 Public major release
 - Remove the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose.
   You should link only to the main `libarm_compute` library for core functionality.
 - Expand GPUTarget list with Mali™ G720 and G620.
 - Optimize CPU activation functions using LUT-based implementation:
   - Sigmoid function for FP16.
 - New features
   - Add support for FP16 in all multi_isa builds.
 - Performance optimizations:
   - Optimize @ref NESoftmaxLayer
   - Optimize @ref NEDepthToSpaceLayer.

v23.11 Public major release
 - New features
   - Add support for input data type U64/S64 in CLCast and NECast.
   - Add support for output data type S64 in NEArgMinMaxLayer and CLArgMinMaxLayer
   - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface:
     - @ref experimental::dynamic_fusion::GpuCkwResize
     - @ref experimental::dynamic_fusion::GpuCkwPool2d
     - @ref experimental::dynamic_fusion::GpuCkwDepthwiseConv2d
     - @ref experimental::dynamic_fusion::GpuCkwMatMul
   - Add support for OpenCL™ command buffer with mutable dispatch extension.
   - Add support for Arm® Cortex®-A520 and Arm® Cortex®-R82.
   - Add support for negative axis values and inverted axis values in @ref arm_compute::NEReverse and @ref arm_compute::CLReverse.
   - Add new OpenCL™ kernels:
     - @ref opencl::kernels::ClMatMulLowpNativeMMULKernel support for QASYMM8 and QASYMM8_SIGNED, with batch support
 - Performance optimizations:
   - Optimize @ref cpu::CpuReshape
   - Optimize @ref opencl::ClTranspose
   - Optimize @ref NEStackLayer
   - Optimize @ref CLReductionOperation.
   - Optimize @ref CLSoftmaxLayer.
   - Optimize start-up time of @ref NEConvolutionLayer for some input configurations where GeMM is selected as the convolution algorithm
   - Reduce CPU Overhead by optimal flushing of CL kernels.
 - Deprecate support for Bfloat16 in @ref cpu::CpuCast.
 - Support for U32 axis in @ref arm_compute::NEReverse and @ref arm_compute::CLReverse will be deprecated in 24.02.
 - Remove legacy PostOps interface. PostOps was the experimental interface for kernel fusion and is replaced by the new Dynamic Fusion interface.
 - Update OpenCL™ API headers to v2023.04.17

v23.08 Public major release
 - Deprecate the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose.
   Users must no longer link their applications to this library and instead link only to the main `libarm_compute` library for core functionality.
 - New features
   - Rewrite CLArgMinMaxLayer for axis 0 and enable S64 output.
   - Add multi-sketch support for dynamic fusion.
   - Break up arm_compute/core/Types.h and utils/Utils.h a bit to reduce unused code in each inclusion of these headers.
   - Add Fused Activation to CLMatMul.
   - Implement FP32/FP16 @ref opencl::kernels::ClMatMulNativeMMULKernel using the MMUL extension.
   - Use MatMul in fully connected layer with dynamic weights when supported.
   - Optimize CPU depthwise convolution with channel multiplier.
   - Add support in CpuCastKernel for conversion of S64/U64 to F32.
   - Add new OpenCL™ kernels:
     - @ref opencl::kernels::ClMatMulNativeMMULKernel support for FP32 and FP16, with batch support
   - Enable transposed convolution with non-square kernels on CPU and GPU.
   - Add support for input data type U64/S64 in CLCast.
   - Add new Compute Kernel Writer (CKW) subproject that offers a C++ interface to generate tile-based OpenCL code in just-in-time fashion.
   - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface with support for FP16/FP32 only:
     - @ref experimental::dynamic_fusion::GpuCkwActivation
     - @ref experimental::dynamic_fusion::GpuCkwCast
     - @ref experimental::dynamic_fusion::GpuCkwDirectConv2d
     - @ref experimental::dynamic_fusion::GpuCkwElementwiseBinary
     - @ref experimental::dynamic_fusion::GpuCkwStore
 - Various optimizations and bug fixes.

v23.05.1 Public patch release
 - Enable CMake and Bazel option to build multi_isa without FP16 support.
 - Fix compilation error in NEReorderLayer (aarch64 only).
 - Disable invalid (false-negative) validation test with CPU scale layer on FP16.
 - Various bug fixes

v23.05 Public major release
 - New features:
   - Add new Arm® Neon™ kernels / functions:
     - @ref NEMatMul for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
     - NEReorderLayer (aarch64 only)
   - Add new OpenCL™ kernels / functions:
     - @ref CLMatMul support for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
   - Add support for multiple dimensions in the indices parameter for both the Arm® Neon™ and OpenCL™ implementations of the Gather Layer.
   - Add support for dynamic weights in @ref CLFullyConnectedLayer and @ref NEFullyConnectedLayer for all data types.
   - Add support for cropping in the Arm® Neon™ and OpenCL™ implementations of the BatchToSpace Layer for all data types.
   - Add support for quantized data types for the ElementwiseUnary Operators for Arm® Neon™.
   - Implement RSQRT for quantized data types on OpenCL™.
   - Add FP16 depthwise convolution kernels for SME2.
 - Performance optimizations:
   - Improve CLTuner exhaustive mode tuning time.
 - Deprecate dynamic block shape in @ref NEBatchToSpaceLayer and @ref CLBatchToSpaceLayer.
 - Various optimizations and bug fixes.

v23.02.1 Public patch release
 - Allow mismatching data layouts between the source tensor and weights for \link cpu::CpuGemmDirectConv2d CpuGemmDirectConv2d \endlink with fixed format kernels.
 - Fixes for experimental CPU only Bazel and CMake builds.

v23.02 Public major release
 - New features:
   - Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator.
   - Add the following operators to the experimental dynamic fusion API:
     - GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub.
   - Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling.
   - Add new CPU operator AddMulAdd for float and quantized types.
   - Add new flag @ref ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings.
   - Add experimental support for CPU only Bazel and CMake builds.
 - Performance optimizations:
   - Optimize CPU base-e exponential functions for FP32.
   - Optimize CPU StridedSlice by copying first dimension elements in bulk where possible.
   - Optimize CPU quantized Subtraction by reusing the quantized Addition kernel.
   - Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain.
   - Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain.
   - Update the heuristic for CLDepthwiseConvolutionNative kernel.
   - Add new optimized OpenCL kernel to compute indirect convolution:
     - \link opencl::kernels::ClIndirectConv2dKernel ClIndirectConv2dKernel \endlink
   - Add new optimized OpenCL kernel to compute transposed convolution:
     - \link opencl::kernels::ClTransposedConvolutionKernel ClTransposedConvolutionKernel \endlink
 - Update recommended/minimum NDK version to r20b.
 - Various optimizations and bug fixes.

v22.11 Public major release
 - New features:
   - Add new experimental dynamic fusion API.
   - Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32.
   - Add CPU MeanStdDevNorm for QASYMM8.
   - Add CPU and GPU GELU activation function for FP32 and FP16.
   - Add CPU swish activation function for FP32 and FP16.
 - Performance optimizations:
   - Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8.
   - Optimize CPU activation functions using LUT-based implementation:
     - Sigmoid function for QASYMM8 and QASYMM8_SIGNED.
     - Hard swish function for QASYMM8_SIGNED.
   - Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic.
   - Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D.
   - Optimize GPU depthwise convolution kernel and heuristic.
   - Optimize GPU Conv2d heuristic.
   - Optimize CPU MeanStdDevNorm for FP16.
   - Optimize CPU tanh activation function for FP16 using rational approximation.
   - Improve GPU GeMMLowp start-up time.
 - Various optimizations and bug fixes.

v22.08 Public major release
 - Various bug fixes.
 - Disable unsafe FP optimizations causing accuracy issues in:
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
   - \link opencl::kernels::ClDirectConv3dKernel ClDirectConv3dKernel \endlink
   - @ref CLDepthwiseConvolutionLayerNativeKernel
 - Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add.
 - Optimize the gemm_reshaped_rhs_nly_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615.
 - Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel.
 - Expand GPUTarget list with missing Mali™ GPUs product names: G57, G68, G78AE, G610, G510, G310.
 - Extend the direct convolution 2d interface to configure the block size.
 - Update ClConv2D heuristic to use direct convolution.
 - Use official Khronos® OpenCL extensions:
   - Add cl_khr_integer_dot_product extension support.
   - Add support of OpenCL 3.0 non-uniform workgroup.
 - Cpu performance optimizations:
   - Add LUT-based implementation of Hard Swish and Leaky ReLU activation function for aarch64 build.
   - Optimize Add layer by considering the input tensors as 1D array.
   - Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights.
   - Add new winograd convolution kernels implementation and update the ACL \link arm_compute::cpu::CpuWinogradConv2d CpuWinogradConv2d\endlink operator.
 - Add experimental support for native builds for Windows® on Arm™.
 - Build flag interpretation change: arch=armv8.6-a now translates to -march=armv8.6-a CXX flag instead of -march=armv8.2-a + explicit selection of feature extensions.
 - Build flag change: toolchain_prefix, compiler_prefix:
   - Use empty string "" to suppress any prefixes.
   - Use "auto" to use default (auto) prefixes chosen by the build script. This is the default behavior when unspecified.
   - Any other string will be used as custom prefixes to the compiler and the rest of toolchain tools.
   - The default behaviour when prefix is unspecified does not change, but its signifier has been changed from empty string "" to "auto".
 - armv7a with Android build will no longer be tested or maintained.
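
The toolchain_prefix / compiler_prefix rules listed above can be modelled with the short sketch below. It only mirrors the three cases described in the changelog and is not taken from the build script; `resolve_tool_prefix`, `tool_name` and the `default_prefix` parameter (standing in for whatever the build script would pick) are hypothetical.

```cpp
#include <string>

// Models the flag semantics described in the changelog entry:
//   "auto"        -> use the default prefix chosen by the build script
//   ""            -> suppress any prefix
//   anything else -> use the string as a custom prefix
// default_prefix is a hypothetical stand-in for the build script's own choice.
std::string resolve_tool_prefix(const std::string &flag_value, const std::string &default_prefix)
{
    if (flag_value == "auto")
    {
        return default_prefix;
    }
    // An empty string passes through unchanged, which means "no prefix".
    return flag_value;
}

// Builds the final tool name, for example prefix + "g++".
std::string tool_name(const std::string &prefix, const std::string &tool)
{
    return prefix + tool;
}
```

For example, with an assumed default prefix of "aarch64-linux-gnu-", the value "auto" resolves to "aarch64-linux-gnu-g++", the empty string resolves to plain "g++", and "my-toolchain-" resolves to "my-toolchain-g++".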

v22.05 Public major release
 - Various bug fixes.
 - Various optimizations.
 - Add support for NDK r23b.
 - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
 - New Arm® Neon™ kernels / functions:
   - \link cpu::kernels::CpuPool3dKernel CpuPool3dKernel \endlink
 - New OpenCL kernels / functions:
   - \link opencl::kernels::ClPool3dKernel ClPool3dKernel \endlink
 - Improve the start-up times for the following OpenCL kernels:
   - \link opencl::kernels::ClWinogradInputTransformKernel ClWinogradInputTransformKernel \endlink
   - \link opencl::kernels::ClWinogradOutputTransformKernel ClWinogradOutputTransformKernel \endlink
   - \link opencl::kernels::ClWinogradFilterTransformKernel ClWinogradFilterTransformKernel \endlink
   - \link opencl::kernels::ClHeightConcatenateKernel ClHeightConcatenateKernel \endlink
 - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
   - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
   - \link cpu::kernels::CpuDepthwiseConv2dNativeKernel CpuDepthwiseConv2dNativeKernel \endlink
   - \link cpu::kernels::CpuGemmMatrixAdditionKernel CpuGemmMatrixAdditionKernel \endlink
   - \link cpu::kernels::CpuGemmMatrixMultiplyKernel CpuGemmMatrixMultiplyKernel \endlink
   - @ref NEFuseBatchNormalizationKernel
   - @ref NEL2NormalizeLayerKernel

v22.02 Public major release
 - Various bug fixes.
 - Various optimizations.
 - Update A510 arm_gemm cpu Kernels.
 - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details.
 - Improve the start-up time for the following OpenCL kernels:
   - @ref CLScale
   - @ref CLGEMM
   - @ref CLDepthwiseConvolutionLayer
   - \link opencl::kernels::ClIm2ColKernel ClIm2ColKernel \endlink
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
 - Remove functions:
   - CLRemap
   - NERemap
 - Remove padding from OpenCL kernels:
   - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink
 - Remove padding from Cpu kernels:
   - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink
 - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int):
   - \link cpu::kernels::CpuActivationKernel CpuActivationKernel \endlink
   - \link cpu::kernels::CpuAddKernel CpuAddKernel \endlink
   - \link cpu::kernels::CpuElementwiseKernel CpuElementwiseKernel \endlink
   - \link cpu::CpuSoftmaxGeneric CpuSoftmaxKernel \endlink
   - @ref NEBoundingBoxTransformKernel
   - @ref NECropKernel
   - @ref NEComputeAllAnchorsKernel
   - @ref NEInstanceNormalizationLayerKernel
   - NEMaxUnpoolingLayerKernel
   - @ref NEMeanStdDevNormalizationKernel
   - @ref NERangeKernel
   - @ref NEROIAlignLayerKernel
   - @ref NESelectKernel

v21.11 Public major release
 - Various bug fixes.
 - Various optimizations:
   - Improve performance of bilinear and nearest neighbor Scale on both CPU and GPU for FP32, FP16, Int8, Uint8 data types
   - Improve performance of Softmax on GPU for Uint8/Int8
 - New OpenCL kernels / functions:
   - @ref CLConv3D
 - New Arm® Neon™ kernels / functions:
   - @ref NEConv3D
 - Support configurable build by a selected subset of operator list
 - Support MobileBert on Neon™ backend
 - Improve operator/function logging
 - Remove padding from OpenCL kernels:
   - ClPool2dKernel
   - ClScaleKernel
   - ClGemmMatrixMultiplyReshapedKernel
 - Remove padding from Cpu kernels:
   - CpuPool2dKernel
 - Remove Y padding from OpenCL kernels:
   - ClGemmMatrixMultiplyKernel
   - ClGemmReshapedRHSMatrixKernel
 - Remove legacy GeMM kernels in gemm_v1.cl

v21.08 Public major release
 - Various bug fixes.
 - Various optimizations:
   - Improve LWS (Local-Workgroup-Size) heuristic in OpenCL for GeMM, Direct Convolution and Winograd Transformations when OpenCL tuner is not used
   - Improve QASYMM8/QSYMM8 performance on OpenCL for various Arm® Mali™ GPU architectures
 - Add dynamic weights support in Fully connected layer (CPU/GPU)
 - Various performance optimizations for floating-point data types (CPU/GPU)
 - Add a reduced core library build arm_compute_core_v2
 - Expose Operator API
 - Support fat binary build for armv8.2-a via fat_binary build flag
 - Add CPU discovery capabilities
 - Add data type f16 support for:
   - CLRemapKernel
 - Port the following functions to stateless API:
   - @ref CLConvolutionLayer
   - @ref CLFlattenLayer
   - @ref CLFullyConnectedLayer
   - @ref CLGEMM
   - @ref CLGEMMConvolutionLayer
   - @ref CLGEMMLowpMatrixMultiplyCore
   - @ref CLWinogradConvolutionLayer
   - @ref NEConvolutionLayer
   - @ref NEFlattenLayer
   - @ref NEFullyConnectedLayer
   - @ref NEGEMM
   - @ref NEGEMMConv2d
   - @ref NEGEMMConvolutionLayer
   - @ref NEGEMMLowpMatrixMultiplyCore
   - @ref NEWinogradConvolutionLayer
 - Remove the following functions:
   - CLWinogradInputTransform
 - Remove CLCoreRuntimeContext
 - Remove ICPPSimpleKernel
 - Rename file arm_compute/runtime/CL/functions/CLElementWiseUnaryLayer.h to arm_compute/runtime/CL/functions/CLElementwiseUnaryLayer.h

v21.05 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Various documentation updates:
   - Add supported operators and corresponding Android NNAPI operators.
   - Documentation reorg into user guide and contributor guide.
 - Add support for a global allocator for OpenCL tensors
 - Add experimental support for [CLVK](https://github.com/kpet/clvk).
 - Add data type S32 support for:
   - @ref opencl::kernels::ClArithmeticKernel
 - Add data type QASYMM8 support for:
   - @ref CLROIPoolingLayer
   - @ref CLROIPoolingLayerKernel
   - @ref NEROIPoolingLayer
   - @ref NEROIPoolingLayerKernel
 - Add per-channel quantization support for:
   - @ref CLDeconvolutionLayer
   - @ref CLDirectDeconvolutionLayer
   - @ref NEConvolutionLayer
   - @ref NEDeconvolutionLayer
 - Remove padding from OpenCL kernels:
   - @ref CLL2NormalizeLayerKernel
   - CLDepthwiseConvolutionLayer3x3NHWCKernel
   - @ref CLNormalizationLayerKernel
   - @ref CLNormalizePlanarYUVLayerKernel
   - @ref opencl::kernels::ClMulKernel
   - @ref CLReductionOperationKernel
   - @ref CLROIPoolingLayerKernel
 - Remove computer vision support from Arm® Neon™ backend
   - Remove the following functions:
     - NEAbsoluteDifference
     - NEAccumulate
     - NEBox3x3
     - NECannyEdge
     - NEChannelCombine
     - NEChannelExtract
     - NEColorConvert
     - NEConvolution
     - NEDerivative
     - NEDilate
     - NEEqualizeHistogram
     - NEErode
     - NEFastCorners
     - NEGaussian3x3
     - NEGaussian5x5
     - NEGaussianPyramid
     - NEHOGDescriptor
     - NEHOGDetector
     - NEHOGGradient
     - NEHOGMultiDetection
     - NEHarrisCorners
     - NEHistogram
     - NEIntegralImage
     - NELaplacianPyramid
     - NELaplacianReconstruct
     - NEMagnitude
     - NEMeanStdDev
     - NEMedian3x3
     - NEMinMaxLocation
     - NENonLinearFilter
     - NEOpticalFlow
     - NEPhase
     - NEScharr3x3
     - NESobel3x3
     - NESobel5x5
     - NESobel7x7
     - NETableLookup
     - NEThreshold
     - NEWarpAffine
     - NEWarpPerspectiveKernel
 - Remove all GLES kernels / functions / tests / examples
 - Remove computer vision support from CL backend
   - Remove the following functions:
     - CLAbsoluteDifference
     - CLAccumulate
     - CLBox3x3
     - CLCannyEdge
     - CLChannelCombine
     - CLChannelExtract
     - CLColorConvert
     - CLConvolution
     - CLDerivative
     - CLDilate
     - CLEqualizeHistogram
     - CLErode
     - CLFastCorners
     - CLGaussian3x3
     - CLGaussian5x5
     - CLGaussianPyramid
     - CLHOGDescriptor
     - CLHOGDetector
     - CLHOGGradient
     - CLHOGMultiDetection
     - CLHarrisCorners
     - CLHistogram
     - CLIntegralImage
     - CLLaplacianPyramid
     - CLLaplacianReconstruct
     - CLMagnitude
     - CLMeanStdDev
     - CLMedian3x3
     - CLMinMaxLocation
     - CLNonLinearFilter
     - CLOpticalFlow
     - CLPhase
     - CLScharr3x3
     - CLSobel3x3
     - CLSobel5x5
     - CLSobel7x7
     - CLTableLookup
     - CLThreshold
     - CLWarpAffine
     - CLWarpPerspective

v21.02 Public major release
Sheri Zhangda6a6eb2021-01-06 11:15:06 +0000479 - Various bug fixes.
480 - Various optimisations.
Georgios Pinitas45514032020-12-30 00:03:09 +0000481 - Upgrade C++ standard to C++14
482 - Add macOS support
Giorgio Arena1055dc12021-02-19 09:53:06 +0000483 - Add Armv8-R AArch64 architecture support
Sheri Zhangda6a6eb2021-01-06 11:15:06 +0000484 - Add SVE/SVE2 support for:
Manuel Bottini10b38262021-02-19 18:16:44 +0000485 - NEScaleKernel
Sheri Zhangda6a6eb2021-01-06 11:15:06 +0000486 - @ref NEActivationLayer
487 - @ref NEArithmeticAddition
488 - @ref NEBatchNormalizationLayerKernel
Gunes Bayirfadc9b12023-11-07 05:43:07 +0000489 - cpu::kernels::CpuLogits1DSoftmaxKernel
490 - cpu::kernels::CpuLogits1DMaxKernel
Giorgio Arena1055dc12021-02-19 09:53:06 +0000491 - @ref cpu::kernels::CpuElementwiseUnaryKernel
Sheri Zhangdda69142021-02-01 19:06:57 +0000492 - Remove padding from OpenCL kernels:
Sheri Zhang1efed922021-03-10 22:43:38 +0000493 - CLDirectConvolutionLayerKernel
Sheri Zhangdda69142021-02-01 19:06:57 +0000494 - @ref CLArgMinMaxLayerKernel
495 - @ref CLPadLayerKernel
496 - @ref CLROIAlignLayerKernel
497 - @ref CLRangeKernel
Manuel Bottini3b131ab2021-02-19 18:16:44 +0000498 - CLScaleKernel
Sheri Zhangdda69142021-02-01 19:06:57 +0000499 - @ref CLSelectKernel
500 - @ref CLBitwiseKernel
Giorgio Arena1055dc12021-02-19 09:53:06 +0000501 - @ref opencl::kernels::ClFloorKernel
Teresa Charlin27886092021-02-25 20:15:01 +0000502 - CLTransposeKernel
Giorgio Arena5b50f422021-02-17 11:43:05 +0000503 - Deprecate functions in CLTuner:
504 - add_lws_to_table
505 - import_lws_table
506 - lws_table
Sheri Zhangda6a6eb2021-01-06 11:15:06 +0000507 - Remove functions:
Georgios Pinitas96b16b62020-12-01 17:41:34 +0000508 - NELocallyConnectedLayer / CLLocallyConnectedLayer
Georgios Pinitasf7c5a412020-12-03 14:38:33 +0000509 - NEIm2Col
510 - NECol2Im
511 - NEGEMMInterleave4x4
512 - NEGEMMTranspose1xW
Georgios Pinitas8c3c0e72020-12-03 20:11:53 +0000513 - NEComputeAllAnchors / CLComputeAllAnchors
Georgios Pinitasec2256b2020-12-03 18:51:58 +0000514 - NEGEMMAssemblyDispatch
Georgios Pinitasc53266e2020-12-09 03:11:53 +0000515 - NEUpsampleLayer / CLUpsampleLayer
Sheri Zhangda6a6eb2021-01-06 11:15:06 +0000516 - Remove kernels:
Georgios Pinitasd308df32020-12-01 16:56:36 +0000517 - NEGEMMMatrixVectorMultiplyKernel
Georgios Pinitas96b16b62020-12-01 17:41:34 +0000518 - NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel
Georgios Pinitasc53266e2020-12-09 03:11:53 +0000519 - NEUpsampleLayerKernel / CLUpsampleLayerKernel
Gian Marco Iodicef5aad512021-02-08 17:34:40 +0000520 - Extend OpenCL tuner with workgroup batch size support
521 - Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units
Gian Marco Iodice716b1be2021-02-10 17:33:27 +0000522 - Add functionality to load the OpenCL GEMM heuristics at runtime
523 - The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL
Giorgio Arenacd7d1782021-02-22 14:58:37 +0000524 - Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. This is currently under investigation.
Jakub Sujakee301b32021-06-04 09:46:08 +0100525 - Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised.
Georgios Pinitas40f51a62020-11-21 03:04:18 +0000526
SiCong Li96209c72020-08-21 12:28:30 +0100527v20.11 Public major release
morgolock70b1eb82020-11-24 13:54:19 +0000528 - Various bug fixes.
529 - Various optimisations.
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000530 - Performance regressions can be noted when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data types.
morgolock0e728492020-11-20 11:03:33 +0000531 This is planned to be resolved in the 21.02 release.
morgolock70b1eb82020-11-24 13:54:19 +0000532 - Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer.
SiCong Li903f8cc2020-08-27 10:17:10 +0100533 - Added new data type S32 support for:
Michele Di Giorgiobd2c8e12021-01-19 15:29:02 +0000534 - NEArithmeticSubtraction
535 - NEArithmeticSubtractionKernel
SiCong Libb88f892020-08-28 11:18:47 +0100536 - @ref NEPixelWiseMultiplication
Sheri Zhang1e3ab422021-03-16 17:35:08 +0000537 - NEPixelWiseMultiplicationKernel
Sang-Hoon Park63001ac2021-01-18 14:20:27 +0000538 - NEElementwiseDivision
539 - NEDivisionOperationKernel
SiCong Li96209c72020-08-21 12:28:30 +0100540 - Interface change
541 - Properly support softmax axis to have the same meaning as other major frameworks. That is, axis now defines the dimension
542 on which Softmax/Logsoftmax is performed. E.g. for input of shape 4x5x6 and axis=1, softmax will be applied to 4x6=24 vectors of size 5.
543 The supported value range of axis is [-rank, rank).
544 This change applies to the following functions:
545 - @ref NESoftmaxLayer
546 - @ref NELogSoftmaxLayer
547 - @ref CLSoftmaxLayer
548 - @ref CLLogSoftmaxLayer
Manuel Bottiniceaa0bf2021-02-16 15:15:19 +0000549 - GCSoftmaxLayer
Sheri Zhang824061d2020-10-26 15:46:37 +0000550 - New OpenCL kernels / functions:
Georgios Pinitas4a578b92021-06-25 12:13:49 +0100551 - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
morgolock0e728492020-11-20 11:03:33 +0000552 - @ref CLLogicalNot
553 - @ref CLLogicalAnd
554 - @ref CLLogicalOr
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000555 - New Arm® Neon™ kernels / functions:
morgolock0e728492020-11-20 11:03:33 +0000556 - @ref NELogicalNot
557 - @ref NELogicalAnd
558 - @ref NELogicalOr
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000559 - Removed padding from Arm® Neon™ kernels:
Sheri Zhang1e3ab422021-03-16 17:35:08 +0000560 - NEComplexPixelWiseMultiplicationKernel
Michalis Spyrou473cb012021-02-23 11:48:12 +0000561 - NENonMaximaSuppression3x3Kernel
Adnan AlSinan6863fa02022-02-04 13:04:55 +0000562 - NERemapKernel
Michele Di Giorgio93b75e02021-06-21 12:00:43 +0100563 - NEGEMMInterleave4x4Kernel
Manuel Bottini327225d2021-04-13 13:09:30 +0100564 - NEDirectConvolutionLayerKernel
Manuel Bottini10b38262021-02-19 18:16:44 +0000565 - NEScaleKernel
Georgios Pinitas96b16b62020-12-01 17:41:34 +0000566 - NELocallyConnectedMatrixMultiplyKernel
Manuel Bottinicfac51c2021-06-18 15:47:28 +0100567 - NEGEMMLowpOffsetContributionKernel
Michele Di Giorgio93b75e02021-06-21 12:00:43 +0100568 - NEGEMMTranspose1xWKernel
Michele Di Giorgio19289042021-02-03 16:05:00 +0000569 - NEPoolingLayerKernel
Michalis Spyrou473cb012021-02-23 11:48:12 +0000570 - NEConvolutionKernel
Michalis Spyrou60c3b0e2021-04-08 12:02:58 +0100571 - NEDepthwiseConvolutionLayerNativeKernel
Manuel Bottinicfac51c2021-06-18 15:47:28 +0100572 - NEGEMMLowpMatrixMultiplyKernel
Michele Di Giorgio53832b22021-06-21 14:45:44 +0100573 - NEGEMMMatrixMultiplyKernel
Manuel Bottini327225d2021-04-13 13:09:30 +0100574 - NEDirectConvolutionLayerOutputStageKernel
Sheri Zhanged367132020-10-08 15:46:16 +0100575 - @ref NEReductionOperationKernel
Manuel Bottinicfac51c2021-06-18 15:47:28 +0100576 - NEGEMMLowpMatrixAReductionKernel
577 - NEGEMMLowpMatrixBReductionKernel
Sheri Zhang824061d2020-10-26 15:46:37 +0000578 - Removed padding from OpenCL kernels:
Michele Di Giorgio7d61ff02021-01-18 21:15:59 +0000579 - CLBatchConcatenateLayerKernel
Michele Di Giorgio1e0208a2021-01-22 15:42:59 +0000580 - CLElementwiseOperationKernel
Sheri Zhang824061d2020-10-26 15:46:37 +0000581 - @ref CLBatchNormalizationLayerKernel
Michele Di Giorgioe1314662021-02-01 17:09:32 +0000582 - CLPoolingLayerKernel
Manuel Bottinic6f4ec32021-05-18 18:41:56 +0100583 - CLWinogradInputTransformKernel
Georgios Pinitas4a578b92021-06-25 12:13:49 +0100584 - CLGEMMLowpMatrixMultiplyNativeKernel
585 - CLGEMMLowpMatrixAReductionKernel
586 - CLGEMMLowpMatrixBReductionKernel
587 - CLGEMMLowpOffsetContributionOutputStageKernel
588 - CLGEMMLowpOffsetContributionKernel
Manuel Bottinic6f4ec32021-05-18 18:41:56 +0100589 - CLWinogradOutputTransformKernel
Georgios Pinitas4a578b92021-06-25 12:13:49 +0100590 - CLGEMMLowpMatrixMultiplyReshapedKernel
Sheri Zhang824061d2020-10-26 15:46:37 +0000591 - @ref CLFuseBatchNormalizationKernel
592 - @ref CLDepthwiseConvolutionLayerNativeKernel
Georgios Pinitas11d84152021-04-28 10:20:18 +0100593 - CLDepthConvertLayerKernel
Sheri Zhang7e20e292021-02-02 11:49:34 +0000594 - CLCopyKernel
Gian Marco Iodice8155c022021-04-16 15:08:59 +0100595 - CLDepthwiseConvolutionLayer3x3NHWCKernel
Georgios Pinitasf47f7182021-01-15 09:29:50 +0000596 - CLActivationLayerKernel
Manuel Bottinic6f4ec32021-05-18 18:41:56 +0100597 - CLWinogradFilterTransformKernel
Michele Di Giorgio7d61ff02021-01-18 21:15:59 +0000598 - CLWidthConcatenateLayerKernel
599 - CLWidthConcatenate4TensorsKernel
600 - CLWidthConcatenate2TensorsKernel
Sang-Hoon Park201e0fe2021-01-27 13:14:56 +0000601 - CLLogits1DMaxShiftExpSumKernel
602 - CLLogits1DNormKernel
Michele Di Giorgio7d61ff02021-01-18 21:15:59 +0000603 - CLHeightConcatenateLayerKernel
Georgios Pinitas856f66e2021-04-22 21:13:21 +0100604 - CLGEMMMatrixMultiplyKernel
Georgios Pinitas4a578b92021-06-25 12:13:49 +0100605 - CLGEMMLowpQuantizeDownInt32ScaleKernel
606 - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
607 - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
Michele Di Giorgio7d61ff02021-01-18 21:15:59 +0000608 - CLDepthConcatenateLayerKernel
Georgios Pinitas4a578b92021-06-25 12:13:49 +0100609 - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
Sheri Zhang824061d2020-10-26 15:46:37 +0000610 - Removed OpenCL kernels / functions:
611 - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
612 - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
613 - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
morgolock00c76012020-11-06 10:40:12 +0000614 - Deprecated OpenCL kernels / functions (if a kernel is used only by a function that is being deprecated, the kernel is deprecated along with it):
Georgios Pinitas2d221392020-09-03 15:16:37 +0100615 - CLLocallyConnectedLayer
616 - CLLocallyConnectedMatrixMultiplyKernel
morgolock00c76012020-11-06 10:40:12 +0000617 - CLAbsoluteDifference
618 - CLAbsoluteDifferenceKernel
619 - CLAccumulate
620 - CLAccumulateKernel
621 - CLAccumulateSquared
622 - CLAccumulateSquaredKernel
623 - CLAccumulateWeighted
624 - CLAccumulateWeightedKernel
625 - CLAccumulateWeightedFP16Kernel
626 - CLBox3x3
627 - CLBox3x3Kernel
628 - CLBox3x3FP16Kernel
629 - CLCannyEdge
630 - CLChannelCombine
631 - CLChannelCombineKernel
632 - CLChannelExtract
633 - CLChannelExtractKernel
634 - CLColorConvert
635 - CLColorConvertKernel
636 - CLConvolution3x3
637 - CLConvolutionRectangle
638 - CLConvolutionRectangleKernel
639 - CLConvolutionSquare
640 - CLConvolutionKernel
641 - CLDerivative
642 - CLDerivativeKernel
643 - CLDilate
644 - CLDilateKernel
645 - CLEqualizeHistogram
646 - CLErode
647 - CLErodeKernel
648 - CLFastCorners
649 - CLFastCornersKernel
650 - CLGaussian3x3
651 - CLGaussian3x3Kernel
652 - CLGaussian5x5
653 - CLGaussian5x5HorKernel
654 - CLGaussian5x5VertKernel
655 - CLGaussianPyramid
656 - CLGaussianPyramidHalf
657 - CLGaussianPyramidOrb
658 - CLHarrisCorners
659 - CLHarrisScoreKernel
660 - CLHarrisScoreFP16Kernel
661 - CLHistogram
662 - CLHistogramKernel
663 - CLHOGOrientationBinningKernel
664 - CLHOGBlockNormalizationKernel
665 - CLHOGDetectorKernel
666 - CLHOGNonMaximaSuppressionKernel
667 - CLHOGDescriptor
668 - CLHOGDetector
669 - CLHOGGradient
670 - CLHOGMultiDetection
674 - CLIntegralImage
675 - CLIntegralImageKernel
676 - CLLaplacianReconstruct
677 - CLLaplacianPyramid
678 - CLMagnitude
679 - CLMagnitudePhaseKernel
680 - CLMedian3x3
681 - CLMedian3x3Kernel
682 - CLMinMaxLocation
683 - CLMinMaxLocationKernel
684 - CLNonLinearFilter
685 - CLNonLinearFilterKernel
686 - CLNonMaximaSuppression3x3
687 - CLNonMaximaSuppression3x3FP16Kernel
688 - CLNonMaximaSuppression3x3Kernel
689 - CLOpticalFlow
690 - CLPhase
691 - CLRemap
692 - CLRemapKernel
693 - CLScharr3x3
694 - CLScharr3x3Kernel
695 - CLSobel3x3
696 - CLSobel3x3Kernel
697 - CLSobel5x5
698 - CLSobel5x5HorKernel
699 - CLSobel5x5VertKernel
700 - CLSobel7x7
701 - CLSobel7x7HorKernel
702 - CLSobel7x7VertKernel
703 - CLThreshold
704 - CLThresholdKernel
705 - CLWarpAffine
706 - CLWarpAffineKernel
707 - CLWarpPerspective
708 - CLWarpPerspectiveKernel
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000709 - Deprecated Arm® Neon™ kernels / functions (if a kernel is used only by a function that is being deprecated, the kernel is deprecated along with it):
Georgios Pinitas2d221392020-09-03 15:16:37 +0100710 - NELocallyConnectedLayer
711 - NELocallyConnectedMatrixMultiplyKernel
morgolock0c862652020-11-06 08:59:45 +0000712 - NEAbsoluteDifference
713 - NEAbsoluteDifferenceKernel
714 - NEAccumulate
715 - NEAccumulateKernel
716 - NEAccumulateSquared
717 - NEAccumulateSquaredKernel
718 - NEAccumulateWeighted
719 - NEAccumulateWeightedKernel
720 - NEAccumulateWeightedFP16Kernel
721 - NEBox3x3
722 - NEBox3x3Kernel
723 - NEBox3x3FP16Kernel
724 - NECannyEdge
725 - NEChannelCombine
726 - NEChannelCombineKernel
727 - NEChannelExtract
728 - NEChannelExtractKernel
729 - NEColorConvert
730 - NEColorConvertKernel
731 - NEConvolution3x3
732 - NEConvolutionRectangle
733 - NEConvolutionRectangleKernel
734 - NEConvolutionSquare
735 - NEConvolutionKernel
736 - NEDerivative
737 - NEDerivativeKernel
738 - NEDilate
739 - NEDilateKernel
740 - NEEqualizeHistogram
741 - NEErode
742 - NEErodeKernel
743 - NEFastCorners
744 - NEFastCornersKernel
745 - NEGaussian3x3
746 - NEGaussian3x3Kernel
747 - NEGaussian5x5
748 - NEGaussian5x5HorKernel
749 - NEGaussian5x5VertKernel
750 - NEGaussianPyramid
751 - NEGaussianPyramidHalf
752 - NEGaussianPyramidOrb
753 - NEHarrisCorners
754 - NEHarrisScoreKernel
755 - NEHarrisScoreFP16Kernel
756 - NEHistogram
757 - NEHistogramKernel
758 - NEHOGOrientationBinningKernel
759 - NEHOGBlockNormalizationKernel
760 - NEHOGDetectorKernel
761 - NEHOGNonMaximaSuppressionKernel
762 - NEHOGDescriptor
763 - NEHOGDetector
764 - NEHOGGradient
765 - NEHOGMultiDetection
769 - NEIntegralImage
770 - NEIntegralImageKernel
771 - NELaplacianReconstruct
772 - NELaplacianPyramid
773 - NEMagnitude
774 - NEMagnitudePhaseKernel
775 - NEMedian3x3
776 - NEMedian3x3Kernel
777 - NEMinMaxLocation
778 - NEMinMaxLocationKernel
779 - NENonLinearFilter
780 - NENonLinearFilterKernel
781 - NENonMaximaSuppression3x3
782 - NENonMaximaSuppression3x3FP16Kernel
783 - NENonMaximaSuppression3x3Kernel
784 - NEOpticalFlow
785 - NEPhase
786 - NERemap
787 - NERemapKernel
788 - NEScharr3x3
789 - NEScharr3x3Kernel
790 - NESobel3x3
791 - NESobel3x3Kernel
792 - NESobel5x5
793 - NESobel5x5HorKernel
794 - NESobel5x5VertKernel
795 - NESobel7x7
796 - NESobel7x7HorKernel
797 - NESobel7x7VertKernel
798 - NEThreshold
799 - NEThresholdKernel
800 - NEWarpAffine
801 - NEWarpAffineKernel
802 - NEWarpPerspective
803 - NEWarpPerspectiveKernel
morgolockd6ee9ed2020-11-19 10:07:14 +0000804 - Deprecated GLES kernels / functions (if a kernel is used only by a function that is being deprecated, the kernel is deprecated along with it):
805 - GCAbsoluteDifference
806 - GCActivationLayer
807 - GCArithmeticAddition
808 - GCBatchNormalizationLayer
809 - GCConcatenateLayer
810 - GCConvolutionLayer
811 - GCDepthwiseConvolutionLayer
812 - GCDirectConvolutionLayer
813 - GCDropoutLayer
814 - GCFillBorder
815 - GCFullyConnectedLayer
816 - GCGEMM
817 - GCGEMMInterleave4x4
818 - GCGEMMTranspose1xW
819 - GCNormalizationLayer
820 - GCNormalizePlanarYUVLayer
821 - GCPixelWiseMultiplication
822 - GCPoolingLayer
823 - GCScale
824 - GCSoftmaxLayer
825 - GCTensorShift
826 - GCTranspose
827
SiCong Li96209c72020-08-21 12:28:30 +0100828
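The softmax axis semantics introduced in v20.11 (axis selects the dimension on which Softmax/Logsoftmax is performed, with accepted values in [-rank, rank)) can be illustrated with a small standalone sketch. This is not the library's implementation or API: the helper names `wrap_axis`, `num_softmax_vectors` and `softmax_along_axis` are hypothetical, and the dense buffer layout (last listed dimension contiguous) is an assumption made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library API): map an axis in
// [-rank, rank) onto the equivalent non-negative index in [0, rank).
inline int wrap_axis(int axis, int rank)
{
    assert(rank > 0 && axis >= -rank && axis < rank);
    return axis < 0 ? axis + rank : axis;
}

// Number of independent vectors softmax is applied to: the product of all
// dimensions except the softmax axis.
inline std::size_t num_softmax_vectors(const std::vector<std::size_t> &shape, int axis)
{
    const int   rank  = static_cast<int>(shape.size());
    const int   a     = wrap_axis(axis, rank);
    std::size_t count = 1;
    for (int d = 0; d < rank; ++d)
    {
        if (d != a)
        {
            count *= shape[d];
        }
    }
    return count;
}

// Reference softmax along one axis of a dense tensor.
// Assumption for this sketch only: the last listed dimension is contiguous.
inline std::vector<float> softmax_along_axis(const std::vector<float>       &in,
                                             const std::vector<std::size_t> &shape,
                                             int                             axis)
{
    const int   rank  = static_cast<int>(shape.size());
    const int   a     = wrap_axis(axis, rank);
    std::size_t outer = 1;
    std::size_t inner = 1;
    for (int d = 0; d < a; ++d)
    {
        outer *= shape[d];
    }
    for (int d = a + 1; d < rank; ++d)
    {
        inner *= shape[d];
    }
    const std::size_t len = shape[a];
    assert(len > 0 && in.size() == outer * len * inner);

    std::vector<float> out(in.size());
    for (std::size_t o = 0; o < outer; ++o)
    {
        for (std::size_t i = 0; i < inner; ++i)
        {
            const std::size_t base = o * len * inner + i;
            // Subtract the maximum before exponentiating for numerical stability.
            float max_v = in[base];
            for (std::size_t k = 1; k < len; ++k)
            {
                max_v = std::max(max_v, in[base + k * inner]);
            }
            float sum = 0.f;
            for (std::size_t k = 0; k < len; ++k)
            {
                out[base + k * inner] = std::exp(in[base + k * inner] - max_v);
                sum += out[base + k * inner];
            }
            // Normalise so that each vector along the axis sums to 1.
            for (std::size_t k = 0; k < len; ++k)
            {
                out[base + k * inner] /= sum;
            }
        }
    }
    return out;
}
```

With a shape of {4, 5, 6}, `num_softmax_vectors(shape, 1)` counts 4x6=24 vectors of size 5, matching the example in the v20.11 notes, and `axis = -2` resolves to the same dimension through `wrap_axis`.
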
Georgios Pinitas25ef7212020-06-02 23:00:41 +0100829v20.08 Public major release
830 - Various bug fixes.
831 - Various optimisations.
Sheri Zhang3ef9b5f2020-07-09 16:32:58 +0100832 - Added new data type QASYMM8_SIGNED support for:
Sheri Zhangdd4cfc02020-07-10 14:15:41 +0100833 - @ref CLArgMinMaxLayer
834 - @ref CLArgMinMaxLayerKernel
835 - Added new data type U8 support for:
836 - @ref NECropKernel
Sheri Zhang7e20e292021-02-02 11:49:34 +0000837 - CLCropKernel
Jakub Sujakee301b32021-06-04 09:46:08 +0100838 - Added align_corner support for nearest neighbor interpolation in:
Manuel Bottini10b38262021-02-19 18:16:44 +0000839 - NEScaleKernel
Manuel Bottini3b131ab2021-02-19 18:16:44 +0000840 - CLScaleKernel
Sheri Zhangdd4cfc02020-07-10 14:15:41 +0100841 - New OpenCL kernels / functions:
842 - @ref CLMaxUnpoolingLayerKernel
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000843 - New Arm® Neon™ kernels / functions:
Dana Zlotnik149203b2022-01-26 12:38:03 +0200844 - NEMaxUnpoolingLayerKernel
Sheri Zhang3ef9b5f2020-07-09 16:32:58 +0100845 - New graph example:
Sheri Zhangdd4cfc02020-07-10 14:15:41 +0100846 - graph_yolov3_output_detector
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100847 - GEMMTuner improvements:
848 - Added fp16 support
849 - Output JSON files for easier integration
850 - Enabled tuning for export_to_cl_image_rhs option for RHS tensors
851 - More robust script for running benchmarks
Sheri Zhang3ef9b5f2020-07-09 16:32:58 +0100852 - Removed padding from:
Sheri Zhang1e3ab422021-03-16 17:35:08 +0000853 - NEPixelWiseMultiplicationKernel
Michele Di Giorgiobd2c8e12021-01-19 15:29:02 +0000854 - NEHeightConcatenateLayerKernel
Michalis Spyrou27e67f02021-02-16 11:34:39 +0000855 - NEThresholdKernel
Michele Di Giorgiobd2c8e12021-01-19 15:29:02 +0000856 - NEBatchConcatenateLayerKernel
Teresa Charlind1dc09c2021-03-04 15:24:45 +0000857 - NETransposeKernel
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100858 - @ref NEBatchNormalizationLayerKernel
Michele Di Giorgiobd2c8e12021-01-19 15:29:02 +0000859 - NEArithmeticSubtractionKernel
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100860 - @ref NEBoundingBoxTransformKernel
Michalis Spyrou373b4072021-01-20 16:41:12 +0000861 - NELogits1DMaxKernel
862 - NELogits1DSoftmaxKernel
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100863 - @ref NEROIPoolingLayerKernel
864 - @ref NEROIAlignLayerKernel
Georgios Pinitas0b1c2db2020-12-04 15:51:34 +0000865 - NEYOLOLayerKernel
Georgios Pinitasc53266e2020-12-09 03:11:53 +0000866 - NEUpsampleLayerKernel
Georgios Pinitas70eb53b2021-01-06 19:42:21 +0000867 - NEFloorKernel
Michele Di Giorgiobd2c8e12021-01-19 15:29:02 +0000868 - NEWidthConcatenateLayerKernel
869 - NEDepthConcatenateLayerKernel
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100870 - @ref NENormalizationLayerKernel
871 - @ref NEL2NormalizeLayerKernel
Georgios Pinitasc6f95102021-03-30 10:03:01 +0100872 - NEFillArrayKernel
Georgios Pinitas11d84152021-04-28 10:20:18 +0100873 - NEDepthConvertLayerKernel
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100874 - @ref NERangeKernel
875 - @ref NEPriorBoxLayer
Sheri Zhanged367132020-10-08 15:46:16 +0100876 - Removed OpenCL kernels / functions:
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100877 - CLGEMMLowpQuantizeDownInt32ToUint8Scale
878 - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000879 - Removed Arm® Neon™ kernels / functions:
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100880 - NEGEMMLowpQuantizeDownInt32ToUint8Scale
881 - NEGEMMMatrixAccumulateBiasesKernel
SiCong Lid004a7a2020-05-28 15:26:41 +0100882 - Deprecated functions / interfaces:
Michalis Spyrou473cb012021-02-23 11:48:12 +0000883 - Non-descriptor based interfaces for NEThreshold, CLThreshold
Manuel Bottiniceaa0bf2021-02-16 15:15:19 +0000884 - Non-descriptor based interfaces for @ref NEScale, @ref CLScale and GCScale
885 - In @ref NESoftmaxLayer, @ref NELogSoftmaxLayer, @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer:
886 The default "axis" value for @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer is changed from 1 to 0.
morgolock9c7fed82020-08-05 12:30:56 +0100887 Only axis 0 is supported.
888 The default "axis" value for @ref NESoftmaxLayer, @ref NELogSoftmaxLayer is changed from 1 to 0.
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100889 Only axis 0 is supported.
Sang-Hoon Parka0205b92020-07-07 09:36:09 +0100890 - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity.
Manuel Bottinid844c082021-07-14 12:58:54 +0100891 - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only)
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100892 - This change allows @ref CLGEMMConvolutionLayer to be used without extra padding for the input and output.
893 - Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation.
Georgios Pinitas856f66e2021-04-22 21:13:21 +0100894 - Only on Arm® Mali™ Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since CLGEMMMatrixMultiplyKernel is called and currently requires padding.
895 - Added support for exporting the OpenCL buffer object to the OpenCL image object in CLGEMMMatrixMultiplyReshapedKernel and CLGEMMMatrixMultiplyReshapedOnlyRHSKernel.
Sang-Hoon Parkadfaefb2020-08-18 09:13:05 +0100896 - This support allows the OpenCL buffer used for the reshaped RHS matrix to be exported to an OpenCL image object.
Georgios Pinitas856f66e2021-04-22 21:13:21 +0100897 - The padding requirement for the OpenCL image object is accounted for in CLGEMMReshapeRHSMatrixKernel.
898 - The reshaped RHS matrix stores the weights when GEMM is used to accelerate CLGEMMConvolutionLayer.
Georgios Pinitas25ef7212020-06-02 23:00:41 +0100899
Georgios Pinitasfd7780d2020-03-17 11:41:00 +0000900v20.05 Public major release
Georgios Pinitasc7b183a2020-03-06 18:12:09 +0000901 - Various bug fixes.
902 - Various optimisations.
Michele Di Giorgio36a551f2020-04-23 11:55:29 +0100903 - Updated recommended NDK version to r18b.
904 - Updated recommended gcc version to Linaro 6.3.1.
Georgios Pinitasc7b183a2020-03-06 18:12:09 +0000905 - Added Bfloat16 type support
906 - Added Bfloat16 support in:
Manuel Bottini29599d02021-07-06 15:01:35 +0100907 - NEWeightsReshapeKernel
908 - NEConvolutionLayerReshapeWeights
Manuel Bottini90028992021-06-30 18:29:18 +0100909 - NEIm2ColKernel
Georgios Pinitasf7c5a412020-12-03 14:38:33 +0000910 - NEIm2Col
Georgios Pinitas11d84152021-04-28 10:20:18 +0100911 - NEDepthConvertLayerKernel
Georgios Pinitasc7b183a2020-03-06 18:12:09 +0000912 - @ref NEDepthConvertLayer
913 - @ref NEGEMMConvolutionLayer
Georgios Pinitasec2256b2020-12-03 18:51:58 +0000914 - NEGEMMAssemblyDispatch
Sheri Zhang0f2522b2020-03-25 16:38:19 +0000915 - Added new data type QASYMM8_SIGNED support for:
916 - @ref CLDirectConvolutionLayer
917 - @ref CLDeconvolutionLayer
918 - @ref CLDirectDeconvolutionLayer
919 - @ref CLGEMMDeconvolutionLayer
Georgios Pinitas4a578b92021-06-25 12:13:49 +0100920 - CLGEMMLowpMatrixMultiplyReshapedKernel
921 - CLGEMMLowpQuantizeDownInt32ScaleKernel
922 - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
Sheri Zhang0f2522b2020-03-25 16:38:19 +0000923 - @ref CLReductionOperation
924 - @ref CLReduceMean
Sheri Zhang359c48e2020-04-30 22:53:39 +0100925 - @ref NEScale
Manuel Bottini10b38262021-02-19 18:16:44 +0000926 - NEScaleKernel
Georgios Pinitasc53266e2020-12-09 03:11:53 +0000927 - NEUpsampleLayer
Sheri Zhang0f2522b2020-03-25 16:38:19 +0000928 - @ref NECast
929 - @ref NEReductionOperation
930 - @ref NEReduceMean
931 - @ref NEArgMinMaxLayer
932 - @ref NEDeconvolutionLayer
Manuel Bottiniae58bdf2021-06-17 17:18:45 +0100933 - NEGEMMLowpQuantizeDownInt32ScaleKernel
Sheri Zhang0f2522b2020-03-25 16:38:19 +0000934 - @ref CPPBoxWithNonMaximaSuppressionLimit
935 - @ref CPPDetectionPostProcessLayer
936 - @ref CPPPermuteKernel
937 - @ref CPPPermute
938 - @ref CPPTopKVKernel
939 - @ref CPPTopKV
Sheri Zhang359c48e2020-04-30 22:53:39 +0100940 - @ref CPPUpsample
941 - @ref CPPUpsampleKernel
Sheri Zhang31b49ca2020-04-24 11:15:10 +0100942 - New OpenCL kernels / functions:
943 - @ref CLQLSTMLayer
944 - @ref CLQLSTMLayerNormalizationKernel
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000945 - New Arm® Neon™ kernels / functions:
Sheri Zhang31b49ca2020-04-24 11:15:10 +0100946 - @ref NEQLSTMLayer
947 - @ref NEQLSTMLayerNormalizationKernel
948 - Added HARD_SWISH support in:
Georgios Pinitasf47f7182021-01-15 09:29:50 +0000949 - CLActivationLayerKernel
Michele Di Giorgiobd2c8e12021-01-19 15:29:02 +0000950 - NEActivationLayerKernel
Sheri Zhang0f2522b2020-03-25 16:38:19 +0000951 - Deprecated OpenCL kernels / functions:
952 - CLGEMMLowpQuantizeDownInt32ToUint8Scale
953 - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000954 - Deprecated Arm® Neon™ kernels / functions:
Sheri Zhang0f2522b2020-03-25 16:38:19 +0000955 - NEGEMMLowpQuantizeDownInt32ToUint8Scale
956 - Removed CPP kernels / functions:
957 - CPPFlipWeightsKernel
Manuel Bottini387259a2020-05-21 17:14:36 +0100958 - Removed PoolingLayerInfo constructors without Data Layout.
959 - Removed CLDepthwiseConvolutionLayer3x3
960 - Removed NEDepthwiseConvolutionLayerOptimized
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000961 - Added support for Winograd 3x3, 4x4 on Arm® Neon™ FP16:
Manuel Bottini075253a2020-05-22 12:57:18 +0100962 - @ref NEWinogradConvolutionLayer
Michalis Spyrou96f977e2021-07-01 12:20:56 +0100963 - CpuWinogradConv2dTransformInputKernel
964 - CpuWinogradConv2dTransformOutputKernel
965 - CpuWinogradConv2dTransformWeightsKernel
Manuel Bottini075253a2020-05-22 12:57:18 +0100966 - Added CLCompileContext
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +0000967 - Added Arm® Neon™ GEMM kernel with 2D window support
Georgios Pinitasc7b183a2020-03-06 18:12:09 +0000968
Michele Di Giorgio740872e2020-03-04 15:29:49 +0000969v20.02.1 Maintenance release
970 - Added Android-NN build script.
971
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +0000972v20.02 Public major release
973 - Various bug fixes.
974 - Various optimisations.
975 - Added new data type QASYMM8_SIGNED support for:
976 - @ref CLDepthwiseConvolutionLayer
Manuel Bottini387259a2020-05-21 17:14:36 +0100977 - CLDepthwiseConvolutionLayer3x3
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +0000978 - @ref CLGEMMConvolutionLayer
Georgios Pinitas4a578b92021-06-25 12:13:49 +0100979 - CLGEMMLowpMatrixMultiplyCore
980 - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
981 - CLGEMMLowpMatrixMultiplyNativeKernel
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +0000982 - @ref NEActivationLayer
Sang-Hoon Park63001ac2021-01-18 14:20:27 +0000983 - NEComparisonOperationKernel
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +0000984 - @ref NEConvolutionLayer
985 - @ref NEDepthwiseConvolutionLayer
Georgios Pinitas7d0adc62020-09-04 15:25:24 +0100986 - NEDepthwiseConvolutionLayer3x3Kernel
Manuel Bottini327225d2021-04-13 13:09:30 +0100987 - NEDirectConvolutionLayerOutputStageKernel
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +0000988 - @ref NEElementwiseComparison
989 - @ref NEElementwiseMax
990 - @ref NEElementwiseMin
991 - @ref NEElementwiseSquaredDiff
992 - @ref NEFullyConnectedLayer
Michele Di Giorgiof22f6722020-07-03 16:29:24 +0100993 - NEGEMMMatrixVectorMultiplyKernel
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +0000994 - @ref NEPixelWiseMultiplication
995 - @ref NEPoolingLayer
996 - @ref NEPReluLayer
997 - Added support for QSYMM8_PER_CHANNEL in:
Georgios Pinitas7d0adc62020-09-04 15:25:24 +0100998 - NEDepthwiseConvolutionLayer3x3Kernel
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +0000999 - Added support for split sizes in:
1000 - @ref CLSplit
1001 - @ref NESplit
1002 - New OpenCL kernels / functions:
1003 - @ref CLFill
Georgios Pinitas4a578b92021-06-25 12:13:49 +01001004 - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001005 - New Arm® Neon™ kernels / functions:
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +00001006 - @ref NEFill
Manuel Bottiniae58bdf2021-06-17 17:18:45 +01001007 - NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001008 - Deprecated functions / interfaces:
Manuel Bottini387259a2020-05-21 17:14:36 +01001009 - CLDepthwiseConvolutionLayer3x3
1010 - NEDepthwiseConvolutionLayerOptimized
1011 - PoolingLayerInfo constructors without Data Layout.
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001012 - Added support for quantization with multiplier greater than 1 on Arm® Neon™ and CL.
Giuseppe Rossinif04ddbc2020-02-17 17:22:49 +00001013 - Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer.
1014 - Added the ability to build bootcode for bare metal.
1015 - Added support for generating synthetic QASYMM8 graphs.
1016 - Added support for F16 datatype in VGG16.
1017 - Removed pre-built binaries for GLES.
1018
Michele Di Giorgiod374ff22020-01-21 10:03:20 +00001019v19.11.1 Public maintenance release
1020 - Fix offset calculation in NEReductionOperationKernel.
1021 - Fix data layout in NEScaleKernel for NHWC.
1022 - Retain configuration step data layout to avoid side-effects.
1023 - Perform sqrt in double domain for L2 pooling.
1024 - Fix output shape calculation for Reduce Mean.
1025 - Restrict cases where optimized NEPadLayer runs.
1026
Michele Di Giorgioa046e162019-10-08 09:36:26 +01001027v19.11 Public major release
SiCong Lica1f98c2019-11-28 11:06:11 +00001028 - Various bug fixes.
1029 - Various optimisations.
SiCong Li1f7f9882019-11-28 14:59:35 +00001030 - Updated recommended NDK version to r17c.
SiCong Lica1f98c2019-11-28 11:06:11 +00001031 - Deprecated OpenCL kernels / functions:
Michele Di Giorgioa046e162019-10-08 09:36:26 +01001032 - CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel
1033 - CLDepthwiseIm2ColKernel
SiCong Lica1f98c2019-11-28 11:06:11 +00001034 - CLDepthwiseSeparableConvolutionLayer
Michele Di Giorgioa046e162019-10-08 09:36:26 +01001035 - CLDepthwiseVectorToTensorKernel
1036 - CLDirectConvolutionLayerOutputStageKernel
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001037 - Deprecated Arm® Neon™ kernels / functions:
Giorgio Arenad93e2632019-10-15 11:09:33 +01001038 - NEDepthwiseWeightsReshapeKernel
1039 - NEDepthwiseIm2ColKernel
SiCong Lica1f98c2019-11-28 11:06:11 +00001040 - NEDepthwiseSeparableConvolutionLayer
Giorgio Arenad93e2632019-10-15 11:09:33 +01001041 - NEDepthwiseVectorToTensorKernel
Manuel Bottini05069f02019-09-26 17:18:26 +01001042 - NEDepthwiseConvolutionLayer3x3
SiCong Lica1f98c2019-11-28 11:06:11 +00001043 - New OpenCL kernels / functions:
1044 - @ref CLInstanceNormalizationLayerKernel / @ref CLInstanceNormalizationLayer
1045 - @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated
1046 OpenCL kernels / functions)
1047 - @ref CLLogSoftmaxLayer
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001048 - New Arm® Neon™ kernels / functions:
SiCong Lica1f98c2019-11-28 11:06:11 +00001049 - @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform
Georgios Pinitas8c3c0e72020-12-03 20:11:53 +00001050 - @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors
SiCong Lica1f98c2019-11-28 11:06:11 +00001051 - @ref NEDetectionPostProcessLayer
1052 - @ref NEGenerateProposalsLayer
1053 - @ref NEInstanceNormalizationLayerKernel / @ref NEInstanceNormalizationLayer
1054 - @ref NELogSoftmaxLayer
1055 - @ref NEROIAlignLayerKernel / @ref NEROIAlignLayer
1056 - Added QASYMM8 support for:
1057 - @ref CLGenerateProposalsLayer
1058 - @ref CLROIAlignLayer
1059 - @ref CPPBoxWithNonMaximaSuppressionLimit
1060 - Added QASYMM16 support for:
1061 - @ref CLBoundingBoxTransform
1062 - Added FP16 support for:
Georgios Pinitas856f66e2021-04-22 21:13:21 +01001063 - CLGEMMMatrixMultiplyReshapedKernel
SiCong Lica1f98c2019-11-28 11:06:11 +00001064 - Added new data type QASYMM8_PER_CHANNEL support for:
Manuel Bottini9e73c932021-03-02 17:40:42 +00001065 - CLDequantizationLayer
SiCong Lica1f98c2019-11-28 11:06:11 +00001066 - @ref NEDequantizationLayer
1067 - Added new data type QSYMM8_PER_CHANNEL support for:
1068 - @ref CLConvolutionLayer
1069 - @ref NEConvolutionLayer
1070 - @ref CLDepthwiseConvolutionLayer
1071 - @ref NEDepthwiseConvolutionLayer
1072 - Added FP16 mixed-precision support for:
Georgios Pinitas856f66e2021-04-22 21:13:21 +01001073 - CLGEMMMatrixMultiplyReshapedKernel
Michele Di Giorgioe1314662021-02-01 17:09:32 +00001074 - CLPoolingLayerKernel
SiCong Lica1f98c2019-11-28 11:06:11 +00001075 - Added FP32 and FP16 ELU activation for:
1076 - @ref CLActivationLayer
1077 - @ref NEActivationLayer
1078 - Added asymmetric padding support for:
1079 - @ref CLDirectDeconvolutionLayer
1080 - @ref CLGEMMDeconvolutionLayer
1081 - @ref NEDeconvolutionLayer
1082 - Added SYMMETRIC and REFLECT modes for @ref CLPadLayerKernel / @ref CLPadLayer.
Georgios Pinitas0f7ef8a2021-01-10 04:23:52 +00001083 - Replaced the calls to NECopyKernel and NEMemsetKernel with @ref NEPadLayer in @ref NEGenerateProposalsLayer.
1084 - Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer.
SiCong Lica1f98c2019-11-28 11:06:11 +00001085 - Improved performance for CL Inception V3 - FP16.
1086 - Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision).
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001087 - Improved Arm® Neon™ performance by enabling fusing batch normalization with convolution and depth-wise convolution layer.
1088 - Improved Arm® Neon™ performance for MobileNet-SSD by improving the output detection performance.
SiCong Lica1f98c2019-11-28 11:06:11 +00001089 - Optimized @ref CLPadLayer.
1090 - Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel.
1091 - Reduced memory consumption by implementing weights sharing.
Michele Di Giorgioa046e162019-10-08 09:36:26 +01001092
Michele Di Giorgiod374ff22020-01-21 10:03:20 +00001093v19.08.1 Public maintenance release
1094 - Fix offset calculation in NEReductionOperationKernel.
1095 - Fix data layout in NEScaleKernel for NHWC.
1096 - Retain configuration step data layout to avoid side-effects.
1097 - Perform sqrt in double domain for L2 pooling.
1098 - Fix output shape calculation for Reduce Mean.
1099 - Fix broadcast CLPixelwiseMultiplication with 5D tensors.
1100
Georgios Pinitas3d13af82019-06-04 13:04:16 +01001101v19.08 Public major release
1102 - Various bug fixes.
1103 - Various optimisations.
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001104 - Deprecated Arm® Neon™ functions:
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001105 - NEDepthConcatenateLayer
1106 - NEWidthConcatenateLayer
1107 - Deprecated OpenCL kernels / functions
1108 - CLDepthConcatenateLayer
1109 - CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4
1110 - CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW
1111 - CLWidthConcatenateLayer
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001112 - New Arm® Neon™ kernels / functions:
Gian Marco Iodicec5f48ad2019-09-02 09:52:12 +01001113 - @ref NEAbsLayer
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001114 - @ref NECast
Gian Marco Iodicec5f48ad2019-09-02 09:52:12 +01001115 - @ref NEElementwisePower
1116 - @ref NELogLayer
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001117 - @ref NELSTMLayerQuantized
Gian Marco Iodicec5f48ad2019-09-02 09:52:12 +01001118 - @ref NENegLayer
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001119 - @ref NEPReluLayer
Gian Marco Iodicec5f48ad2019-09-02 09:52:12 +01001120 - @ref NESinLayer
Michele Di Giorgiobd2c8e12021-01-19 15:29:02 +00001121 - NEBatchConcatenateLayerKernel
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001122 - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer
Michalis Spyrou60c3b0e2021-04-08 12:02:58 +01001123 - NEDepthwiseConvolutionLayerNativeKernel
Manuel Bottiniae58bdf2021-06-17 17:18:45 +01001124 - NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001125 - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer
1126 - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer
1127 - New OpenCL kernels / functions:
Gian Marco Iodicec5f48ad2019-09-02 09:52:12 +01001128 - @ref CLAbsLayer
1129 - @ref CLElementwisePower
1130 - @ref CLLogLayer
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001131 - @ref CLLSTMLayerQuantized
Gian Marco Iodicec5f48ad2019-09-02 09:52:12 +01001132 - @ref CLNegLayer
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001133 - @ref CLPReluLayer
Gian Marco Iodicec5f48ad2019-09-02 09:52:12 +01001134 - @ref CLSinLayer
Michele Di Giorgio7d61ff02021-01-18 21:15:59 +00001135 - CLBatchConcatenateLayerKernel
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001136 - @ref CLDepthToSpaceLayerKernel / @ref CLDepthToSpaceLayer
Georgios Pinitas856f66e2021-04-22 21:13:21 +01001137 - CLGEMMLowpMatrixMultiplyNativeKernel
Michele Di Giorgioba14c922020-10-12 13:27:57 +01001138 - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
Georgios Pinitas856f66e2021-04-22 21:13:21 +01001139 - CLGEMMMatrixMultiplyNativeKernel
Michalis Spyrou473cb012021-02-23 11:48:12 +00001140 - CLMeanStdDevNormalizationKernel /CLMeanStdDevNormalizationLayer
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001141 - @ref CLSpaceToDepthLayerKernel / @ref CLSpaceToDepthLayer
1142 - New examples:
1143 - neon_opticalflow
1144 - cl_cache
1145 - neon_permute
Gian Marco Iodicec5f48ad2019-09-02 09:52:12 +01001146 - Added support for FP16 in @ref NEDeconvolutionLayer
1147 - Added support for FP16 in @ref CLDeconvolutionLayer
1148 - Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001149 - Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only)
1150 - Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only)
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001151 - Re-factored the depthwise convolution layer kernel on Arm® Neon™ for generic cases
Jakub Sujakee301b32021-06-04 09:46:08 +01001152 - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon™ only)
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001153 - Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file
1154 - Altered @ref QuantizationInfo interface to support per-channel quantization.
Manuel Bottini387259a2020-05-21 17:14:36 +01001155 - The CLDepthwiseConvolutionLayer3x3 will be included by @ref CLDepthwiseConvolutionLayer to accommodate for future optimizations.
1156 - The NEDepthwiseConvolutionLayerOptimized will be included by @ref NEDepthwiseConvolutionLayer to accommodate for future optimizations.
Gian Marco Iodicecc2f54b2019-08-22 10:10:52 +01001157 - Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface
1158 - Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface
Michele Di Giorgio33f41fa2021-03-09 14:09:08 +00001159 - Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel
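To illustrate what per-channel quantization means in practice, the sketch below dequantizes symmetric 8-bit weights using one scale per output channel instead of a single scale for the whole tensor. `PerChannelScales` and `dequantize_per_channel` are hypothetical names for this illustration. They do not reproduce the QuantizationInfo interface, and the channel-major layout is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for this sketch; it is not the library's QuantizationInfo class.
struct PerChannelScales
{
    std::vector<float> scale; // one scale per output channel
};

// Dequantizes symmetric int8 values stored channel-major: [channel][elements_per_channel].
// Every element of channel c is multiplied by scale[c].
std::vector<float> dequantize_per_channel(const std::vector<std::int8_t> &q, const PerChannelScales &info)
{
    const std::size_t  channels    = info.scale.size();
    const std::size_t  per_channel = q.size() / channels;
    std::vector<float> out(q.size());
    for(std::size_t c = 0; c < channels; ++c)
    {
        for(std::size_t i = 0; i < per_channel; ++i)
        {
            out[c * per_channel + i] = info.scale[c] * static_cast<float>(q[c * per_channel + i]);
        }
    }
    return out;
}
```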

v19.05 Public major release
 - Various bug fixes.
 - Various optimisations.
 - New Arm® Neon™ kernels / functions:
    - @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer
    - NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication
    - @ref NECropKernel / @ref NECropResize
    - NEDepthwiseConvolutionAssemblyDispatch
    - @ref NEFFTDigitReverseKernel
    - @ref NEFFTRadixStageKernel
    - @ref NEFFTScaleKernel
    - NEGEMMLowpOffsetContributionOutputStageKernel
    - NEHeightConcatenateLayerKernel
    - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer
    - @ref NEFFT1D
    - @ref NEFFT2D
    - @ref NEFFTConvolutionLayer
 - New OpenCL kernels / functions:
    - CLComplexPixelWiseMultiplicationKernel / @ref CLComplexPixelWiseMultiplication
    - CLCropKernel / @ref CLCropResize
    - @ref CLDeconvolutionReshapeOutputKernel
    - @ref CLFFTDigitReverseKernel
    - @ref CLFFTRadixStageKernel
    - @ref CLFFTScaleKernel
    - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
    - CLGEMMMatrixMultiplyReshapedOnlyRHSKernel
    - CLHeightConcatenateLayerKernel
    - @ref CLDirectDeconvolutionLayer
    - @ref CLFFT1D
    - @ref CLFFT2D
    - @ref CLFFTConvolutionLayer
    - @ref CLGEMMDeconvolutionLayer
 - New OpenGLES kernels / functions:
    - GCConcatenateLayer
 - Deprecated functions/interfaces
    - GCDepthConcatenateLayer
    - NEWidthConcatenateLayer
    - NEDepthConcatenateLayer
    - CLWidthConcatenateLayer
    - CLDepthConcatenateLayer
    - CLGEMMInterleave4x4
    - CLGEMMTranspose1xW
 - Support different quantization info in CLConcatLayer.
 - Add checks for cases where different input/output quantization info is not supported.
 - Tensors have different quantization information.
 - Add FP16 support checks.
 - Fix output quantization in CLDepthwiseConv3x3 when activation is fused.
 - New graph examples:
    - graph_convolution
    - graph_fully_connected
    - graph_depthwise_convolution
    - Deepspeech v0.4.1
 - Add support for QASYMM8 in NEArithmeticSubtractionKernel.
 - Add support for QASYMM8 in NEPixelWiseMultiplicationKernel.
 - Add support for QASYMM8 in NEDeconvolution.
 - Add support for DequantizationLayer for Neon/CL.
 - Add support for dilation in CLDepthwiseConvolution.
 - Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore.
 - Optimize CLDeconvolution.
 - Add StackLayer to the graph API.
 - Add support for "reflect" padding mode in NEPad.
 - Winograd 7x7 NHWC on OpenCL.
 - Rework CL ML layers to run exclusively on CL.
 - Support different quantization info in PoolingLayer.
 - Implement and test import memory interfaces.
 - Added new tests and removed old ones.
 - Various clang-tidy fixes.

v19.02 Public major release
 - Various bug fixes.
 - Various optimisations.
 - New Arm® Neon™ kernels / functions:
    - @ref NETileKernel / @ref NETile
    - @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization
    - NEElementwiseOperationKernel
    - @ref NEElementwiseMax
    - @ref NEElementwiseMin
    - @ref NEElementwiseSquaredDiff
    - @ref NESelectKernel / @ref NESelect
    - @ref NESplit
    - @ref NESlice
    - @ref NEUnstack
    - @ref NEStridedSliceKernel / @ref NEStridedSlice
    - NEElementwiseUnaryKernel
    - @ref NERsqrtLayer
    - @ref NEExpLayer
    - @ref NEReverseKernel / @ref NEReverse
    - @ref NEArgMinMaxLayer
    - @ref NEStackLayerKernel / @ref NEStackLayer
    - @ref NERangeKernel / @ref NERange
    - @ref NEPadLayer
    - NEMemsetKernel
    - @ref NEGatherKernel / @ref NEGather
    - @ref NEElementwiseComparison
    - @ref NEElementwiseComparisonStatic
    - NEComparisonOperationKernel
    - @ref NEElementwiseDivision
 - New OpenCL kernels / functions:
    - @ref CLSelectKernel / @ref CLSelect
    - @ref CLTileKernel / @ref CLTile
    - @ref CLComparisonKernel / @ref CLComparison
    - @ref CLArgMinMaxLayer
    - @ref CLElementwiseMax
    - @ref CLElementwiseMin
    - @ref CLElementwiseSquaredDiff
    - @ref CLStackLayerKernel / @ref CLStackLayer
    - @ref CLReverse / @ref CLReverseKernel
    - @ref CLRsqrtLayer
    - @ref CLExpLayer
    - CLElementWiseUnaryLayerKernel
    - CLGEMMReshapeLHSMatrixKernel
    - CLGEMMReshapeRHSMatrixKernel
    - CLGEMMMatrixMultiplyReshapedKernel
    - @ref CLRangeKernel / @ref CLRange
    - @ref CLUnstack
    - @ref CLGatherKernel / @ref CLGather
    - CLGEMMLowpMatrixMultiplyReshapedKernel
 - New CPP kernels / functions:
    - @ref CPPDetectionOutputLayer
    - @ref CPPTopKV / @ref CPPTopKVKernel
 - Added new examples:
    - graph_ssd_mobilenet.cpp
    - graph_mobilenet_v2.cpp
    - graph_resnet12.cpp
    - graph_srcnn955.cpp
    - graph_vgg_vdsr.cpp
    - graph_inception_resnet_v1.cpp
 - Added 4D tensor support to:
    - @ref NESoftmaxLayer
 - Fused activation in @ref CLWinogradConvolutionLayer
 - Extended @ref NEPermute to support more cases
 - Added Neon™/SVE GEMM Hybrid kernels
 - Added u8 and s8 hybrid assembly kernels
 - Introduced GEMM strategy name in NEGEMMAssemblyWrapper
 - Improved @ref CLTuner
 - Fused the bias addition within @ref CLGEMM
 - Added support for QASYMM8 LOGISTIC activation in @ref NEActivationLayer
 - Added NHWC data layout support to:
    - @ref NEScale for F16
    - @ref CLNormalizationLayer IN_MAP_2D for FP32/FP16
    - @ref NEL2NormalizeLayer for FP32/FP16
    - @ref NENormalizationLayer IN_MAP_2D for FP32/FP16
    - @ref CLROIAlignLayer
    - @ref CLGenerateProposalsLayer
 - Added QASYMM8 support to the following kernels:
    - NEArithmeticAdditionKernel
    - @ref NEScale
 - Added new tests and improved validation and benchmarking suites.
 - Deprecated functions/interfaces
    - Usage of inner_border_right and inner_border_top has been deprecated in @ref CLDeconvolutionLayer and @ref NEDeconvolutionLayer

v18.11 Public major release
 - Various bug fixes.
 - Various optimisations.
 - New Arm® Neon™ kernels / functions:
    - @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel
    - @ref NEReduceMean
    - @ref NEReorgLayer / @ref NEReorgLayerKernel
    - @ref NEPriorBoxLayer / @ref NEPriorBoxLayerKernel
    - NEUpsampleLayer / NEUpsampleLayerKernel
    - NEYOLOLayer / NEYOLOLayerKernel
 - New OpenCL kernels / functions:
    - @ref CLBatchToSpaceLayer / @ref CLBatchToSpaceLayerKernel
    - @ref CLBoundingBoxTransform / @ref CLBoundingBoxTransformKernel
    - @ref CLComputeAllAnchorsKernel
    - @ref CLGenerateProposalsLayer
    - @ref CLNormalizePlanarYUVLayer / @ref CLNormalizePlanarYUVLayerKernel
    - @ref CLReorgLayer / @ref CLReorgLayerKernel
    - @ref CLSpaceToBatchLayer / @ref CLSpaceToBatchLayerKernel
    - @ref CLPadLayer
    - @ref CLReduceMean
    - @ref CLPriorBoxLayer / @ref CLPriorBoxLayerKernel
    - @ref CLROIAlignLayer / @ref CLROIAlignLayerKernel
    - @ref CLSlice
    - @ref CLSplit
    - @ref CLStridedSlice / @ref CLStridedSliceKernel
    - CLUpsampleLayer / CLUpsampleLayerKernel
    - CLYOLOLayer / CLYOLOLayerKernel
 - New CPP kernels / functions:
    - @ref CPPBoxWithNonMaximaSuppressionLimit / @ref CPPBoxWithNonMaximaSuppressionLimitKernel
 - Added the validate method in:
    - @ref NEDepthConvertLayer
    - @ref NEFloor / @ref CLFloor
    - NEGEMMMatrixAdditionKernel
    - @ref NEReshapeLayer / @ref CLReshapeLayer
    - @ref CLScale
 - Added new examples:
    - graph_shufflenet.cpp
    - graph_yolov3.cpp
 - Added documentation on how to add a new function or kernel.
 - Improved doxygen documentation by adding a list of the existing functions.
 - Added 4D tensor support to:
    - CLWidthConcatenateLayer
    - CLFlattenLayer
    - @ref CLSoftmaxLayer
 - Add dot product support for CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride
 - Add SVE support
 - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization
 - Fused activation in CLDepthwiseConvolutionLayer3x3NCHWKernel, CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer
 - Added NHWC data layout support to:
    - @ref CLChannelShuffleLayer
    - @ref CLDeconvolutionLayer
    - @ref CLL2NormalizeLayer
 - Added QASYMM8 support to the following kernels:
    - CLScaleKernel
    - NEDepthwiseConvolutionLayer3x3Kernel
    - CLPixelWiseMultiplicationKernel
 - Added FP16 support to the following kernels:
    - CLDepthwiseConvolutionLayer3x3NHWCKernel
    - NEDepthwiseConvolutionLayer3x3Kernel
    - @ref CLNormalizePlanarYUVLayerKernel
    - @ref CLWinogradConvolutionLayer (5x5 kernel)
 - More tests added to both validation and benchmarking suites.
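Fusing batch normalization into the convolution layer weights, as listed above, amounts to rescaling each output channel's filter and adjusting its bias once, ahead of inference. The sketch below shows the standard arithmetic for this folding. `BatchNormParams` and `fuse_batch_norm` are hypothetical names and the flat weight layout is an assumption, so this is not the implementation used by @ref CLFuseBatchNormalization.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-output-channel batch normalization parameters (names are illustrative).
struct BatchNormParams
{
    std::vector<float> mean, var, gamma, beta;
    float              epsilon;
};

// Folds y = gamma * (conv(x) + b - mean) / sqrt(var + epsilon) + beta into the
// convolution weights and bias, so that a plain convolution gives the same result.
// weights is laid out as [out_channel][elements_per_filter]; bias has one entry per out_channel.
void fuse_batch_norm(std::vector<float> &weights, std::vector<float> &bias, const BatchNormParams &bn)
{
    const std::size_t channels   = bias.size();
    const std::size_t per_filter = weights.size() / channels;
    for(std::size_t c = 0; c < channels; ++c)
    {
        // Per-channel scale applied to every weight of this output channel.
        const float s = bn.gamma[c] / std::sqrt(bn.var[c] + bn.epsilon);
        for(std::size_t i = 0; i < per_filter; ++i)
        {
            weights[c * per_filter + i] *= s;
        }
        bias[c] = (bias[c] - bn.mean[c]) * s + bn.beta[c];
    }
}
```

Because the folding is done once on constant data, the batch normalization layer no longer has to run at inference time.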

v18.08 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Updated recommended NDK version to r17b.
 - Removed support for QS8/QS16 data types.
 - Added support for grouped convolution in @ref CLConvolutionLayer.
 - Added NHWC data layout support to:
    - NEDepthConcatenateLayer / CLDepthConcatenateLayer
    - @ref NEWinogradConvolutionLayer / @ref CLWinogradConvolutionLayer
    - @ref CLDepthwiseConvolutionLayer
    - @ref CLDirectConvolutionLayer
    - @ref CLConvolutionLayer
    - @ref CLScale
    - CLIm2ColKernel
 - New Arm® Neon™ kernels / functions:
    - @ref NERNNLayer
 - New OpenCL kernels / functions:
    - @ref CLArithmeticDivision
 - Introduced prepare() stage support in the graph API for GLES.
 - Added support for memory reuse when trying to allocate smaller CLTensors.
 - Enabled NHWC execution on graph examples.
 - Added JPEG accessor for validation purposes.
 - Added validate methods to some kernels / functions.

v18.05 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Major redesign in the interface for the Neon™ kernels implemented in assembly.
    - Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel
    - Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in Neon™ functions.
    - Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface.
    - Moved Neon™ assembly kernels to the folder src/core/Neon/kernels/arm_gemm.
 - Improved doxygen documentation.
 - Improved memory management for layer transitions.
 - Added support for NHWC data layout in tensors.
 - Added NHWC data layout support to:
    - @ref NEGEMMConvolutionLayer
    - @ref NEDirectConvolutionLayer
    - @ref NEPoolingLayer / @ref CLPoolingLayer
    - @ref NEBatchNormalizationLayer / @ref CLBatchNormalizationLayer
    - @ref NEDepthwiseConvolutionLayer
    - @ref NEScale
    - NEIm2Col
 - Added support for dilated convolutions in @ref NEConvolutionLayer and @ref CLConvolutionLayer.
 - New OpenCL kernels / functions:
    - @ref CLChannelShuffleLayer / @ref CLChannelShuffleLayerKernel
    - CLConvertFullyConnectedWeightsKernel / @ref CLConvertFullyConnectedWeights
    - @ref CLCopy / CLCopyKernel
    - @ref CLLSTMLayer
    - @ref CLRNNLayer
    - CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel
    - CLWinogradFilterTransformKernel / @ref CLWinogradConvolutionLayer
    - CLWinogradInputTransformKernel / CLWinogradInputTransform
 - New Arm® Neon™ kernels / functions:
    - NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights
 - Created the validate method in @ref CLDepthwiseConvolutionLayer.
 - Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer.
 - Added depth multiplier support in @ref NEDepthwiseConvolutionLayer and @ref CLDepthwiseConvolutionLayer.
 - Added broadcast multiply support in @ref NEPixelWiseMultiplication / NEPixelWiseMultiplicationKernel.
 - Ported mobilenet example to NHWC data layout.
 - Enabled Winograd method in @ref CLConvolutionLayer.
 - Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer.
 - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm.
 - Added memory manager support in GLES functions.
 - Major refactoring of the graph API.
 - Added GLES backend in the graph API.
 - Added support for the memory manager in the graph API.
 - Enabled Winograd Convolution method in the graph API.
 - Added support for grouped convolutions in the graph API.
 - Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in @ref NEDeconvolutionLayer.
 - Added fast maths flag in @ref CLConvolutionLayer.
 - Added new tests and benchmarks in validation and benchmark frameworks
 - Merged Activation layer with Convolution Layer (Neon™, CL, GLES)
 - Added support for OpenCL 2.0 SVM
 - Added support to import memory in OpenCL tensors.
 - Added the prepare() method to perform any one-off pre-processing before running the function.
 - Added new examples:
    - graph_inception_v4.cpp
    - graph_resnext50.cpp
 - Added memory measurement instrument for CL.
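The NHWC data layout introduced in this release differs from NCHW only in which dimension is innermost in memory. The sketch below computes the linear element offset for both layouts on a dense, unpadded 4D tensor. `Shape4D`, `offset_nchw` and `offset_nhwc` are hypothetical helpers for illustration; real tensors in the library may also carry padding and strides, which are ignored here.

```cpp
#include <cstddef>

// Shape of a dense 4D tensor (names are illustrative).
struct Shape4D
{
    std::size_t n, c, h, w;
};

// NCHW: width is the innermost (fastest-moving) dimension.
std::size_t offset_nchw(const Shape4D &s, std::size_t n, std::size_t c, std::size_t h, std::size_t w)
{
    return ((n * s.c + c) * s.h + h) * s.w + w;
}

// NHWC: channels are innermost, so all channels of one pixel are contiguous.
std::size_t offset_nhwc(const Shape4D &s, std::size_t n, std::size_t c, std::size_t h, std::size_t w)
{
    return ((n * s.h + h) * s.w + w) * s.c + c;
}
```

For a 1x3x2x2 tensor (N, C, H, W), the element at channel 1, row 0, column 1 sits at offset 5 in NCHW and at offset 4 in NHWC.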

v18.03 Public maintenance release
 - Various bug fixes.
 - Fixed bug in @ref NEActivationLayer
 - Fix in @ref CLTuner when using batches.
 - Updated recommended NDK version to r16b (And fixed warnings).
 - Fixed bug in validation code.
 - Added Inception v4 graph example.
 - Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer

v18.02 Public major release
 - Various Arm® Neon™ / OpenCL / GLES optimisations.
 - Various bug fixes.
 - Changed default number of threads on big.LITTLE systems.
 - Refactored examples and added:
    - graph_mobilenet_qasymm8
    - graph_resnet
    - graph_squeezenet_v1_1
 - Renamed @ref CLConvolutionLayer into @ref CLGEMMConvolutionLayer and created a new @ref CLConvolutionLayer to select the fastest convolution method.
 - Renamed @ref NEConvolutionLayer into @ref NEGEMMConvolutionLayer and created a new @ref NEConvolutionLayer to select the fastest convolution method.
 - Added in-place support to:
    - @ref CLActivationLayer
    - @ref CLBatchNormalizationLayer
 - Added QASYMM8 support to:
    - @ref CLActivationLayer
    - @ref CLDepthwiseConvolutionLayer
    - @ref NEDepthwiseConvolutionLayer
    - @ref NESoftmaxLayer
 - Added FP16 support to:
    - CLDepthwiseConvolutionLayer3x3
    - @ref CLDepthwiseConvolutionLayer
 - Added broadcasting support to NEArithmeticAddition / @ref CLArithmeticAddition / @ref CLPixelWiseMultiplication
 - Added fused batch normalization and activation to @ref CLBatchNormalizationLayer and @ref NEBatchNormalizationLayer
 - Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer
 - New OpenCL kernels / functions:
    - CLDirectConvolutionLayerOutputStageKernel
 - New Arm® Neon™ kernels / functions
    - Added name() method to all kernels.
    - Added support for Winograd 5x5.
    - NEPermuteKernel / @ref NEPermute
    - CpuWinogradConv2dTransformInputKernel / NEWinogradLayer
    - CpuWinogradConv2dTransformOutputKernel / NEWinogradLayer
    - CpuWinogradConv2dTransformWeightsKernel / NEWinogradLayer
    - Renamed NEWinogradLayerKernel into NEWinogradLayerBatchedGEMMKernel
 - New GLES kernels / functions:
    - GCTensorShiftKernel / GCTensorShift

v18.01 Public maintenance release
 - Various bug fixes
 - Added some of the missing validate() methods
 - Added @ref CLDeconvolutionLayerUpsampleKernel / @ref CLDeconvolutionLayer / @ref CLDeconvolutionLayerUpsample
 - Added CLPermuteKernel / @ref CLPermute
 - Added method to clean the programs cache in the CL Kernel library.
 - Added GCArithmeticAdditionKernel / GCArithmeticAddition
 - Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3
 - Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer
 - Added GCScaleKernel / GCScale
 - Added GCWeightsReshapeKernel / GCConvolutionLayer
 - Added FP16 support to the following GLES compute kernels:
    - GCCol2ImKernel
    - GCGEMMInterleave4x4Kernel
    - GCGEMMTranspose1xWKernel
    - GCIm2ColKernel
 - Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel)
 - Added NEDirectConvolutionLayerOutputStageKernel
 - Added QASYMM8 support to the following Arm® Neon™ kernels:
    - NEDepthwiseConvolutionLayer3x3Kernel
    - @ref NEFillBorderKernel
    - NEPoolingLayerKernel
 - Added new examples:
    - graph_cl_mobilenet_qasymm8.cpp
    - graph_inception_v3.cpp
    - gc_dc.cpp
 - More tests added to both validation and benchmarking suites.

v17.12 Public major release
 - Most machine learning functions on OpenCL support the new data type QASYMM8
 - Introduced logging interface
 - Introduced OpenCL timer
 - Reworked GEMMLowp interface
 - Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM
 - Added validation method for most Machine Learning kernels / functions
 - Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19
 - Added sgemm example for OpenCL
 - Added absolute difference example for GLES compute
 - Added new tests and benchmarks in validation and benchmark frameworks
 - Added new kernels / functions for GLES compute
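QASYMM8, the data type introduced above, stores values as unsigned 8-bit integers with an affine (asymmetric) mapping to real numbers. The sketch below shows that mapping under the usual definition real = scale * (q - offset). `AffineQuant`, `quantize_u8` and `dequantize_u8` are hypothetical names, and round-to-nearest is an assumption; the rounding policy used by the library's own kernels may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Affine 8-bit quantization parameters, assuming real = scale * (q - offset).
struct AffineQuant
{
    float scale;
    int   offset;
};

// Maps a real value to an unsigned 8-bit code, saturating to [0, 255].
std::uint8_t quantize_u8(float value, const AffineQuant &qi)
{
    const int q = static_cast<int>(std::lround(value / qi.scale)) + qi.offset;
    return static_cast<std::uint8_t>(std::min(255, std::max(0, q)));
}

// Maps an unsigned 8-bit code back to the real value it represents.
float dequantize_u8(std::uint8_t q, const AffineQuant &qi)
{
    return qi.scale * static_cast<float>(static_cast<int>(q) - qi.offset);
}
```

With a scale of 0.5 and an offset of 10, the value 1.0 is stored as 12, and values far outside the representable range saturate to 0 or 255.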

 - New OpenGL ES kernels / functions
    - GCAbsoluteDifferenceKernel / GCAbsoluteDifference
    - GCActivationLayerKernel / GCActivationLayer
    - GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer
    - GCCol2ImKernel
    - GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer
    - GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer
    - GCDropoutLayerKernel / GCDropoutLayer
    - GCFillBorderKernel / GCFillBorder
    - GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4
    - GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM
    - GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW
    - GCIm2ColKernel
    - GCNormalizationLayerKernel / GCNormalizationLayer
    - GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication
    - GCPoolingLayerKernel / GCPoolingLayer
    - GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer
    - GCTransposeKernel / GCTranspose

 - New Arm® Neon™ kernels / functions
    - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore
    - arm_compute::NEHGEMMAArch64FP16Kernel
    - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer
    - NEGEMMLowpOffsetContributionKernel / NEGEMMLowpMatrixAReductionKernel / NEGEMMLowpMatrixBReductionKernel / NEGEMMLowpMatrixMultiplyCore
    - NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint
    - NEWinogradLayer / NEWinogradLayerKernel

 - New OpenCL kernels / functions
    - CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore
    - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint

 - New graph nodes for Arm® Neon™ and OpenCL
    - graph::BranchLayer
    - graph::DepthConvertLayer
    - graph::DepthwiseConvolutionLayer
    - graph::DequantizationLayer
    - graph::FlattenLayer
    - graph::QuantizationLayer
    - graph::ReshapeLayer
Gian Marcoff850932017-12-11 12:37:17 +00001582
Anthony Barbier3c5b4ff2017-10-12 13:20:52 +01001583v17.10 Public maintenance release
 - Bug fixes:
    - Check the maximum local workgroup size supported by OpenCL devices
    - Minor documentation updates (Fixed instructions to build the examples)
    - Introduced a graph::GraphContext
    - Added a few new Graph nodes, support for branches and grouping.
    - Automatically enable cl_printf in debug builds
    - Fixed bare metal builds for armv7a
    - Added AlexNet and cartoon effect examples
    - Fixed library builds: libraries are no longer built as supersets of each other. (Applications using the Runtime part of the library now need to link against both libarm_compute_core and libarm_compute.)

v17.09 Public major release
 - Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers.
 - Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager)
 - New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework).
 - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Arm® Neon™ and OpenCL.
 - New Arm® Neon™ kernels / functions:
    - arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel
    - NEDequantizationLayerKernel / @ref NEDequantizationLayer
    - NEFloorKernel / @ref NEFloor
    - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer
    - NEQuantizationLayerKernel NEMinMaxLayerKernel / @ref NEQuantizationLayer
    - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer
    - @ref NEReductionOperationKernel / @ref NEReductionOperation
    - NEReshapeLayerKernel / @ref NEReshapeLayer

 - New OpenCL kernels / functions:
    - CLDepthwiseConvolutionLayer3x3NCHWKernel CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer
    - CLDequantizationLayerKernel / CLDequantizationLayer
    - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer
    - CLFlattenLayer
    - CLFloorKernel / @ref CLFloor
    - CLGEMMTranspose1xW
    - CLGEMMMatrixVectorMultiplyKernel
    - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer
    - CLQuantizationLayerKernel CLMinMaxLayerKernel / @ref CLQuantizationLayer
    - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer
    - @ref CLReductionOperationKernel / @ref CLReductionOperation
    - CLReshapeLayerKernel / @ref CLReshapeLayer

v17.06 Public major release
 - Various bug fixes
 - Added support for fixed point 8 bit (QS8) to the various Arm® Neon™ machine learning kernels.
 - Added unit tests and benchmarks (AlexNet, LeNet)
 - Added support for sub tensors.
 - Added infrastructure to provide GPU specific optimisation for some OpenCL kernels.
 - Added @ref OMPScheduler (OpenMP) scheduler for Arm® Neon™
 - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (For bare metal)
 - Users can specify their own scheduler by implementing the @ref IScheduler interface.
 - New OpenCL kernels / functions:
    - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer
    - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer
    - CLHOGOrientationBinningKernel CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor CLHOGDetector CLHOGGradient CLHOGMultiDetection
    - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer
    - CLWeightsReshapeKernel / CLConvolutionLayerReshapeWeights
 - New C++ kernels:
    - CPPDetectionWindowNonMaximaSuppressionKernel
 - New Arm® Neon™ kernels / functions:
    - @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer
    - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer
    - NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer
    - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer
    - NEWeightsReshapeKernel / NEConvolutionLayerReshapeWeights

v17.05 Public bug fixes release
 - Various bug fixes
 - Remaining functions ported to use accurate padding.
 - The library no longer links against OpenCL (it uses dlopen / dlsym at runtime instead to determine whether OpenCL is available).
 - Added "free" method to allocator.
 - Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9

v17.04 Public bug fixes release

 The following functions have been ported to use the new accurate padding:
 - CLColorConvertKernel
 - CLEdgeNonMaxSuppressionKernel
 - CLEdgeTraceKernel
 - CLGaussianPyramidHorKernel
 - CLGaussianPyramidVertKernel
 - CLGradientKernel
 - NEChannelCombineKernel
 - NEFillArrayKernel
 - NEGaussianPyramidHorKernel
 - NEGaussianPyramidVertKernel
 - NEHarrisScoreFP16Kernel
 - NEHarrisScoreKernel
 - NEHOGDetectorKernel
 - NELogits1DMaxKernel
 - NELogits1DShiftExpSumKernel
 - NELogits1DNormKernel
 - NENonMaximaSuppression3x3FP16Kernel
 - NENonMaximaSuppression3x3Kernel

v17.03.1 First major public release of the sources
 - Renamed the library to arm_compute
 - New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions.
 - New padding calculation interface introduced and ported most kernels / functions to use it.
 - New OpenCL kernels / functions:
    - CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp
 - New Arm® Neon™ kernels / functions:
    - @ref NENormalizationLayerKernel / @ref NENormalizationLayer
    - NETransposeKernel / @ref NETranspose
    - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer
    - NEIm2ColKernel, NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer
    - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer
    - NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp

v17.03 Sources preview
 - New OpenCL kernels / functions:
    - CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge
    - GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / @ref CLGEMM
    - CLGEMMMatrixAccumulateBiasesKernel / @ref CLFullyConnectedLayer
    - CLTransposeKernel / @ref CLTranspose
    - CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow
    - @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer
    - CLLaplacianPyramid, CLLaplacianReconstruct
 - New Arm® Neon™ kernels / functions:
    - NEActivationLayerKernel / @ref NEActivationLayer
    - GEMM refactoring + FP16 support (Requires armv8.2 CPU): NEGEMMInterleave4x4Kernel, NEGEMMTranspose1xWKernel, NEGEMMMatrixMultiplyKernel, NEGEMMMatrixAdditionKernel / @ref NEGEMM
    - NEPoolingLayerKernel / @ref NEPoolingLayer

v17.02.1 Sources preview
 - New OpenCL kernels / functions:
    - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer
    - CLPoolingLayerKernel / @ref CLPoolingLayer
    - CLIm2ColKernel, CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer
    - CLRemapKernel / CLRemap
    - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb
    - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation
    - CLNonLinearFilterKernel / CLNonLinearFilter
 - New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU)
    - NEAccumulateWeightedFP16Kernel
    - NEBox3x3FP16Kernel
    - NENonMaximaSuppression3x3FP16Kernel

v17.02 Sources preview
 - New OpenCL kernels / functions:
    - CLActivationLayerKernel / @ref CLActivationLayer
    - CLChannelCombineKernel / CLChannelCombine
    - CLDerivativeKernel / CLChannelExtract
    - CLFastCornersKernel / CLFastCorners
    - CLMeanStdDevKernel / CLMeanStdDev
 - New Arm® Neon™ kernels / functions:
    - HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection
    - NENonLinearFilterKernel / NENonLinearFilter
 - Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events.
 - Switched all the kernels / functions to use tensors instead of images.
 - Updated documentation to include instructions to build the library from sources.

v16.12 Binary preview release
 - Original release

 */
} // namespace arm_compute