///
/// Copyright (c) 2017-2021 Arm Limited.
///
/// SPDX-License-Identifier: MIT
///
/// Permission is hereby granted, free of charge, to any person obtaining a copy
/// of this software and associated documentation files (the "Software"), to
/// deal in the Software without restriction, including without limitation the
/// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
/// sell copies of the Software, and to permit persons to whom the Software is
/// furnished to do so, subject to the following conditions:
///
/// The above copyright notice and this permission notice shall be included in all
/// copies or substantial portions of the Software.
///
/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
/// SOFTWARE.
///
namespace arm_compute
{
/** @mainpage Introduction

@tableofcontents

The Compute Library is a collection of low-level machine learning functions optimized for both Arm CPUs and GPUs using SIMD technologies.

Several builds of the library are available using various configurations:
 - OS: Linux, Android, macOS or bare metal.
 - Architecture: armv7a (32-bit) or arm64-v8a (64-bit).
 - Technology: Neon / OpenCL / Neon and OpenCL.
 - Debug / Asserts / Release: Use a build with asserts enabled to debug your application and enable extra validation. Once you are sure your application works as expected, you can switch to a release build of the library for maximum performance.

@section S0_1_contact Contact / Support

Please create an issue on <a href="https://github.com/ARM-software/ComputeLibrary/issues">GitHub</a>.

To help the support team, please provide the build information of the library you are using. To get the version of the library, run:

    $ strings android-armv7a-cl-asserts/libarm_compute.so | grep arm_compute_version
    arm_compute_version=v16.12 Build options: {'embed_kernels': '1', 'opencl': '1', 'arch': 'armv7a', 'neon': '0', 'asserts': '1', 'debug': '0', 'os': 'android', 'Werror': '1'} Git hash=f51a545d4ea12a9059fe4e598a092f1fd06dc858

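If you want to attach the version and Git hash to a bug report automatically, the line printed by the command above can be split with plain string searches. The following is only a sketch: `BuildInfo`, `value_after` and `parse_build_info` are hypothetical helper names that are not part of the library, and the code assumes the line keeps the `arm_compute_version=` and `Git hash=` markers shown in the sample output.

```cpp
#include <cstddef>
#include <string>

// Hypothetical container (not a Compute Library type) for the two fields
// that are most useful in a support request.
struct BuildInfo
{
    std::string version;  // e.g. "v16.12"
    std::string git_hash; // e.g. "f51a545d4ea1..."
};

// Copies into `out` the text that follows `key`, up to the next space or
// the end of the line. Returns false if `key` is not present.
static bool value_after(const std::string &line, const std::string &key, std::string &out)
{
    const std::size_t start = line.find(key);
    if(start == std::string::npos)
    {
        return false;
    }
    const std::size_t begin = start + key.size();
    const std::size_t end   = line.find(' ', begin);
    out = line.substr(begin, end == std::string::npos ? std::string::npos : end - begin);
    return true;
}

// Assumes the markers seen in the sample output above; returns false if
// either marker is missing.
bool parse_build_info(const std::string &line, BuildInfo &info)
{
    return value_after(line, "arm_compute_version=", info.version)
           && value_after(line, "Git hash=", info.git_hash);
}
```

Applied to the sample line above, `parse_build_info` fills `version` with `v16.12` and `git_hash` with the 40-character hash. The build options dictionary is left unparsed on purpose, since its exact layout may differ between releases.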
@section S0_2_prebuilt_binaries Pre-built binaries

For each release we provide some pre-built binaries of the library [here](https://github.com/ARM-software/ComputeLibrary/releases).

These binaries have been built using the following toolchains:
 - Linux armv7a: gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf
 - Linux arm64-v8a: gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu
 - Android armv7a: clang++ / libc++ NDK r18b
 - Android arm64-v8a: clang++ / libc++ NDK r20b

@warning Make sure to use a compatible toolchain to build your application or you will get std::bad_alloc errors at runtime.

@section S1_file_organisation File organisation

This archive contains:
 - The arm_compute header and source files
 - The latest Khronos OpenCL 1.2 C headers from the <a href="https://www.khronos.org/registry/cl/">Khronos OpenCL registry</a>
 - The latest Khronos cl2.hpp from the <a href="https://www.khronos.org/registry/cl/">Khronos OpenCL registry</a> (API version 2.1 when this document was written)
 - The latest Khronos EGL 1.5 C headers from the <a href="https://www.khronos.org/registry/gles/">Khronos EGL registry</a>
 - The sources for a stub version of libOpenCL.so, libGLESv1_CM.so, libGLESv2.so and libEGL.so to help you build your application.
 - An examples folder containing a few examples to compile and link against the library.
 - A @ref utils folder containing headers with some boilerplate code used by the examples.
 - This documentation.

For detailed information about file organization, please refer to the Files -> File List section of this documentation.

@section S2_versions_changelog Release versions and changelog

@subsection S2_1_versions Release versions

All releases are numbered vYY.MM, where YY are the last two digits of the year and MM is the month number.
If there is more than one release in a month, an extra sequential number is appended at the end:

    v17.03 (First release of March 2017)
    v17.03.1 (Second release of March 2017)
    v17.04 (First release of April 2017)

@note We aim to publish one major public release with new features per quarter. All releases in between will only contain bug fixes.

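The numbering scheme above is regular enough to be checked mechanically, for example when a script selects a release tag. The following is a minimal sketch, not library code: `ReleaseVersion` and `parse_release_version` are hypothetical names, the two-digit year is assumed to refer to the 2000s, and the optional sequential suffix is limited to three digits purely to keep the arithmetic simple.

```cpp
#include <cstddef>
#include <string>

// Hypothetical representation of a release tag such as "v17.03.1".
struct ReleaseVersion
{
    int year;   // full year, assuming the 2000s (17 -> 2017)
    int month;  // 1..12
    int serial; // 0 for the first release of the month, 1 for the second, ...
};

static bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

// Accepts "vYY.MM" optionally followed by ".N"; rejects everything else.
bool parse_release_version(const std::string &tag, ReleaseVersion &out)
{
    if(tag.size() < 6 || tag.size() > 10 || tag[0] != 'v' || tag[3] != '.'
       || !is_digit(tag[1]) || !is_digit(tag[2]) || !is_digit(tag[4]) || !is_digit(tag[5]))
    {
        return false;
    }
    const int yy = (tag[1] - '0') * 10 + (tag[2] - '0');
    const int mm = (tag[4] - '0') * 10 + (tag[5] - '0');
    if(mm < 1 || mm > 12)
    {
        return false;
    }
    int serial = 0;
    if(tag.size() > 6)
    {
        // The suffix must be a dot followed by at least one digit.
        if(tag[6] != '.' || tag.size() == 7)
        {
            return false;
        }
        for(std::size_t i = 7; i < tag.size(); ++i)
        {
            if(!is_digit(tag[i]))
            {
                return false;
            }
            serial = serial * 10 + (tag[i] - '0');
        }
    }
    out = ReleaseVersion{ 2000 + yy, mm, serial };
    return true;
}
```

With this sketch, `v17.03` parses to March 2017 with serial 0 and `v17.03.1` to the same month with serial 1, while malformed tags such as `17.03` or `v17.13` are rejected.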
@subsection S2_2_changelog Changelog

v21.05 Public major release
 - Removed computer vision support from Neon backend
 - Removed the following functions:
   - NEAbsoluteDifference
   - NEAccumulate
   - NEBox3x3
   - NECannyEdge
   - NEChannelCombine
   - NEChannelExtract
   - NEColorConvert
   - NEConvolution
   - NEDerivative
   - NEDilate
   - NEEqualizeHistogram
   - NEErode
   - NEFastCorners
   - NEGaussian3x3
   - NEGaussian5x5
   - NEGaussianPyramid
   - NEHOGDescriptor
   - NEHOGDetector
   - NEHOGGradient
   - NEHOGMultiDetection
   - NEHarrisCorners
   - NEHistogram
   - NEIntegralImage
   - NELaplacianPyramid
   - NELaplacianReconstruct
   - NEMagnitude
   - NEMeanStdDev
   - NEMedian3x3
   - NEMinMaxLocation
   - NENonLinearFilter
   - NEOpticalFlow
   - NEPhase
   - NEScharr3x3
   - NESobel3x3
   - NESobel5x5
   - NESobel7x7
   - NETableLookup
   - NEThreshold
   - NEWarpAffine
   - NEWarpPerspectiveKernel

 - Removed all GLES kernels / functions / tests / examples
 - Removed computer vision support from CL backend
 - Removed the following functions:
   - CLAbsoluteDifference
   - CLAccumulate
   - CLBox3x3
   - CLCannyEdge
   - CLChannelCombine
   - CLChannelExtract
   - CLColorConvert
   - CLConvolution
   - CLDerivative
   - CLDilate
   - CLEqualizeHistogram
   - CLErode
   - CLFastCorners
   - CLGaussian3x3
   - CLGaussian5x5
   - CLGaussianPyramid
   - CLHOGDescriptor
   - CLHOGDetector
   - CLHOGGradient
   - CLHOGMultiDetection
   - CLHarrisCorners
   - CLHistogram
   - CLIntegralImage
   - CLLaplacianPyramid
   - CLLaplacianReconstruct
   - CLMagnitude
   - CLMeanStdDev
   - CLMedian3x3
   - CLMinMaxLocation
   - CLNonLinearFilter
   - CLOpticalFlow
   - CLPhase
   - CLScharr3x3
   - CLSobel3x3
   - CLSobel5x5
   - CLSobel7x7
   - CLTableLookup
   - CLThreshold
   - CLWarpAffine
   - CLWarpPerspective

v21.02 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Upgrade C++ standard to C++14
 - Add macOS support
 - Add Armv8-R AArch64 architecture support
 - Add SVE/SVE2 support for:
   - NEScaleKernel
   - @ref NEActivationLayer
   - @ref NEArithmeticAddition
   - @ref NEBatchNormalizationLayerKernel
   - @ref cpu::kernels::CpuLogits1DSoftmaxKernel
   - @ref cpu::kernels::CpuLogits1DMaxKernel
   - @ref cpu::kernels::CpuElementwiseUnaryKernel
 - Remove padding from OpenCL kernels:
   - CLDirectConvolutionLayerKernel
   - @ref CLArgMinMaxLayerKernel
   - @ref CLPadLayerKernel
   - @ref CLROIAlignLayerKernel
   - @ref CLRangeKernel
   - CLScaleKernel
   - @ref CLSelectKernel
   - @ref CLBitwiseKernel
   - @ref opencl::kernels::ClFloorKernel
   - @ref CLTransposeKernel
 - Deprecate functions in CLTuner:
   - add_lws_to_table
   - import_lws_table
   - lws_table
 - Remove functions:
   - NELocallyConnectedLayer / CLLocallyConnectedLayer
   - NEIm2Col
   - NECol2Im
   - NEGEMMInterleave4x4
   - NEGEMMTranspose1xW
   - NEComputeAllAnchors / CLComputeAllAnchors
   - NEGEMMAssemblyDispatch
   - NEUpsampleLayer / CLUpsampleLayer
 - Remove kernels:
   - NEGEMMMatrixVectorMultiplyKernel
   - NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel
   - NEUpsampleLayerKernel / CLUpsampleLayerKernel
 - Extend OpenCL tuner with workgroup batch size support
   - Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units
 - Add functionality to load the OpenCL GEMM heuristics at runtime
   - The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL
 - Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.
 - Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised.

v20.11 Public major release
 - Various bug fixes.
 - Various optimisations.
 - Performance regressions can be noted when executing Depthwise Convolution on Neon with a depth multiplier > 1 for quantized data type.
   This is planned to be resolved in the v21.02 release.
 - Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer.
 - Added new data type S32 support for:
   - NEArithmeticSubtraction
   - NEArithmeticSubtractionKernel
   - @ref NEPixelWiseMultiplication
   - @ref NEPixelWiseMultiplicationKernel
   - NEElementwiseDivision
   - NEDivisionOperationKernel
 - Interface change
   - Properly support softmax axis to have the same meaning as other major frameworks. That is, axis now defines the dimension
     on which Softmax/Logsoftmax is performed. E.g. for input of shape 4x5x6 and axis=1, softmax will be applied to 4x6=24 vectors of size 5.
     The supported value range of axis is [-rank, rank).
     This change applies to the following functions:
     - @ref NESoftmaxLayer
     - @ref NELogSoftmaxLayer
     - @ref CLSoftmaxLayer
     - @ref CLLogSoftmaxLayer
     - GCSoftmaxLayer
 - New OpenCL kernels / functions:
   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
   - @ref CLLogicalNot
   - @ref CLLogicalAnd
   - @ref CLLogicalOr
 - New Neon kernels / functions:
   - @ref NELogicalNot
   - @ref NELogicalAnd
   - @ref NELogicalOr
 - Removed padding from Neon kernels:
   - @ref NEComplexPixelWiseMultiplicationKernel
   - NENonMaximaSuppression3x3Kernel
   - @ref NERemapKernel
   - @ref NEGEMMInterleave4x4Kernel
   - @ref NEDirectConvolutionLayerKernel
   - NEScaleKernel
   - NELocallyConnectedMatrixMultiplyKernel
   - @ref NEGEMMLowpOffsetContributionKernel
   - @ref NEGEMMTranspose1xWKernel
   - NEPoolingLayerKernel
   - NEConvolutionKernel
   - @ref NEDepthwiseConvolutionLayerNativeKernel
   - @ref NEGEMMLowpMatrixMultiplyKernel
   - @ref NEGEMMMatrixMultiplyKernel
   - @ref NEDirectConvolutionLayerOutputStageKernel
   - @ref NEReductionOperationKernel
   - @ref NEGEMMLowpMatrixAReductionKernel
   - @ref NEGEMMLowpMatrixBReductionKernel
 - Removed padding from OpenCL kernels:
   - CLBatchConcatenateLayerKernel
   - CLElementwiseOperationKernel
   - @ref CLBatchNormalizationLayerKernel
   - CLPoolingLayerKernel
   - @ref CLWinogradInputTransformKernel
   - @ref CLGEMMLowpMatrixMultiplyNativeKernel
   - @ref CLGEMMLowpMatrixAReductionKernel
   - @ref CLGEMMLowpMatrixBReductionKernel
   - @ref CLGEMMLowpOffsetContributionOutputStageKernel
   - @ref CLGEMMLowpOffsetContributionKernel
   - @ref CLWinogradOutputTransformKernel
   - @ref CLGEMMLowpMatrixMultiplyReshapedKernel
   - @ref CLFuseBatchNormalizationKernel
   - @ref CLDepthwiseConvolutionLayerNativeKernel
   - @ref CLDepthConvertLayerKernel
   - CLCopyKernel
   - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel
   - CLActivationLayerKernel
   - @ref CLWinogradFilterTransformKernel
   - CLWidthConcatenateLayerKernel
   - CLWidthConcatenate4TensorsKernel
   - CLWidthConcatenate2TensorsKernel
   - CLLogits1DMaxShiftExpSumKernel
   - CLLogits1DNormKernel
   - CLHeightConcatenateLayerKernel
   - @ref CLGEMMMatrixMultiplyKernel
   - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel
   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel
   - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel
   - CLDepthConcatenateLayerKernel
   - @ref CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel
 - Removed OpenCL kernels / functions:
   - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel
   - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel
   - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel
 - Deprecated OpenCL kernels / functions (if a kernel is used only by the function being deprecated, the kernel is deprecated along with it):
   - CLLocallyConnectedLayer
   - CLLocallyConnectedMatrixMultiplyKernel
   - CLAbsoluteDifference
   - CLAbsoluteDifferenceKernel
   - CLAccumulate
   - CLAccumulateKernel
   - CLAccumulateSquared
   - CLAccumulateSquaredKernel
   - CLAccumulateWeighted
   - CLAccumulateWeightedKernel
   - CLAccumulateWeightedFP16Kernel
   - CLBox3x3
   - CLBox3x3Kernel
   - CLBox3x3FP16Kernel
   - CLCannyEdge
   - CLChannelCombine
   - CLChannelCombineKernel
   - CLChannelExtract
   - CLChannelExtractKernel
   - CLColorConvert
   - CLColorConvertKernel
   - CLConvolution3x3
   - CLConvolutionRectangle
   - CLConvolutionRectangleKernel
   - CLConvolutionSquare
   - CLConvolutionKernel
   - CLDerivative
   - CLDerivativeKernel
   - CLDilate
   - CLDilateKernel
   - CLEqualizeHistogram
   - CLErode
   - CLErodeKernel
   - CLFastCorners
   - CLFastCornersKernel
   - CLGaussian3x3
   - CLGaussian3x3Kernel
   - CLGaussian5x5
   - CLGaussian5x5HorKernel
   - CLGaussian5x5VertKernel
   - CLGaussianPyramid
   - CLGaussianPyramidHalf
   - CLGaussianPyramidOrb
   - CLHarrisCorners
   - CLHarrisScoreKernel
   - CLHarrisScoreFP16Kernel
   - CLHistogram
   - CLHistogramKernel
   - CLHOGOrientationBinningKernel
   - CLHOGBlockNormalizationKernel
   - CLHOGDetectorKernel
   - CLHOGNonMaximaSuppressionKernel
   - CLHOGDescriptor
   - CLHOGDetector
   - CLHOGGradient
   - CLHOGMultiDetection
   - CLHOGOrientationBinningKernel
   - CLHOGBlockNormalizationKernel
   - CLHOGDetectorKernel
   - CLIntegralImage
   - CLIntegralImageKernel
   - CLLaplacianReconstruct
   - CLLaplacianPyramid
   - CLMagnitude
   - CLMagnitudePhaseKernel
   - CLMedian3x3
   - CLMedian3x3Kernel
   - CLMinMaxLocation
   - CLMinMaxLocationKernel
   - CLNonLinearFilter
   - CLNonLinearFilterKernel
   - CLNonMaximaSuppression3x3
   - CLNonMaximaSuppression3x3FP16Kernel
   - CLNonMaximaSuppression3x3Kernel
   - CLOpticalFlow
   - CLPhase
   - CLRemap
   - CLRemapKernel
   - CLScharr3x3
   - CLScharr3x3Kernel
   - CLSobel3x3
   - CLSobel3x3Kernel
   - CLSobel5x5
   - CLSobel5x5HorKernel
   - CLSobel5x5VertKernel
   - CLSobel7x7
   - CLSobel7x7HorKernel
   - CLSobel7x7VertKernel
   - CLThreshold
   - CLThresholdKernel
   - CLWarpAffine
   - CLWarpAffineKernel
   - CLWarpPerspective
   - CLWarpPerspectiveKernel
 - Deprecated Neon kernels / functions (if a kernel is used only by the function being deprecated, the kernel is deprecated along with it):
   - NELocallyConnectedLayer
   - NELocallyConnectedMatrixMultiplyKernel
   - NEAbsoluteDifference
   - NEAbsoluteDifferenceKernel
   - NEAccumulate
   - NEAccumulateKernel
   - NEAccumulateSquared
   - NEAccumulateSquaredKernel
   - NEAccumulateWeighted
   - NEAccumulateWeightedKernel
   - NEAccumulateWeightedFP16Kernel
   - NEBox3x3
   - NEBox3x3Kernel
   - NEBox3x3FP16Kernel
   - NECannyEdge
   - NEChannelCombine
   - NEChannelCombineKernel
   - NEChannelExtract
   - NEChannelExtractKernel
   - NEColorConvert
   - NEColorConvertKernel
   - NEConvolution3x3
   - NEConvolutionRectangle
   - NEConvolutionRectangleKernel
   - NEConvolutionSquare
   - NEConvolutionKernel
   - NEDerivative
   - NEDerivativeKernel
   - NEDilate
   - NEDilateKernel
   - NEEqualizeHistogram
   - NEErode
   - NEErodeKernel
   - NEFastCorners
   - NEFastCornersKernel
   - NEGaussian3x3
   - NEGaussian3x3Kernel
   - NEGaussian5x5
   - NEGaussian5x5HorKernel
   - NEGaussian5x5VertKernel
   - NEGaussianPyramid
   - NEGaussianPyramidHalf
   - NEGaussianPyramidOrb
   - NEHarrisCorners
   - NEHarrisScoreKernel
   - NEHarrisScoreFP16Kernel
   - NEHistogram
   - NEHistogramKernel
   - NEHOGOrientationBinningKernel
   - NEHOGBlockNormalizationKernel
   - NEHOGDetectorKernel
   - NEHOGNonMaximaSuppressionKernel
   - NEHOGDescriptor
   - NEHOGDetector
   - NEHOGGradient
   - NEHOGMultiDetection
   - NEHOGOrientationBinningKernel
   - NEHOGBlockNormalizationKernel
   - NEHOGDetectorKernel
   - NEIntegralImage
   - NEIntegralImageKernel
   - NELaplacianReconstruct
   - NELaplacianPyramid
   - NEMagnitude
   - NEMagnitudePhaseKernel
   - NEMedian3x3
   - NEMedian3x3Kernel
   - NEMinMaxLocation
   - NEMinMaxLocationKernel
   - NENonLinearFilter
   - NENonLinearFilterKernel
   - NENonMaximaSuppression3x3
   - NENonMaximaSuppression3x3FP16Kernel
   - NENonMaximaSuppression3x3Kernel
   - NEOpticalFlow
   - NEPhase
   - NERemap
   - NERemapKernel
   - NEScharr3x3
   - NEScharr3x3Kernel
   - NESobel3x3
   - NESobel3x3Kernel
   - NESobel5x5
   - NESobel5x5HorKernel
   - NESobel5x5VertKernel
   - NESobel7x7
   - NESobel7x7HorKernel
   - NESobel7x7VertKernel
   - NEThreshold
   - NEThresholdKernel
   - NEWarpAffine
   - NEWarpAffineKernel
   - NEWarpPerspective
   - NEWarpPerspectiveKernel
 - Deprecated GLES kernels / functions (if a kernel is used only by the function being deprecated, the kernel is deprecated along with it):
   - GCAbsoluteDifference
   - GCActivationLayer
   - GCArithmeticAddition
   - GCBatchNormalizationLayer
   - GCConcatenateLayer
   - GCConvolutionLayer
   - GCDepthwiseConvolutionLayer
   - GCDirectConvolutionLayer
   - GCDropoutLayer
   - GCFillBorder
   - GCFullyConnectedLayer
   - GCGEMM
   - GCGEMMInterleave4x4
   - GCGEMMTranspose1xW
   - GCNormalizationLayer
   - GCNormalizePlanarYUVLayer
   - GCPixelWiseMultiplication
   - GCPoolingLayer
   - GCScale
   - GCSoftmaxLayer
   - GCTensorShift
   - GCTranspose


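The softmax axis convention introduced by the v20.11 interface change can be illustrated with a few lines of shape arithmetic. This is only a sketch and does not call the library: `wrap_axis`, `SoftmaxLayout` and `softmax_layout` are hypothetical names, and mapping a negative axis to `axis + rank` is an assumption based on the convention commonly used by other frameworks, since the changelog only states the accepted range [-rank, rank).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: maps an axis in [-rank, rank) onto [0, rank).
// Negative values are assumed to count from the last dimension.
std::size_t wrap_axis(int axis, std::size_t rank)
{
    const int r = static_cast<int>(rank);
    if(axis < -r || axis >= r)
    {
        throw std::out_of_range("axis must be in [-rank, rank)");
    }
    return static_cast<std::size_t>(axis < 0 ? axis + r : axis);
}

struct SoftmaxLayout
{
    std::size_t vector_length; // number of elements normalised together
    std::size_t num_vectors;   // number of independent softmax evaluations
};

// Softmax runs along `axis`; every other dimension only multiplies the
// number of independent vectors.
SoftmaxLayout softmax_layout(const std::vector<std::size_t> &shape, int axis)
{
    const std::size_t a = wrap_axis(axis, shape.size());
    std::size_t num_vectors = 1;
    for(std::size_t i = 0; i < shape.size(); ++i)
    {
        if(i != a)
        {
            num_vectors *= shape[i];
        }
    }
    return SoftmaxLayout{ shape[a], num_vectors };
}
```

For the 4x5x6 example from the changelog, `softmax_layout({4, 5, 6}, 1)` yields a vector length of 5 and 4x6=24 vectors. Under the wrapping assumption, `axis=-2` selects the same dimension.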
Georgios Pinitas | 25ef721 | 2020-06-02 23:00:41 +0100 | [diff] [blame] | 527 | v20.08 Public major release |
| 528 | - Various bug fixes. |
| 529 | - Various optimisations. |
Sheri Zhang | 3ef9b5f | 2020-07-09 16:32:58 +0100 | [diff] [blame] | 530 | - Added new data type QASYMM8_SIGNED support for: |
Sheri Zhang | dd4cfc0 | 2020-07-10 14:15:41 +0100 | [diff] [blame] | 531 | - @ref CLArgMinMaxLayer |
| 532 | - @ref CLArgMinMaxLayerKernel |
| 533 | - Added new data type U8 support for: |
| 534 | - @ref NECropKernel |
Sheri Zhang | 7e20e29 | 2021-02-02 11:49:34 +0000 | [diff] [blame] | 535 | - CLCropKernel |
Sheri Zhang | dd4cfc0 | 2020-07-10 14:15:41 +0100 | [diff] [blame] | 536 | - Added aligh_corner support for nearest neighbor interpolation in: |
Manuel Bottini | 10b3826 | 2021-02-19 18:16:44 +0000 | [diff] [blame] | 537 | - NEScaleKernel |
Manuel Bottini | 3b131ab | 2021-02-19 18:16:44 +0000 | [diff] [blame] | 538 | - CLScaleKernel |
Sheri Zhang | dd4cfc0 | 2020-07-10 14:15:41 +0100 | [diff] [blame] | 539 | - New OpenCL kernels / functions: |
| 540 | - @ref CLMaxUnpoolingLayerKernel |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 541 | - New Neon kernels / functions: |
Sheri Zhang | dd4cfc0 | 2020-07-10 14:15:41 +0100 | [diff] [blame] | 542 | - @ref NEMaxUnpoolingLayerKernel |
Sheri Zhang | 3ef9b5f | 2020-07-09 16:32:58 +0100 | [diff] [blame] | 543 | - New graph example: |
Sheri Zhang | dd4cfc0 | 2020-07-10 14:15:41 +0100 | [diff] [blame] | 544 | - graph_yolov3_output_detector |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 545 | - GEMMTuner improvements: |
| 546 | - Added fp16 support |
| 547 | - Output json files for easier integration |
| 548 | - Enabled tuning for export_to_cl_image_rhs option for RHS tensors |
| 549 | - More robust script for running benchmarks |
Sheri Zhang | 3ef9b5f | 2020-07-09 16:32:58 +0100 | [diff] [blame] | 550 | - Removed padding from: |
Sheri Zhang | dd4cfc0 | 2020-07-10 14:15:41 +0100 | [diff] [blame] | 551 | - @ref NEPixelWiseMultiplicationKernel |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 552 | - NEHeightConcatenateLayerKernel |
Michalis Spyrou | 27e67f0 | 2021-02-16 11:34:39 +0000 | [diff] [blame] | 553 | - NEThresholdKernel |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 554 | - NEBatchConcatenateLayerKernel |
Teresa Charlin | d1dc09c | 2021-03-04 15:24:45 +0000 | [diff] [blame] | 555 | - NETransposeKernel |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 556 | - @ref NEBatchNormalizationLayerKernel |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 557 | - NEArithmeticSubtractionKernel |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 558 | - @ref NEBoundingBoxTransformKernel |
Michalis Spyrou | 373b407 | 2021-01-20 16:41:12 +0000 | [diff] [blame] | 559 | - NELogits1DMaxKernel |
| 560 | - NELogits1DSoftmaxKernel |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 561 | - @ref NEROIPoolingLayerKernel |
| 562 | - @ref NEROIAlignLayerKernel |
Georgios Pinitas | 0b1c2db | 2020-12-04 15:51:34 +0000 | [diff] [blame] | 563 | - NEYOLOLayerKernel |
Georgios Pinitas | c53266e | 2020-12-09 03:11:53 +0000 | [diff] [blame] | 564 | - NEUpsampleLayerKernel |
Georgios Pinitas | 70eb53b | 2021-01-06 19:42:21 +0000 | [diff] [blame] | 565 | - NEFloorKernel |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 566 | - NEWidthConcatenateLayerKernel |
| 567 | - NEDepthConcatenateLayerKernel |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 568 | - @ref NENormalizationLayerKernel |
| 569 | - @ref NEL2NormalizeLayerKernel |
| 570 | - @ref NEFillArrayKernel |
| 571 | - @ref NEDepthConvertLayerKernel |
| 572 | - @ref NERangeKernel |
| 573 | - @ref NEPriorBoxLayer |
Sheri Zhang | ed36713 | 2020-10-08 15:46:16 +0100 | [diff] [blame] | 574 | - Removed OpenCL kernels / functions: |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 575 | - CLGEMMLowpQuantizeDownInt32ToUint8Scale |
| 576 | - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 577 | - Removed Neon kernels / functions: |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 578 | - NEGEMMLowpQuantizeDownInt32ToUint8Scale |
| 579 | - NEGEMMMatrixAccumulateBiasesKernel |
SiCong Li | d004a7a | 2020-05-28 15:26:41 +0100 | [diff] [blame] | 580 | - Deprecated functions / interfaces: |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 581 | - Non-descriptor based interfaces for NEThreshold, CLThreshold |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 582 | - Non-descriptor based interfaces for @ref NEScale, @ref CLScale and GCScale |
| 583 | - In @ref NESoftmaxLayer, @ref NELogSoftmaxLayer, @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer: |
| 584 | The default "axis" value for @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer is changed from 1 to 0. |
morgolock | 9c7fed8 | 2020-08-05 12:30:56 +0100 | [diff] [blame] | 585 | Only axis 0 is supported. |
| 586 | The default "axis" value for @ref NESoftmaxLayer, @ref NELogSoftmaxLayer is changed from 1 to 0. |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 587 | Only axis 0 is supported. |
Sang-Hoon Park | a0205b9 | 2020-07-07 09:36:09 +0100 | [diff] [blame] | 588 | - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity. |
Gian Marco Iodice | 547b2e7 | 2020-08-12 10:25:29 +0100 | [diff] [blame] | 589 | - Removed padding requirement for the input (e.g. LHS of GEMM) and output in @ref CLGEMMMatrixMultiplyNativeKernel, @ref CLGEMMMatrixMultiplyReshapedKernel, @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and @ref CLIm2ColKernel (NHWC only) |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 590 | - This change makes it possible to use @ref CLGEMMConvolutionLayer without extra padding for the input and output. |
| 591 | - Only the weights/bias of @ref CLGEMMConvolutionLayer could require padding for the computation. |
| 592 | - Only on Arm Mali Midgard GPUs, @ref CLGEMMConvolutionLayer could require padding since @ref CLGEMMMatrixMultiplyKernel is called and currently requires padding. |
Gian Marco Iodice | 547b2e7 | 2020-08-12 10:25:29 +0100 | [diff] [blame] | 593 | - Added support for exporting the OpenCL buffer object to the OpenCL image object in @ref CLGEMMMatrixMultiplyReshapedKernel and @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel. |
Sang-Hoon Park | adfaefb | 2020-08-18 09:13:05 +0100 | [diff] [blame] | 594 | - This support makes it possible to export the OpenCL buffer used for the reshaped RHS matrix to the OpenCL image object. |
| 595 | - The padding requirement for the OpenCL image object is taken into account in @ref CLGEMMReshapeRHSMatrixKernel. |
| 596 | - The reshaped RHS matrix stores the weights when GEMM is used to accelerate @ref CLGEMMConvolutionLayer. |
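The softmax entries in the v20.08 list above state that only axis 0 is supported. As a rough illustration of what that means, the sketch below normalises each contiguous vector along dimension 0, assuming the library convention that dimension 0 is the innermost (fastest-varying) one. It is a plain C++ reference, not the library's implementation, and the helper name `softmax_axis0` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper, independent of the library.
// data holds consecutive vectors of `width` contiguous elements,
// i.e. dimension 0 is assumed to be the innermost one.
inline std::vector<float> softmax_axis0(const std::vector<float> &data, std::size_t width)
{
    assert(width > 0 && data.size() % width == 0);
    std::vector<float> out(data.size());
    for(std::size_t start = 0; start < data.size(); start += width)
    {
        // Subtract the maximum of the vector for numerical stability.
        const float max_val = *std::max_element(data.begin() + start, data.begin() + start + width);
        float       sum     = 0.f;
        for(std::size_t i = 0; i < width; ++i)
        {
            out[start + i] = std::exp(data[start + i] - max_val);
            sum += out[start + i];
        }
        // Normalise so that each vector along dimension 0 sums to 1.
        for(std::size_t i = 0; i < width; ++i)
        {
            out[start + i] /= sum;
        }
    }
    return out;
}
```

Calling `softmax_axis0(values, 4)` on a buffer of 8 floats treats it as two independent vectors of 4 elements each.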
Georgios Pinitas | 25ef721 | 2020-06-02 23:00:41 +0100 | [diff] [blame] | 597 | |
Georgios Pinitas | fd7780d | 2020-03-17 11:41:00 +0000 | [diff] [blame] | 598 | v20.05 Public major release |
Georgios Pinitas | c7b183a | 2020-03-06 18:12:09 +0000 | [diff] [blame] | 599 | - Various bug fixes. |
| 600 | - Various optimisations. |
Michele Di Giorgio | 36a551f | 2020-04-23 11:55:29 +0100 | [diff] [blame] | 601 | - Updated recommended NDK version to r18b. |
| 602 | - Updated recommended gcc version to Linaro 6.3.1. |
Georgios Pinitas | c7b183a | 2020-03-06 18:12:09 +0000 | [diff] [blame] | 603 | - Added Bfloat16 type support |
| 604 | - Added Bfloat16 support in: |
| 605 | - @ref NEWeightsReshapeKernel |
| 606 | - @ref NEConvolutionLayerReshapeWeights |
| 607 | - @ref NEIm2ColKernel |
Georgios Pinitas | f7c5a41 | 2020-12-03 14:38:33 +0000 | [diff] [blame] | 608 | - NEIm2Col |
Georgios Pinitas | c7b183a | 2020-03-06 18:12:09 +0000 | [diff] [blame] | 609 | - @ref NEDepthConvertLayerKernel |
| 610 | - @ref NEDepthConvertLayer |
| 611 | - @ref NEGEMMConvolutionLayer |
Georgios Pinitas | ec2256b | 2020-12-03 18:51:58 +0000 | [diff] [blame] | 612 | - NEGEMMAssemblyDispatch |
Sheri Zhang | 0f2522b | 2020-03-25 16:38:19 +0000 | [diff] [blame] | 613 | - Added new data type QASYMM8_SIGNED support for: |
| 614 | - @ref CLDirectConvolutionLayer |
| 615 | - @ref CLDeconvolutionLayer |
| 616 | - @ref CLDirectDeconvolutionLayer |
| 617 | - @ref CLGEMMDeconvolutionLayer |
| 618 | - @ref CLGEMMLowpMatrixMultiplyReshapedKernel |
| 619 | - @ref CLGEMMLowpQuantizeDownInt32ScaleKernel |
| 620 | - @ref CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel |
| 621 | - @ref CLReductionOperation |
| 622 | - @ref CLReduceMean |
Sheri Zhang | 359c48e | 2020-04-30 22:53:39 +0100 | [diff] [blame] | 623 | - @ref NEScale |
Manuel Bottini | 10b3826 | 2021-02-19 18:16:44 +0000 | [diff] [blame] | 624 | - NEScaleKernel |
Georgios Pinitas | c53266e | 2020-12-09 03:11:53 +0000 | [diff] [blame] | 625 | - NEUpsampleLayer |
Sheri Zhang | 0f2522b | 2020-03-25 16:38:19 +0000 | [diff] [blame] | 626 | - @ref NECast |
| 627 | - @ref NEReductionOperation |
| 628 | - @ref NEReduceMean |
| 629 | - @ref NEArgMinMaxLayer |
| 630 | - @ref NEDeconvolutionLayer |
| 631 | - @ref NEGEMMLowpQuantizeDownInt32ScaleKernel |
| 632 | - @ref CPPBoxWithNonMaximaSuppressionLimit |
| 633 | - @ref CPPDetectionPostProcessLayer |
| 634 | - @ref CPPPermuteKernel |
| 635 | - @ref CPPPermute |
| 636 | - @ref CPPTopKVKernel |
| 637 | - @ref CPPTopKV |
Sheri Zhang | 359c48e | 2020-04-30 22:53:39 +0100 | [diff] [blame] | 638 | - @ref CPPUpsample |
| 639 | - @ref CPPUpsampleKernel |
Sheri Zhang | 31b49ca | 2020-04-24 11:15:10 +0100 | [diff] [blame] | 640 | - New OpenCL kernels / functions: |
| 641 | - @ref CLQLSTMLayer |
| 642 | - @ref CLQLSTMLayerNormalizationKernel |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 643 | - New Neon kernels / functions: |
Sheri Zhang | 31b49ca | 2020-04-24 11:15:10 +0100 | [diff] [blame] | 644 | - @ref NEQLSTMLayer |
| 645 | - @ref NEQLSTMLayerNormalizationKernel |
| 646 | - Added HARD_SWISH support in: |
Georgios Pinitas | f47f718 | 2021-01-15 09:29:50 +0000 | [diff] [blame] | 647 | - CLActivationLayerKernel |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 648 | - NEActivationLayerKernel |
Sheri Zhang | 0f2522b | 2020-03-25 16:38:19 +0000 | [diff] [blame] | 649 | - Deprecated OpenCL kernels / functions: |
| 650 | - CLGEMMLowpQuantizeDownInt32ToUint8Scale |
| 651 | - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 652 | - Deprecated Neon kernels / functions: |
Sheri Zhang | 0f2522b | 2020-03-25 16:38:19 +0000 | [diff] [blame] | 653 | - NEGEMMLowpQuantizeDownInt32ToUint8Scale |
| 654 | - Removed CPP kernels / functions: |
| 655 | - CPPFlipWeightsKernel |
Manuel Bottini | 387259a | 2020-05-21 17:14:36 +0100 | [diff] [blame] | 656 | - Removed PoolingLayerInfo constructors without Data Layout. |
| 657 | - Removed CLDepthwiseConvolutionLayer3x3 |
| 658 | - Removed NEDepthwiseConvolutionLayerOptimized |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 659 | - Added support for Winograd 3x3,4x4 on Neon FP16: |
Manuel Bottini | 075253a | 2020-05-22 12:57:18 +0100 | [diff] [blame] | 660 | - @ref NEWinogradConvolutionLayer |
| 661 | - @ref NEWinogradLayerTransformInputKernel |
| 662 | - @ref NEWinogradLayerTransformOutputKernel |
| 663 | - @ref NEWinogradLayerTransformWeightsKernel |
| 664 | - Added CLCompileContext |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 665 | - Added Neon GEMM kernel with 2D window support |
Georgios Pinitas | c7b183a | 2020-03-06 18:12:09 +0000 | [diff] [blame] | 666 | |
Michele Di Giorgio | 740872e | 2020-03-04 15:29:49 +0000 | [diff] [blame] | 667 | v20.02.1 Maintenance release |
| 668 | - Added Android-NN build script. |
| 669 | |
Giuseppe Rossini | f04ddbc | 2020-02-17 17:22:49 +0000 | [diff] [blame] | 670 | v20.02 Public major release |
| 671 | - Various bug fixes. |
| 672 | - Various optimisations. |
| 673 | - Added new data type QASYMM8_SIGNED support for: |
| 674 | - @ref CLDepthwiseConvolutionLayer |
Manuel Bottini | 387259a | 2020-05-21 17:14:36 +0100 | [diff] [blame] | 675 | - CLDepthwiseConvolutionLayer3x3 |
Giuseppe Rossini | f04ddbc | 2020-02-17 17:22:49 +0000 | [diff] [blame] | 676 | - @ref CLGEMMConvolutionLayer |
| 677 | - @ref CLGEMMLowpMatrixMultiplyCore |
| 678 | - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel |
| 679 | - @ref CLGEMMLowpMatrixMultiplyNativeKernel |
| 680 | - @ref NEActivationLayer |
Sang-Hoon Park | 63001ac | 2021-01-18 14:20:27 +0000 | [diff] [blame] | 681 | - NEComparisonOperationKernel |
Giuseppe Rossini | f04ddbc | 2020-02-17 17:22:49 +0000 | [diff] [blame] | 682 | - @ref NEConvolutionLayer |
| 683 | - @ref NEDepthwiseConvolutionLayer |
Georgios Pinitas | 7d0adc6 | 2020-09-04 15:25:24 +0100 | [diff] [blame] | 684 | - NEDepthwiseConvolutionLayer3x3Kernel |
Giuseppe Rossini | f04ddbc | 2020-02-17 17:22:49 +0000 | [diff] [blame] | 685 | - @ref NEDirectConvolutionLayerOutputStageKernel |
| 686 | - @ref NEElementwiseComparison |
| 687 | - @ref NEElementwiseMax |
| 688 | - @ref NEElementwiseMin |
| 689 | - @ref NEElementwiseSquaredDiff |
| 690 | - @ref NEFullyConnectedLayer |
Michele Di Giorgio | f22f672 | 2020-07-03 16:29:24 +0100 | [diff] [blame] | 691 | - NEGEMMMatrixVectorMultiplyKernel |
Giuseppe Rossini | f04ddbc | 2020-02-17 17:22:49 +0000 | [diff] [blame] | 692 | - @ref NEPixelWiseMultiplication |
| 693 | - @ref NEPoolingLayer |
| 694 | - @ref NEPReluLayer |
| 695 | - Added support for QSYMM8_PER_CHANNEL in: |
Georgios Pinitas | 7d0adc6 | 2020-09-04 15:25:24 +0100 | [diff] [blame] | 696 | - NEDepthwiseConvolutionLayer3x3Kernel |
Giuseppe Rossini | f04ddbc | 2020-02-17 17:22:49 +0000 | [diff] [blame] | 697 | - Added support for split sizes in: |
| 698 | - @ref CLSplit |
| 699 | - @ref NESplit |
| 700 | - New OpenCL kernels / functions: |
| 701 | - @ref CLFill |
Michele Di Giorgio | ba14c92 | 2020-10-12 13:27:57 +0100 | [diff] [blame] | 702 | - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 703 | - New Neon kernels / functions: |
Giuseppe Rossini | f04ddbc | 2020-02-17 17:22:49 +0000 | [diff] [blame] | 704 | - @ref NEFill |
| 705 | - @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 706 | - Deprecated functions / interfaces: |
Manuel Bottini | 387259a | 2020-05-21 17:14:36 +0100 | [diff] [blame] | 707 | - CLDepthwiseConvolutionLayer3x3 |
| 708 | - NEDepthwiseConvolutionLayerOptimized |
| 709 | - PoolingLayerInfo constructors without Data Layout. |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 710 | - Added support for quantization with multiplier greater than 1 on Neon and CL. |
Giuseppe Rossini | f04ddbc | 2020-02-17 17:22:49 +0000 | [diff] [blame] | 711 | - Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer. |
| 712 | - Added the ability to build bootcode for bare metal. |
| 713 | - Added support for generating synthetic QASYMM8 graphs. |
| 714 | - Added support for F16 datatype in VGG16. |
| 715 | - Removed pre-built binaries for GLES. |
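The v20.02 entry above on quantization with a multiplier greater than 1 concerns the fixed-point form of a real rescale factor. The sketch below shows one common way to decompose such a factor into a Q0.31 mantissa and a power-of-two shift, where a factor of 1 or more yields a positive (left) shift. It is an illustration under that assumption, not the library's own code, and the names `QuantizedMultiplier` and `decompose_multiplier` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type, not part of the library's API.
struct QuantizedMultiplier
{
    std::int32_t mantissa; // Q0.31 fixed-point value in [2^30, 2^31)
    int          shift;    // > 0 means a left shift, used for factors of 1 or more
};

// Split a positive real factor so that factor ~= mantissa * 2^(shift - 31).
inline QuantizedMultiplier decompose_multiplier(double factor)
{
    assert(factor > 0.0);
    int          exponent = 0;
    const double fraction = std::frexp(factor, &exponent); // fraction in [0.5, 1)
    std::int64_t q        = static_cast<std::int64_t>(std::llround(fraction * static_cast<double>(1LL << 31)));
    if(q == (1LL << 31)) // rounding pushed the fraction up to 1.0
    {
        q /= 2;
        ++exponent;
    }
    return { static_cast<std::int32_t>(q), exponent };
}
```

For example, `decompose_multiplier(1.5)` gives a shift of 1, while a factor below 0.5 gives a negative shift, which corresponds to the right-shift-only case.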
| 716 | |
Michele Di Giorgio | d374ff2 | 2020-01-21 10:03:20 +0000 | [diff] [blame] | 717 | v19.11.1 Public maintenance release |
| 718 | - Fix offset calculation in NEReductionOperationKernel. |
| 719 | - Fix data layout in NEScaleKernel for nhwc. |
| 720 | - Retain configuration step data layout to avoid side-effects. |
| 721 | - Perform sqrt in double domain for L2 pooling. |
| 722 | - Fix output shape calculation for Reduce Mean. |
| 723 | - Restrict cases where optimized NEPadLayer runs. |
| 724 | |
Michele Di Giorgio | a046e16 | 2019-10-08 09:36:26 +0100 | [diff] [blame] | 725 | v19.11 Public major release |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 726 | - Various bug fixes. |
| 727 | - Various optimisations. |
SiCong Li | 1f7f988 | 2019-11-28 14:59:35 +0000 | [diff] [blame] | 728 | - Updated recommended NDK version to r17c. |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 729 | - Deprecated OpenCL kernels / functions: |
Michele Di Giorgio | a046e16 | 2019-10-08 09:36:26 +0100 | [diff] [blame] | 730 | - CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel |
| 731 | - CLDepthwiseIm2ColKernel |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 732 | - CLDepthwiseSeparableConvolutionLayer |
Michele Di Giorgio | a046e16 | 2019-10-08 09:36:26 +0100 | [diff] [blame] | 733 | - CLDepthwiseVectorToTensorKernel |
| 734 | - CLDirectConvolutionLayerOutputStageKernel |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 735 | - Deprecated Neon kernels / functions: |
Giorgio Arena | d93e263 | 2019-10-15 11:09:33 +0100 | [diff] [blame] | 736 | - NEDepthwiseWeightsReshapeKernel |
| 737 | - NEDepthwiseIm2ColKernel |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 738 | - NEDepthwiseSeparableConvolutionLayer |
Giorgio Arena | d93e263 | 2019-10-15 11:09:33 +0100 | [diff] [blame] | 739 | - NEDepthwiseVectorToTensorKernel |
Manuel Bottini | 05069f0 | 2019-09-26 17:18:26 +0100 | [diff] [blame] | 740 | - NEDepthwiseConvolutionLayer3x3 |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 741 | - New OpenCL kernels / functions: |
| 742 | - @ref CLInstanceNormalizationLayerKernel / @ref CLInstanceNormalizationLayer |
| 743 | - @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated |
| 744 | OpenCL kernels / functions) |
| 745 | - @ref CLLogSoftmaxLayer |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 746 | - New Neon kernels / functions: |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 747 | - @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform |
Georgios Pinitas | 8c3c0e7 | 2020-12-03 20:11:53 +0000 | [diff] [blame] | 748 | - @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 749 | - @ref NEDetectionPostProcessLayer |
| 750 | - @ref NEGenerateProposalsLayer |
| 751 | - @ref NEInstanceNormalizationLayerKernel / @ref NEInstanceNormalizationLayer |
| 752 | - @ref NELogSoftmaxLayer |
| 753 | - @ref NEROIAlignLayerKernel / @ref NEROIAlignLayer |
| 754 | - Added QASYMM8 support for: |
| 755 | - @ref CLGenerateProposalsLayer |
| 756 | - @ref CLROIAlignLayer |
| 757 | - @ref CPPBoxWithNonMaximaSuppressionLimit |
| 758 | - Added QASYMM16 support for: |
| 759 | - @ref CLBoundingBoxTransform |
| 760 | - Added FP16 support for: |
| 761 | - @ref CLGEMMMatrixMultiplyReshapedKernel |
| 762 | - Added new data type QASYMM8_PER_CHANNEL support for: |
Manuel Bottini | 9e73c93 | 2021-03-02 17:40:42 +0000 | [diff] [blame] | 763 | - CLDequantizationLayer |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 764 | - @ref NEDequantizationLayer |
| 765 | - Added new data type QSYMM8_PER_CHANNEL support for: |
| 766 | - @ref CLConvolutionLayer |
| 767 | - @ref NEConvolutionLayer |
| 768 | - @ref CLDepthwiseConvolutionLayer |
| 769 | - @ref NEDepthwiseConvolutionLayer |
| 770 | - Added FP16 mixed-precision support for: |
| 771 | - @ref CLGEMMMatrixMultiplyReshapedKernel |
Michele Di Giorgio | e131466 | 2021-02-01 17:09:32 +0000 | [diff] [blame] | 772 | - CLPoolingLayerKernel |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 773 | - Added FP32 and FP16 ELU activation for: |
| 774 | - @ref CLActivationLayer |
| 775 | - @ref NEActivationLayer |
| 776 | - Added asymmetric padding support for: |
| 777 | - @ref CLDirectDeconvolutionLayer |
| 778 | - @ref CLGEMMDeconvolutionLayer |
| 779 | - @ref NEDeconvolutionLayer |
| 780 | - Added SYMMETRIC and REFLECT modes for @ref CLPadLayerKernel / @ref CLPadLayer. |
Georgios Pinitas | 0f7ef8a | 2021-01-10 04:23:52 +0000 | [diff] [blame] | 781 | - Replaced the calls to NECopyKernel and NEMemsetKernel with @ref NEPadLayer in @ref NEGenerateProposalsLayer. |
| 782 | - Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer. |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 783 | - Improved performance for CL Inception V3 - FP16. |
| 784 | - Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision). |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 785 | - Improved Neon performance by enabling fusing batch normalization with convolution and depth-wise convolution layer. |
| 786 | - Improved Neon performance for MobileNet-SSD by improving the output detection performance. |
SiCong Li | ca1f98c | 2019-11-28 11:06:11 +0000 | [diff] [blame] | 787 | - Optimized @ref CLPadLayer. |
| 788 | - Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel. |
| 789 | - Reduced memory consumption by implementing weights sharing. |
Michele Di Giorgio | a046e16 | 2019-10-08 09:36:26 +0100 | [diff] [blame] | 790 | |
Michele Di Giorgio | d374ff2 | 2020-01-21 10:03:20 +0000 | [diff] [blame] | 791 | v19.08.1 Public maintenance release |
| 792 | - Fix offset calculation in NEReductionOperationKernel. |
| 793 | - Fix data layout in NEScaleKernel for nhwc. |
| 794 | - Retain configuration step data layout to avoid side-effects. |
| 795 | - Perform sqrt in double domain for L2 pooling. |
| 796 | - Fix output shape calculation for Reduce Mean. |
| 797 | - Fix broadcast CLPixelwiseMultiplication with 5D tensors |
| 798 | |
Georgios Pinitas | 3d13af8 | 2019-06-04 13:04:16 +0100 | [diff] [blame] | 799 | v19.08 Public major release |
| 800 | - Various bug fixes. |
| 801 | - Various optimisations. |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 802 | - Deprecated Neon functions: |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 803 | - NEDepthConcatenateLayer |
| 804 | - NEWidthConcatenateLayer |
| 805 | - Deprecated OpenCL kernels / functions: |
| 806 | - CLDepthConcatenateLayer |
| 807 | - CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4 |
| 808 | - CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW |
| 809 | - CLWidthConcatenateLayer |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 810 | - New Neon kernels / functions: |
Gian Marco Iodice | c5f48ad | 2019-09-02 09:52:12 +0100 | [diff] [blame] | 811 | - @ref NEAbsLayer |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 812 | - @ref NECast |
Gian Marco Iodice | c5f48ad | 2019-09-02 09:52:12 +0100 | [diff] [blame] | 813 | - @ref NEElementwisePower |
| 814 | - @ref NELogLayer |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 815 | - @ref NELSTMLayerQuantized |
Gian Marco Iodice | c5f48ad | 2019-09-02 09:52:12 +0100 | [diff] [blame] | 816 | - @ref NENegLayer |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 817 | - @ref NEPReluLayer |
Gian Marco Iodice | c5f48ad | 2019-09-02 09:52:12 +0100 | [diff] [blame] | 818 | - @ref NESinLayer |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 819 | - NEBatchConcatenateLayerKernel |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 820 | - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer |
| 821 | - @ref NEDepthwiseConvolutionLayerNativeKernel |
| 822 | - @ref NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel |
| 823 | - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer |
| 824 | - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer |
| 825 | - New OpenCL kernels / functions: |
Gian Marco Iodice | c5f48ad | 2019-09-02 09:52:12 +0100 | [diff] [blame] | 826 | - @ref CLAbsLayer |
| 827 | - @ref CLElementwisePower |
| 828 | - @ref CLLogLayer |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 829 | - @ref CLLSTMLayerQuantized |
Gian Marco Iodice | c5f48ad | 2019-09-02 09:52:12 +0100 | [diff] [blame] | 830 | - @ref CLNegLayer |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 831 | - @ref CLPReluLayer |
Gian Marco Iodice | c5f48ad | 2019-09-02 09:52:12 +0100 | [diff] [blame] | 832 | - @ref CLSinLayer |
Michele Di Giorgio | 7d61ff0 | 2021-01-18 21:15:59 +0000 | [diff] [blame] | 833 | - CLBatchConcatenateLayerKernel |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 834 | - @ref CLDepthToSpaceLayerKernel / @ref CLDepthToSpaceLayer |
| 835 | - @ref CLGEMMLowpMatrixMultiplyNativeKernel |
Michele Di Giorgio | ba14c92 | 2020-10-12 13:27:57 +0100 | [diff] [blame] | 836 | - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 837 | - @ref CLGEMMMatrixMultiplyNativeKernel |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 838 | - CLMeanStdDevNormalizationKernel / CLMeanStdDevNormalizationLayer |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 839 | - @ref CLSpaceToDepthLayerKernel / @ref CLSpaceToDepthLayer |
| 840 | - New examples: |
| 841 | - neon_opticalflow |
| 842 | - cl_cache |
| 843 | - neon_permute |
Gian Marco Iodice | c5f48ad | 2019-09-02 09:52:12 +0100 | [diff] [blame] | 844 | - Added support for FP16 in @ref NEDeconvolutionLayer |
| 845 | - Added support for FP16 in @ref CLDeconvolutionLayer |
| 846 | - Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 847 | - Enable the fusion of batch normalization with convolution and depthwise convolution layer for FP32 in the graph API (OpenCL only) |
| 848 | - Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only) |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 849 | - Re-factored the depthwise convolution layer kernel on Neon for generic cases |
| 850 | - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon only) |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 851 | - Added support to enable OpenCL kernel cache. Added example showing how to load the prebuilt OpenCL kernels from a binary cache file |
| 852 | - Altered @ref QuantizationInfo interface to support per-channel quantization. |
Manuel Bottini | 387259a | 2020-05-21 17:14:36 +0100 | [diff] [blame] | 853 | - The CLDepthwiseConvolutionLayer3x3 will be included by @ref CLDepthwiseConvolutionLayer to accommodate future optimizations. |
| 854 | - The NEDepthwiseConvolutionLayerOptimized will be included by @ref NEDepthwiseConvolutionLayer to accommodate future optimizations. |
Gian Marco Iodice | cc2f54b | 2019-08-22 10:10:52 +0100 | [diff] [blame] | 855 | - Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface |
| 856 | - Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 857 | - Optimized the Neon assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel |
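The v19.08 entry above on per-channel quantization in @ref QuantizationInfo can be pictured with a small reference routine. The sketch below dequantizes symmetric 8-bit values using one scale per channel. It assumes a channel-by-channel memory layout and is independent of the library; the helper name `dequantize_per_channel` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, independent of the library.
// values are laid out channel by channel, with elems_per_channel entries each.
inline std::vector<float> dequantize_per_channel(const std::vector<std::int8_t> &values,
                                                 const std::vector<float>       &scales,
                                                 std::size_t                     elems_per_channel)
{
    assert(elems_per_channel > 0);
    assert(values.size() == scales.size() * elems_per_channel);
    std::vector<float> out(values.size());
    for(std::size_t i = 0; i < values.size(); ++i)
    {
        // Symmetric scheme: no zero-point offset, only a per-channel scale.
        out[i] = scales[i / elems_per_channel] * static_cast<float>(values[i]);
    }
    return out;
}
```

With two channels of two elements each, the first two values are multiplied by `scales[0]` and the last two by `scales[1]`.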
Georgios Pinitas | 3d13af8 | 2019-06-04 13:04:16 +0100 | [diff] [blame] | 858 | |
Michalis Spyrou | a9c4472 | 2019-04-05 17:18:36 +0100 | [diff] [blame] | 859 | v19.05 Public major release |
Michalis Spyrou | c6608ac | 2019-05-16 17:40:23 +0100 | [diff] [blame] | 860 | - Various bug fixes. |
| 861 | - Various optimisations. |
Georgios Pinitas | f790fdb | 2019-04-24 12:41:25 +0100 | [diff] [blame] | 862 | - New Neon kernels / functions: |
| 863 | - @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer |
Michalis Spyrou | ca82e62 | 2019-05-10 16:43:20 +0100 | [diff] [blame] | 864 | - @ref NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication |
Georgios Pinitas | f790fdb | 2019-04-24 12:41:25 +0100 | [diff] [blame] | 865 | - @ref NECropKernel / @ref NECropResize |
Michalis Spyrou | ca82e62 | 2019-05-10 16:43:20 +0100 | [diff] [blame] | 866 | - @ref NEDepthwiseConvolutionAssemblyDispatch |
| 867 | - @ref NEFFTDigitReverseKernel |
| 868 | - @ref NEFFTRadixStageKernel |
| 869 | - @ref NEFFTScaleKernel |
Georgios Pinitas | f790fdb | 2019-04-24 12:41:25 +0100 | [diff] [blame] | 870 | - @ref NEGEMMLowpOffsetContributionOutputStageKernel |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 871 | - NEHeightConcatenateLayerKernel |
Georgios Pinitas | f790fdb | 2019-04-24 12:41:25 +0100 | [diff] [blame] | 872 | - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer |
Michalis Spyrou | d7dd15c | 2019-05-30 14:53:58 +0100 | [diff] [blame] | 873 | - @ref NEFFT1D |
| 874 | - @ref NEFFT2D |
| 875 | - @ref NEFFTConvolutionLayer |
Georgios Pinitas | f790fdb | 2019-04-24 12:41:25 +0100 | [diff] [blame] | 876 | - New OpenCL kernels / functions: |
Sheri Zhang | f9ab9f9 | 2021-03-16 12:09:15 +0000 | [diff] [blame] | 877 | - CLComplexPixelWiseMultiplicationKernel / @ref CLComplexPixelWiseMultiplication |
Sheri Zhang | 7e20e29 | 2021-02-02 11:49:34 +0000 | [diff] [blame] | 878 | - CLCropKernel / @ref CLCropResize |
Michalis Spyrou | d7dd15c | 2019-05-30 14:53:58 +0100 | [diff] [blame] | 879 | - @ref CLDeconvolutionReshapeOutputKernel |
Georgios Pinitas | f790fdb | 2019-04-24 12:41:25 +0100 | [diff] [blame] | 880 | - @ref CLFFTDigitReverseKernel |
| 881 | - @ref CLFFTRadixStageKernel |
| 882 | - @ref CLFFTScaleKernel |
| 883 | - @ref CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel |
| 884 | - @ref CLGEMMMatrixMultiplyReshapedOnlyRHSKernel |
Michele Di Giorgio | 7d61ff0 | 2021-01-18 21:15:59 +0000 | [diff] [blame] | 885 | - CLHeightConcatenateLayerKernel |
Georgios Pinitas | f790fdb | 2019-04-24 12:41:25 +0100 | [diff] [blame] | 886 | - @ref CLDirectDeconvolutionLayer |
| 887 | - @ref CLFFT1D |
| 888 | - @ref CLFFT2D |
| 889 | - @ref CLFFTConvolutionLayer |
Michalis Spyrou | ca82e62 | 2019-05-10 16:43:20 +0100 | [diff] [blame] | 890 | - @ref CLGEMMDeconvolutionLayer |
| 891 | - New OpenGLES kernels / functions: |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 892 | - GCConcatenateLayer |
Michalis Spyrou | a9c4472 | 2019-04-05 17:18:36 +0100 | [diff] [blame] | 893 | - Deprecated functions / interfaces: |
Georgios Pinitas | 09f2497 | 2019-05-17 18:14:40 +0100 | [diff] [blame] | 894 | - GCDepthConcatenateLayer |
| 895 | - NEWidthConcatenateLayer |
| 896 | - NEDepthConcatenateLayer |
| 897 | - CLWidthConcatenateLayer |
| 898 | - CLDepthConcatenateLayer |
Gian Marco Iodice | 5fc07aa | 2019-05-15 17:08:02 +0100 | [diff] [blame] | 899 | - CLGEMMInterleave4x4 |
| 900 | - CLGEMMTranspose1xW |
Michalis Spyrou | c6608ac | 2019-05-16 17:40:23 +0100 | [diff] [blame] | 901 | - Support different quantization info in CLConcatLayer. |
| 902 | - Add checks for cases where different input/output quantization info is not supported. |
| 903 | - Tensors have different quantization information. |
| 904 | - Add FP16 support checks. |
| 905 | - Fix output quantization in CLDepthwiseConv3x3 when activation is fused. |
| 906 | - New graph examples: |
| 907 | - graph_convolution |
| 908 | - graph_fully_connected |
| 909 | - graph_depthwise_convolution |
| 910 | - Deepspeech v0.4.1 |
| 911 | - Add support for QASYMM8 in NEArithmeticSubtractionKernel. |
| 912 | - Add support for QASYMM8 in NEPixelWiseMultiplicationKernel. |
| 913 | - Add support for QASYMM8 NEDeconvolution. |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 914 | - Add support for DequantizationLayer for Neon/CL. |
Michalis Spyrou | c6608ac | 2019-05-16 17:40:23 +0100 | [diff] [blame] | 915 | - Add support for dilation in CLDepthwiseConvolution. |
| 916 | - Fuse offset contribution with the output stage when we use NEGEMMLowpMatrixMultiplyCore. |
| 917 | - Optimize CLDeconvolution. |
| 918 | - Add StackLayer to the graph API. |
| 919 | - Add support for "reflect" padding mode in NEPad. |
| 920 | - Winograd 7x7 NHWC on OpenCL. |
| 921 | - Rework CL ML layers to run exclusively on CL. |
| 922 | - Support different quantization info in PoolingLayer. |
| 923 | - Implement and test import memory interfaces. |
| 924 | - Added new tests and removed old ones. |
| 925 | - Various clang-tidy fixes. |
Michalis Spyrou | a9c4472 | 2019-04-05 17:18:36 +0100 | [diff] [blame] | 926 | |
giuros01 | a69a88b | 2019-01-31 16:29:19 +0000 | [diff] [blame] | 927 | v19.02 Public major release |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 928 | - Various bug fixes. |
| 929 | - Various optimisations. |
| 930 | - New Neon kernels / functions: |
| 931 | - @ref NETileKernel / @ref NETile |
| 932 | - @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization |
Sang-Hoon Park | 63001ac | 2021-01-18 14:20:27 +0000 | [diff] [blame] | 933 | - NEElementwiseOperationKernel |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 934 | - @ref NEElementwiseMax |
| 935 | - @ref NEElementwiseMin |
| 936 | - @ref NEElementwiseSquaredDiff |
| 937 | - @ref NESelectKernel / @ref NESelect |
| 938 | - @ref NESplit |
| 939 | - @ref NESlice |
| 940 | - @ref NEUnstack |
| 941 | - @ref NEStridedSliceKernel / @ref NEStridedSlice |
Sang-Hoon Park | 7249f15 | 2021-01-22 11:55:03 +0000 | [diff] [blame] | 942 | - NEElementwiseUnaryKernel |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 943 | - @ref NERsqrtLayer |
| 944 | - @ref NEExpLayer |
| 945 | - @ref NEReverseKernel / @ref NEReverse |
| 946 | - @ref NEArgMinMaxLayer |
| 947 | - @ref NEStackLayerKernel / @ref NEStackLayer |
| 948 | - @ref NERangeKernel / @ref NERange |
| 949 | - @ref NEPadLayer |
Georgios Pinitas | 0f7ef8a | 2021-01-10 04:23:52 +0000 | [diff] [blame] | 950 | - NEMemsetKernel |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 951 | - @ref NEGatherKernel / @ref NEGather |
| 952 | - @ref NEElementwiseComparison |
| 953 | - @ref NEElementwiseComparisonStatic |
Sang-Hoon Park | 63001ac | 2021-01-18 14:20:27 +0000 | [diff] [blame] | 954 | - NEComparisonOperationKernel |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 955 | - @ref NEElementwiseDivision |
| 956 | - New OpenCL kernels / functions: |
| 957 | - @ref CLSelectKernel / @ref CLSelect |
| 958 | - @ref CLTileKernel / @ref CLTile |
| 959 | - @ref CLComparisonKernel / @ref CLComparison |
| 960 | - @ref CLArgMinMaxLayer |
| 961 | - @ref CLElementwiseMax |
| 962 | - @ref CLElementwiseMin |
| 963 | - @ref CLElementwiseSquaredDiff |
| 964 | - @ref CLStackLayerKernel / @ref CLStackLayer |
| 965 | - @ref CLReverse / @ref CLReverseKernel |
| 966 | - @ref CLRsqrtLayer |
| 967 | - @ref CLExpLayer |
Michele Di Giorgio | c9c8905 | 2021-01-26 10:20:17 +0000 | [diff] [blame] | 968 | - CLElementWiseUnaryLayerKernel |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 969 | - @ref CLGEMMReshapeLHSMatrixKernel |
| 970 | - @ref CLGEMMReshapeRHSMatrixKernel |
| 971 | - @ref CLGEMMMatrixMultiplyReshapedKernel |
| 972 | - @ref CLRangeKernel / @ref CLRange |
| 973 | - @ref CLUnstack |
| 974 | - @ref CLGatherKernel / @ref CLGather |
| 975 | - @ref CLGEMMLowpMatrixMultiplyReshapedKernel |
| 976 | - New CPP kernels / functions: |
| 977 | - @ref CPPDetectionOutputLayer |
| 978 | - @ref CPPTopKV / @ref CPPTopKVKernel |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 979 | - Added new examples: |
| 980 | - graph_ssd_mobilenet.cpp |
| 981 | - graph_mobilenet_v2.cpp |
| 982 | - graph_resnet12.cpp |
| 983 | - graph_srcnn955.cpp |
| 984 | - graph_vgg_vdsr.cpp |
| 985 | - graph_inception_resnet_v1.cpp |
| 986 | - Added 4D tensor support to: |
| 987 | - @ref NESoftmaxLayer |
| 988 | - Fused activation in @ref CLWinogradConvolutionLayer |
| 989 | - Extended @ref NEPermute to support more cases |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 990 | - Added Neon/SVE GEMM Hybrid kernels |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 991 | - Added u8 and s8 hybrid assembly kernels |
| 992 | - Introduced GEMM strategy name in NEGEMMAssemblyWrapper |
| 993 | - Improved @ref CLTuner |
| 994 | - Fused the bias addition within @ref CLGEMM |
| 995 | - Added support for QASYMM8 LOGISTIC activation in @ref NEActivationLayer |
| 996 | - Added NHWC data layout support to: |
| 997 | - @ref NEScale for F16 |
| 998 | - @ref CLNormalizationLayer IN_MAP_2D for FP32/FP16 |
| 999 | - @ref NEL2NormalizeLayer for FP32/FP16 |
| 1000 | - @ref NENormalizationLayer IN_MAP_2D for FP32/FP16 |
| 1001 | - @ref CLROIAlignLayer |
Manuel Bottini | 5209be5 | 2019-02-13 16:34:56 +0000 | [diff] [blame] | 1002 | - @ref CLGenerateProposalsLayer |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 1003 | - Added QASYMM8 support to the following kernels: |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 1004 | - NEArithmeticAdditionKernel |
Isabella Gottardi | 6253897 | 2019-02-12 19:52:44 +0000 | [diff] [blame] | 1005 | - @ref NEScale |
| 1006 | - Added new tests and improved validation and benchmarking suites. |
giuros01 | a69a88b | 2019-01-31 16:29:19 +0000 | [diff] [blame] | 1007 | - Deprecated functions/interfaces |
| 1008 | - Usage of inner_border_right and inner_border_top has been deprecated in @ref CLDeconvolutionLayer and @ref NEDeconvolutionLayer |
| 1009 | |
Isabella Gottardi | 8773d7c | 2018-11-20 09:56:46 +0000 | [diff] [blame] | 1010 | v18.11 Public major release |
| 1011 | - Various bug fixes. |
| 1012 | - Various optimisations. |
| 1013 | - New Neon kernels / functions: |
| 1014 | - @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel |
| 1015 | - @ref NEReduceMean |
| 1016 | - @ref NEReorgLayer / @ref NEReorgLayerKernel |
| 1017 | - @ref NEPriorBoxLayer / @ref NEPriorBoxLayerKernel |
Georgios Pinitas | c53266e | 2020-12-09 03:11:53 +0000 | [diff] [blame] | 1018 | - NEUpsampleLayer / NEUpsampleLayerKernel |
Georgios Pinitas | 0b1c2db | 2020-12-04 15:51:34 +0000 | [diff] [blame] | 1019 | - NEYOLOLayer / NEYOLOLayerKernel |
Isabella Gottardi | 8773d7c | 2018-11-20 09:56:46 +0000 | [diff] [blame] | 1020 | - New OpenCL kernels / functions: |
| 1021 | - @ref CLBatchToSpaceLayer / @ref CLBatchToSpaceLayerKernel |
| 1022 | - @ref CLBoundingBoxTransform / @ref CLBoundingBoxTransformKernel |
Manuel Bottini | 5209be5 | 2019-02-13 16:34:56 +0000 | [diff] [blame] | 1023 | - @ref CLComputeAllAnchorsKernel |
| 1024 | - @ref CLGenerateProposalsLayer |
Isabella Gottardi | 8773d7c | 2018-11-20 09:56:46 +0000 | [diff] [blame] | 1025 | - @ref CLNormalizePlanarYUVLayer / @ref CLNormalizePlanarYUVLayerKernel |
| 1026 | - @ref CLReorgLayer / @ref CLReorgLayerKernel |
| 1027 | - @ref CLSpaceToBatchLayer / @ref CLSpaceToBatchLayerKernel |
| 1028 | - @ref CLPadLayer |
| 1029 | - @ref CLReduceMean |
| 1030 | - @ref CLPriorBoxLayer / @ref CLPriorBoxLayerKernel |
| 1031 | - @ref CLROIAlignLayer / @ref CLROIAlignLayerKernel |
| 1032 | - @ref CLSlice |
| 1033 | - @ref CLSplit |
| 1034 | - @ref CLStridedSlice / @ref CLStridedSliceKernel |
Georgios Pinitas | c53266e | 2020-12-09 03:11:53 +0000 | [diff] [blame] | 1035 | - CLUpsampleLayer / CLUpsampleLayerKernel |
Georgios Pinitas | 0b1c2db | 2020-12-04 15:51:34 +0000 | [diff] [blame] | 1036 | - CLYOLOLayer / CLYOLOLayerKernel |
Isabella Gottardi | 8773d7c | 2018-11-20 09:56:46 +0000 | [diff] [blame] | 1037 | - New CPP kernels / functions: |
| 1038 | - @ref CPPBoxWithNonMaximaSuppressionLimit / @ref CPPBoxWithNonMaximaSuppressionLimitKernel |
| 1039 | - Added the validate method in: |
| 1040 | - @ref NEDepthConvertLayer |
| 1041 | - @ref NEFloor / @ref CLFloor |
| 1042 | - @ref NEGEMMMatrixAdditionKernel |
| 1043 | - @ref NEReshapeLayer / @ref CLReshapeLayer |
| 1044 | - @ref CLScale |
| 1045 | - Added new examples: |
| 1046 | - graph_shufflenet.cpp |
| 1047 | - graph_yolov3.cpp |
| 1048 | - Added documentation on how to add a new function or kernel. |
| 1049 | - Improved doxygen documentation by adding a list of the existing functions. |
| 1050 | - Added 4D tensor support to: |
Georgios Pinitas | 09f2497 | 2019-05-17 18:14:40 +0100 | [diff] [blame] | 1051 | - CLWidthConcatenateLayer |
Georgios Pinitas | e2696b1 | 2020-12-03 20:37:43 +0000 | [diff] [blame] | 1052 | - CLFlattenLayer |
Isabella Gottardi | 8773d7c | 2018-11-20 09:56:46 +0000 | [diff] [blame] | 1053 | - @ref CLSoftmaxLayer |
| 1054 | - Added dot product support for @ref CLDepthwiseConvolutionLayer3x3NHWCKernel non-unit stride |
| 1055 | - Added SVE support |
| 1056 | - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization |
| 1057 | - Fused activation in @ref CLDepthwiseConvolutionLayer3x3NCHWKernel, @ref CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer |
| 1058 | - Added NHWC data layout support to: |
| 1059 | - @ref CLChannelShuffleLayer |
| 1060 | - @ref CLDeconvolutionLayer |
| 1061 | - @ref CLL2NormalizeLayer |
| 1062 | - Added QASYMM8 support to the following kernels: |
Manuel Bottini | 3b131ab | 2021-02-19 18:16:44 +0000 | [diff] [blame] | 1063 | - CLScaleKernel |
Georgios Pinitas | 7d0adc6 | 2020-09-04 15:25:24 +0100 | [diff] [blame] | 1064 | - NEDepthwiseConvolutionLayer3x3Kernel |
Sheri Zhang | f9ab9f9 | 2021-03-16 12:09:15 +0000 | [diff] [blame] | 1065 | - CLPixelWiseMultiplicationKernel |
Isabella Gottardi | 8773d7c | 2018-11-20 09:56:46 +0000 | [diff] [blame] | 1066 | - Added FP16 support to the following kernels: |
| 1067 | - @ref CLDepthwiseConvolutionLayer3x3NHWCKernel |
Georgios Pinitas | 7d0adc6 | 2020-09-04 15:25:24 +0100 | [diff] [blame] | 1068 | - NEDepthwiseConvolutionLayer3x3Kernel |
Isabella Gottardi | 8773d7c | 2018-11-20 09:56:46 +0000 | [diff] [blame] | 1069 | - @ref CLNormalizePlanarYUVLayerKernel |
| 1070 | - @ref CLWinogradConvolutionLayer (5x5 kernel) |
| 1071 | - More tests added to both validation and benchmarking suites. |
| 1072 | |
Anthony Barbier | d51ea0a | 2018-08-07 17:48:03 +0100 | [diff] [blame] | 1073 | v18.08 Public major release |
| 1074 | - Various bug fixes. |
Michele Di Giorgio | 02baf01 | 2018-08-20 18:10:38 +0100 | [diff] [blame] | 1075 | - Various optimisations. |
Anthony Barbier | d51ea0a | 2018-08-07 17:48:03 +0100 | [diff] [blame] | 1076 | - Updated recommended NDK version to r17b. |
Michele Di Giorgio | 02baf01 | 2018-08-20 18:10:38 +0100 | [diff] [blame] | 1077 | - Removed support for QS8/QS16 data types. |
| 1078 | - Added support for grouped convolution in @ref CLConvolutionLayer. |
| 1079 | - Added NHWC data layout support to: |
Georgios Pinitas | 09f2497 | 2019-05-17 18:14:40 +0100 | [diff] [blame] | 1080 | - NEDepthConcatenateLayer / CLDepthConcatenateLayer |
Michele Di Giorgio | 02baf01 | 2018-08-20 18:10:38 +0100 | [diff] [blame] | 1081 | - @ref NEWinogradConvolutionLayer / @ref CLWinogradConvolutionLayer |
| 1082 | - @ref CLDepthwiseConvolutionLayer |
| 1083 | - @ref CLDirectConvolutionLayer |
| 1084 | - @ref CLConvolutionLayer |
| 1085 | - @ref CLScale |
| 1086 | - @ref CLIm2ColKernel |
| 1087 | - New Neon kernels / functions: |
| 1088 | - @ref NERNNLayer |
| 1089 | - New OpenCL kernels / functions: |
| 1090 | - @ref CLArithmeticDivision |
| 1091 | - Introduced prepare() stage support in the graph API for GLES. |
| 1092 | - Added support for memory reuse when trying to allocate smaller CLTensors. |
| 1093 | - Enabled NHWC execution on graph examples. |
| 1094 | - Added JPEG accessor for validation purposes. |
| 1095 | - Added validate methods to some kernels / functions. |
Anthony Barbier | d51ea0a | 2018-08-07 17:48:03 +0100 | [diff] [blame] | 1096 | |
| 1097 | v18.05 Public major release |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1098 | - Various bug fixes. |
| 1099 | - Various optimisations. |
Pablo Tello | eb82fd2 | 2018-02-23 13:43:50 +0000 | [diff] [blame] | 1100 | - Major redesign in the interface for the Neon kernels implemented in assembly. |
| 1101 | - Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel |
| 1102 | - Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in Neon functions. |
| 1103 | - Minor changes to the CPUInfo type to make it compatible with the new assembly GEMM interface. |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1104 | - Moved Neon assembly kernels to the folder src/core/Neon/kernels/arm_gemm. |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1105 | - Improved doxygen documentation. |
| 1106 | - Improved memory management for layer transitions. |
| 1107 | - Added support for NHWC data layout in tensors. |
| 1108 | - Added NHWC data layout support to: |
| 1109 | - @ref NEGEMMConvolutionLayer |
| 1110 | - @ref NEDirectConvolutionLayer |
| 1111 | - @ref NEPoolingLayer / @ref CLPoolingLayer |
| 1112 | - @ref NEBatchNormalizationLayer / @ref CLBatchNormalizationLayer |
| 1113 | - @ref NEDepthwiseConvolutionLayer |
| 1114 | - @ref NEScale |
Georgios Pinitas | f7c5a41 | 2020-12-03 14:38:33 +0000 | [diff] [blame] | 1115 | - NEIm2Col |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1116 | - Added support for dilated convolutions in @ref NEConvolutionLayer and @ref CLConvolutionLayer. |
| 1117 | - New OpenCL kernels / functions: |
| 1118 | - @ref CLChannelShuffleLayer / @ref CLChannelShuffleLayerKernel |
| 1119 | - @ref CLConvertFullyConnectedWeightsKernel / @ref CLConvertFullyConnectedWeights |
Sheri Zhang | 7e20e29 | 2021-02-02 11:49:34 +0000 | [diff] [blame] | 1120 | - @ref CLCopy / CLCopyKernel |
Anthony Barbier | 38e7f1f | 2018-05-21 13:37:47 +0100 | [diff] [blame] | 1121 | - @ref CLLSTMLayer |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1122 | - @ref CLRNNLayer |
Michele Di Giorgio | 7d61ff0 | 2021-01-18 21:15:59 +0000 | [diff] [blame] | 1123 | - CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1124 | - @ref CLWinogradFilterTransformKernel / @ref CLWinogradInputTransformKernel / @ref CLWinogradConvolutionLayer |
| 1125 | - @ref CLWinogradInputTransformKernel / @ref CLWinogradInputTransform |
| 1126 | - New Neon kernels / functions: |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1127 | - @ref NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights |
| 1128 | - Created the validate method in @ref CLDepthwiseConvolutionLayer. |
| 1129 | - Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer. |
| 1130 | - Added depth multiplier support in @ref NEDepthwiseConvolutionLayer and @ref CLDepthwiseConvolutionLayer. |
| 1131 | - Added broadcast multiply support in @ref NEPixelWiseMultiplication / @ref NEPixelWiseMultiplicationKernel. |
| 1132 | - Ported the MobileNet example to NHWC data layout. |
| 1133 | - Enabled Winograd method in @ref CLConvolutionLayer. |
| 1134 | - Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer. |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1135 | - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm. |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1136 | - Added memory manager support in GLES functions. |
| 1137 | - Major refactoring of the graph API. |
| 1138 | - Added GLES backend in the graph API. |
| 1139 | - Added support for the memory manager in the graph API. |
| 1140 | - Enabled Winograd Convolution method in the graph API. |
| 1141 | - Added support for grouped convolutions in the graph API. |
Manuel Bottini | 10b3826 | 2021-02-19 18:16:44 +0000 | [diff] [blame] | 1142 | - Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in @ref NEDeconvolutionLayer. |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1143 | - Added fast maths flag in @ref CLConvolutionLayer. |
| 1144 | - Added new tests and benchmarks in validation and benchmark frameworks |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1145 | - Merged Activation layer with Convolution Layer (Neon, CL, GLES) |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1146 | - Added support for OpenCL 2.0 SVM |
| 1147 | - Added support for importing memory in OpenCL tensors. |
| 1148 | - Added the prepare() method to perform any one-off pre-processing before running the function. |
| 1149 | - Added new examples: |
| 1150 | - graph_inception_v4.cpp |
Anthony Barbier | 38e7f1f | 2018-05-21 13:37:47 +0100 | [diff] [blame] | 1151 | - graph_resnext50.cpp |
Pablo Tello | b5cc95b | 2018-05-15 11:49:33 +0100 | [diff] [blame] | 1152 | - Added memory measurement instrument for CL. |
Pablo Tello | eb82fd2 | 2018-02-23 13:43:50 +0000 | [diff] [blame] | 1153 | |
Anthony Barbier | 577fbdf | 2018-03-01 15:17:54 +0000 | [diff] [blame] | 1154 | v18.03 Public maintenance release |
| 1155 | - Various bug fixes. |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1156 | - Fixed bug in @ref NEActivationLayer |
| 1157 | - Fix in @ref CLTuner when using batches. |
Anthony Barbier | 577fbdf | 2018-03-01 15:17:54 +0000 | [diff] [blame] | 1158 | - Updated recommended NDK version to r16b (and fixed warnings). |
| 1159 | - Fixed bug in validation code. |
| 1160 | - Added Inception v4 graph example. |
Georgios Pinitas | 9fb1159 | 2018-04-26 20:34:58 +0100 | [diff] [blame] | 1161 | - Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer |
Anthony Barbier | 577fbdf | 2018-03-01 15:17:54 +0000 | [diff] [blame] | 1162 | |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1163 | v18.02 Public major release |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1164 | - Various Neon / OpenCL / GLES optimisations. |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1165 | - Various bug fixes. |
| 1166 | - Changed default number of threads on big.LITTLE systems. |
| 1167 | - Refactored examples and added: |
| 1168 | - graph_mobilenet_qasymm8 |
| 1169 | - graph_resnet |
| 1170 | - graph_squeezenet_v1_1 |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1171 | - Renamed @ref CLConvolutionLayer into @ref CLGEMMConvolutionLayer and created a new @ref CLConvolutionLayer to select the fastest convolution method. |
| 1172 | - Renamed @ref NEConvolutionLayer into @ref NEGEMMConvolutionLayer and created a new @ref NEConvolutionLayer to select the fastest convolution method. |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1173 | - Added in place support to: |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1174 | - @ref CLActivationLayer |
| 1175 | - @ref CLBatchNormalizationLayer |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1176 | - Added QASYMM8 support to: |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1177 | - @ref CLActivationLayer |
| 1178 | - @ref CLDepthwiseConvolutionLayer |
| 1179 | - @ref NEDepthwiseConvolutionLayer |
| 1180 | - @ref NESoftmaxLayer |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1181 | - Added FP16 support to: |
Manuel Bottini | 387259a | 2020-05-21 17:14:36 +0100 | [diff] [blame] | 1182 | - CLDepthwiseConvolutionLayer3x3 |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1183 | - @ref CLDepthwiseConvolutionLayer |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 1184 | - Added broadcasting support to NEArithmeticAddition / @ref CLArithmeticAddition / @ref CLPixelWiseMultiplication |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1185 | - Added fused batch normalization and activation to @ref CLBatchNormalizationLayer and @ref NEBatchNormalizationLayer |
| 1186 | - Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1187 | - New OpenCL kernels / functions: |
Michele Di Giorgio | a046e16 | 2019-10-08 09:36:26 +0100 | [diff] [blame] | 1188 | - CLDirectConvolutionLayerOutputStageKernel |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1189 | - New Neon kernels / functions |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1190 | - Added name() method to all kernels. |
| 1191 | - Added support for Winograd 5x5. |
Georgios Pinitas | 0f7ef8a | 2021-01-10 04:23:52 +0000 | [diff] [blame] | 1192 | - NEPermuteKernel / @ref NEPermute |
Georgios Pinitas | 9fb1159 | 2018-04-26 20:34:58 +0100 | [diff] [blame] | 1193 | - @ref NEWinogradLayerTransformInputKernel / NEWinogradLayer |
| 1194 | - @ref NEWinogradLayerTransformOutputKernel / NEWinogradLayer |
| 1195 | - @ref NEWinogradLayerTransformWeightsKernel / NEWinogradLayer |
Anthony Barbier | e155337 | 2018-07-16 18:53:52 +0100 | [diff] [blame] | 1196 | - Renamed NEWinogradLayerKernel into NEWinogradLayerBatchedGEMMKernel |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1197 | - New GLES kernels / functions: |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 1198 | - GCTensorShiftKernel / GCTensorShift |
Pablo Tello | f6c572c | 2018-02-14 12:47:30 +0000 | [diff] [blame] | 1199 | |
Anthony Barbier | 64c95a0 | 2018-01-22 18:48:55 +0000 | [diff] [blame] | 1200 | v18.01 Public maintenance release |
| 1201 | - Various bug fixes |
| 1202 | - Added some of the missing validate() methods |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1203 | - Added @ref CLDeconvolutionLayerUpsampleKernel / @ref CLDeconvolutionLayer @ref CLDeconvolutionLayerUpsample |
Sheri Zhang | 7e20e29 | 2021-02-02 11:49:34 +0000 | [diff] [blame] | 1204 | - Added CLPermuteKernel / @ref CLPermute |
Anthony Barbier | 64c95a0 | 2018-01-22 18:48:55 +0000 | [diff] [blame] | 1205 | - Added method to clean the programs cache in the CL Kernel library. |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 1206 | - Added GCArithmeticAdditionKernel / GCArithmeticAddition |
| 1207 | - Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3 |
| 1208 | - Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer |
| 1209 | - Added GCScaleKernel / GCScale |
| 1210 | - Added GCWeightsReshapeKernel / GCConvolutionLayer |
Anthony Barbier | 64c95a0 | 2018-01-22 18:48:55 +0000 | [diff] [blame] | 1211 | - Added FP16 support to the following GLES compute kernels: |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 1212 | - GCCol2ImKernel |
| 1213 | - GCGEMMInterleave4x4Kernel |
| 1214 | - GCGEMMTranspose1xWKernel |
| 1215 | - GCIm2ColKernel |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1216 | - Refactored Neon Winograd (NEWinogradLayerKernel) |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1217 | - Added @ref NEDirectConvolutionLayerOutputStageKernel |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1218 | - Added QASYMM8 support to the following Neon kernels: |
Georgios Pinitas | 7d0adc6 | 2020-09-04 15:25:24 +0100 | [diff] [blame] | 1219 | - NEDepthwiseConvolutionLayer3x3Kernel |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1220 | - @ref NEFillBorderKernel |
Michele Di Giorgio | 1928904 | 2021-02-03 16:05:00 +0000 | [diff] [blame] | 1221 | - NEPoolingLayerKernel |
Anthony Barbier | 64c95a0 | 2018-01-22 18:48:55 +0000 | [diff] [blame] | 1222 | - Added new examples: |
| 1223 | - graph_cl_mobilenet_qasymm8.cpp |
| 1224 | - graph_inception_v3.cpp |
| 1225 | - gc_dc.cpp |
| 1226 | - More tests added to both validation and benchmarking suites. |
| 1227 | |
Gian Marco | ff85093 | 2017-12-11 12:37:17 +0000 | [diff] [blame] | 1228 | v17.12 Public major release |
| 1229 | - Most machine learning functions on OpenCL support the new data type QASYMM8 |
| 1230 | - Introduced logging interface |
| 1231 | - Introduced OpenCL timer |
| 1232 | - Reworked GEMMLowp interface |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1233 | - Added new Neon assembly kernels for GEMMLowp, SGEMM and HGEMM |
Gian Marco | ff85093 | 2017-12-11 12:37:17 +0000 | [diff] [blame] | 1234 | - Added validation method for most Machine Learning kernels / functions |
| 1235 | - Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19 |
| 1236 | - Added sgemm example for OpenCL |
| 1237 | - Added absolute difference example for GLES compute |
| 1238 | - Added new tests and benchmarks in validation and benchmark frameworks |
| 1239 | - Added new kernels / functions for GLES compute |
| 1240 | |
| 1241 | - New OpenGL ES kernels / functions |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 1242 | - GCAbsoluteDifferenceKernel / GCAbsoluteDifference |
| 1243 | - GCActivationLayerKernel / GCActivationLayer |
| 1244 | - GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer |
| 1245 | - GCCol2ImKernel |
| 1246 | - GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer |
| 1247 | - GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer |
| 1248 | - GCDropoutLayerKernel / GCDropoutLayer |
| 1249 | - GCFillBorderKernel / GCFillBorder |
| 1250 | - GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4 |
| 1251 | - GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM |
| 1252 | - GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW |
| 1253 | - GCIm2ColKernel |
| 1254 | - GCNormalizationLayerKernel / GCNormalizationLayer |
| 1255 | - GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication |
| 1256 | - GCPoolingLayerKernel / GCPoolingLayer |
| 1257 | - GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer |
| 1258 | - GCTransposeKernel / GCTranspose |
Gian Marco | ff85093 | 2017-12-11 12:37:17 +0000 | [diff] [blame] | 1259 | |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1260 | - New Neon kernels / functions |
Pablo Tello | eb82fd2 | 2018-02-23 13:43:50 +0000 | [diff] [blame] | 1261 | - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore |
| 1262 | - arm_compute::NEHGEMMAArch64FP16Kernel |
Georgios Pinitas | 7d0adc6 | 2020-09-04 15:25:24 +0100 | [diff] [blame] | 1263 | - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1264 | - @ref NEGEMMLowpOffsetContributionKernel / @ref NEGEMMLowpMatrixAReductionKernel / @ref NEGEMMLowpMatrixBReductionKernel / @ref NEGEMMLowpMatrixMultiplyCore |
| 1265 | - @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint |
Georgios Pinitas | 9fb1159 | 2018-04-26 20:34:58 +0100 | [diff] [blame] | 1266 | - NEWinogradLayer / NEWinogradLayerKernel |
Gian Marco | ff85093 | 2017-12-11 12:37:17 +0000 | [diff] [blame] | 1267 | |
| 1268 | - New OpenCL kernels / functions |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1269 | - @ref CLGEMMLowpOffsetContributionKernel / @ref CLGEMMLowpMatrixAReductionKernel / @ref CLGEMMLowpMatrixBReductionKernel / @ref CLGEMMLowpMatrixMultiplyCore |
Michele Di Giorgio | ba14c92 | 2020-10-12 13:27:57 +0100 | [diff] [blame] | 1270 | - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / @ref CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint |
Gian Marco | ff85093 | 2017-12-11 12:37:17 +0000 | [diff] [blame] | 1271 | |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1272 | - New graph nodes for Neon and OpenCL |
Georgios Pinitas | d9eb275 | 2018-04-03 13:44:29 +0100 | [diff] [blame] | 1273 | - graph::BranchLayer |
| 1274 | - graph::DepthConvertLayer |
| 1275 | - graph::DepthwiseConvolutionLayer |
| 1276 | - graph::DequantizationLayer |
| 1277 | - graph::FlattenLayer |
| 1278 | - graph::QuantizationLayer |
| 1279 | - graph::ReshapeLayer |
Gian Marco | ff85093 | 2017-12-11 12:37:17 +0000 | [diff] [blame] | 1280 | |
Anthony Barbier | 3c5b4ff | 2017-10-12 13:20:52 +0100 | [diff] [blame] | 1281 | v17.10 Public maintenance release |
| 1282 | - Bug fixes: |
| 1283 | - Check the maximum local workgroup size supported by OpenCL devices |
| 1284 | - Minor documentation updates (Fixed instructions to build the examples) |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1285 | - Introduced a graph::GraphContext |
Anthony Barbier | 3c5b4ff | 2017-10-12 13:20:52 +0100 | [diff] [blame] | 1286 | - Added a few new Graph nodes, support for branches and grouping. |
| 1287 | - Automatically enable cl_printf in debug builds |
| 1288 | - Fixed bare metal builds for armv7a |
| 1289 | - Added AlexNet and cartoon effect examples |
| 1290 | - Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the Runtime part of the library now need to link against both libarm_compute_core and libarm_compute.) |
| 1291 | |
Anthony Barbier | 6a5627a | 2017-09-26 14:42:02 +0100 | [diff] [blame] | 1292 | v17.09 Public major release |
| 1293 | - Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers. |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1294 | - Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager) |
Anthony Barbier | 6a5627a | 2017-09-26 14:42:02 +0100 | [diff] [blame] | 1295 | - New validation and benchmark frameworks (Boost and Google frameworks replaced by homemade framework). |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1296 | - Most machine learning functions support both fixed point 8 and 16 bit (QS8, QS16) for both Neon and OpenCL. |
| 1297 | - New Neon kernels / functions: |
Pablo Tello | eb82fd2 | 2018-02-23 13:43:50 +0000 | [diff] [blame] | 1298 | - arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel |
Manuel Bottini | 00f4dfc | 2021-03-10 09:55:14 +0000 | [diff] [blame] | 1299 | - NEDequantizationLayerKernel / @ref NEDequantizationLayer |
Georgios Pinitas | 70eb53b | 2021-01-06 19:42:21 +0000 | [diff] [blame] | 1300 | - NEFloorKernel / @ref NEFloor |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1301 | - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer |
Manuel Bottini | 0ded4c4 | 2021-03-09 14:15:27 +0000 | [diff] [blame] | 1302 | - NEQuantizationLayerKernel @ref NEMinMaxLayerKernel / @ref NEQuantizationLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1303 | - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer |
| 1304 | - @ref NEReductionOperationKernel / @ref NEReductionOperation |
Georgios Pinitas | 0f7ef8a | 2021-01-10 04:23:52 +0000 | [diff] [blame] | 1305 | - NEReshapeLayerKernel / @ref NEReshapeLayer |
Anthony Barbier | 6a5627a | 2017-09-26 14:42:02 +0100 | [diff] [blame] | 1306 | |
| 1307 | - New OpenCL kernels / functions: |
Manuel Bottini | 387259a | 2020-05-21 17:14:36 +0100 | [diff] [blame] | 1308 | - @ref CLDepthwiseConvolutionLayer3x3NCHWKernel @ref CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer |
Manuel Bottini | 9e73c93 | 2021-03-02 17:40:42 +0000 | [diff] [blame] | 1309 | - CLDequantizationLayerKernel / CLDequantizationLayer |
Sheri Zhang | 1efed92 | 2021-03-10 22:43:38 +0000 | [diff] [blame] | 1310 | - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer |
Georgios Pinitas | e2696b1 | 2020-12-03 20:37:43 +0000 | [diff] [blame] | 1311 | - CLFlattenLayer |
Georgios Pinitas | f47f718 | 2021-01-15 09:29:50 +0000 | [diff] [blame] | 1312 | - CLFloorKernel / @ref CLFloor |
Gian Marco Iodice | 5fc07aa | 2019-05-15 17:08:02 +0100 | [diff] [blame] | 1313 | - CLGEMMTranspose1xW |
Michele Di Giorgio | ee82d34 | 2021-01-05 16:14:28 +0000 | [diff] [blame] | 1314 | - CLGEMMMatrixVectorMultiplyKernel |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1315 | - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer |
Manuel Bottini | 5a1bf62 | 2021-03-01 17:39:36 +0000 | [diff] [blame] | 1316 | - CLQuantizationLayerKernel @ref CLMinMaxLayerKernel / @ref CLQuantizationLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1317 | - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer |
| 1318 | - @ref CLReductionOperationKernel / @ref CLReductionOperation |
Sheri Zhang | 7e20e29 | 2021-02-02 11:49:34 +0000 | [diff] [blame] | 1319 | - CLReshapeLayerKernel / @ref CLReshapeLayer |
Anthony Barbier | 6a5627a | 2017-09-26 14:42:02 +0100 | [diff] [blame] | 1320 | |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1321 | v17.06 Public major release |
| 1322 | - Various bug fixes |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1323 | - Added support for fixed point 8 bit (QS8) to the various Neon machine learning kernels. |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1324 | - Added unit tests and benchmarks (AlexNet, LeNet) |
| 1325 | - Added support for sub tensors. |
| 1326 | - Added infrastructure to provide GPU specific optimisation for some OpenCL kernels. |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1327 | - Added @ref OMPScheduler (OpenMP) scheduler for Neon |
| 1328 | - Added @ref SingleThreadScheduler scheduler for Neon (For bare metal) |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1329 | - Users can specify their own scheduler by implementing the @ref IScheduler interface. |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1330 | - New OpenCL kernels / functions: |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1331 | - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer |
Michele Di Giorgio | 7d61ff0 | 2021-01-18 21:15:59 +0000 | [diff] [blame] | 1332 | - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1333 | - CLHOGOrientationBinningKernel, CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor, CLHOGDetector, CLHOGGradient, CLHOGMultiDetection |
Georgios Pinitas | 96b16b6 | 2020-12-01 17:41:34 +0000 | [diff] [blame] | 1334 | - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1335 | - @ref CLWeightsReshapeKernel / @ref CLConvolutionLayerReshapeWeights |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1336 | - New C++ kernels: |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1337 | - @ref CPPDetectionWindowNonMaximaSuppressionKernel |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1338 | - New Neon kernels / functions: |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1339 | - @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 1340 | - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1341 | - @ref NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer |
Georgios Pinitas | 96b16b6 | 2020-12-01 17:41:34 +0000 | [diff] [blame] | 1342 | - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1343 | - @ref NEWeightsReshapeKernel / @ref NEConvolutionLayerReshapeWeights |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1344 | |
| 1345 | v17.05 Public bug fixes release |
| 1346 | - Various bug fixes |
| 1347 | - Ported the remaining functions to use accurate padding. |
| 1348 | - Library does not link against OpenCL anymore (It uses dlopen / dlsym at runtime instead to determine whether or not OpenCL is available). |
| 1349 | - Added "free" method to allocator. |
| 1350 | - Minimum version of g++ required for armv7 Linux changed from 4.8 to 4.9 |
| 1351 | |
| 1352 | v17.04 Public bug fixes release |
| 1353 | |
| 1354 | The following functions have been ported to use the new accurate padding: |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1355 | - CLColorConvertKernel |
| 1356 | - CLEdgeNonMaxSuppressionKernel |
| 1357 | - CLEdgeTraceKernel |
| 1358 | - CLGaussianPyramidHorKernel |
| 1359 | - CLGaussianPyramidVertKernel |
| 1360 | - CLGradientKernel |
Michalis Spyrou | 27e67f0 | 2021-02-16 11:34:39 +0000 | [diff] [blame] | 1361 | - NEChannelCombineKernel |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1362 | - @ref NEFillArrayKernel |
Michalis Spyrou | 27e67f0 | 2021-02-16 11:34:39 +0000 | [diff] [blame] | 1363 | - NEGaussianPyramidHorKernel |
| 1364 | - NEGaussianPyramidVertKernel |
Georgios Pinitas | 09d3451 | 2018-08-30 16:02:11 +0100 | [diff] [blame] | 1365 | - NEHarrisScoreFP16Kernel |
Michalis Spyrou | 27e67f0 | 2021-02-16 11:34:39 +0000 | [diff] [blame] | 1366 | - NEHarrisScoreKernel |
| 1367 | - NEHOGDetectorKernel |
Michalis Spyrou | 373b407 | 2021-01-20 16:41:12 +0000 | [diff] [blame] | 1368 | - NELogits1DMaxKernel |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1369 | - NELogits1DShiftExpSumKernel |
| 1370 | - NELogits1DNormKernel |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1371 | - NENonMaximaSuppression3x3FP16Kernel |
| 1372 | - NENonMaximaSuppression3x3Kernel |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1373 | |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1374 | v17.03.1 First Major public release of the sources |
| 1375 | - Renamed the library to arm_compute |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1376 | - New CPP target introduced for C++ kernels shared between Neon and CL functions. |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1377 | - New padding calculation interface introduced and ported most kernels / functions to use it. |
| 1378 | - New OpenCL kernels / functions: |
Gian Marco Iodice | eb65f6d | 2020-04-15 11:42:15 +0100 | [diff] [blame] | 1379 | - CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1380 | - New Neon kernels / functions: |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1381 | - @ref NENormalizationLayerKernel / @ref NENormalizationLayer |
Teresa Charlin | d1dc09c | 2021-03-04 15:24:45 +0000 | [diff] [blame] | 1382 | - NETransposeKernel / @ref NETranspose |
Michalis Spyrou | 373b407 | 2021-01-20 16:41:12 +0000 | [diff] [blame] | 1383 | - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1384 | - @ref NEIm2ColKernel, @ref NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer |
Michele Di Giorgio | f22f672 | 2020-07-03 16:29:24 +0100 | [diff] [blame] | 1385 | - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1386 | - @ref NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1387 | |
| 1388 | v17.03 Sources preview |
| 1389 | - New OpenCL kernels / functions: |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1390 | - CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge |
Gian Marco Iodice | 57a8961 | 2019-08-22 14:10:27 +0100 | [diff] [blame] | 1391 | - GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, @ref CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / @ref CLGEMM |
Michele Di Giorgio | f6f7876 | 2020-07-06 11:27:21 +0100 | [diff] [blame] | 1392 | - CLGEMMMatrixAccumulateBiasesKernel / @ref CLFullyConnectedLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1393 | - @ref CLTransposeKernel / @ref CLTranspose |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1394 | - @ref CLLKTrackerInitKernel, @ref CLLKTrackerStage0Kernel, @ref CLLKTrackerStage1Kernel, @ref CLLKTrackerFinalizeKernel / CLOpticalFlow |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1395 | - @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1396 | - CLLaplacianPyramid, CLLaplacianReconstruct |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1397 | - New Neon kernels / functions: |
Michele Di Giorgio | bd2c8e1 | 2021-01-19 15:29:02 +0000 | [diff] [blame] | 1398 | - NEActivationLayerKernel / @ref NEActivationLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1399 | - GEMM refactoring + FP16 support (Requires armv8.2 CPU): @ref NEGEMMInterleave4x4Kernel, @ref NEGEMMTranspose1xWKernel, @ref NEGEMMMatrixMultiplyKernel, @ref NEGEMMMatrixAdditionKernel / @ref NEGEMM |
Michele Di Giorgio | 1928904 | 2021-02-03 16:05:00 +0000 | [diff] [blame] | 1400 | - NEPoolingLayerKernel / @ref NEPoolingLayer |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1401 | |
| 1402 | v17.02.1 Sources preview |
| 1403 | - New OpenCL kernels / functions: |
Sang-Hoon Park | 201e0fe | 2021-01-27 13:14:56 +0000 | [diff] [blame] | 1404 | - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer |
Michele Di Giorgio | e131466 | 2021-02-01 17:09:32 +0000 | [diff] [blame] | 1405 | - CLPoolingLayerKernel / @ref CLPoolingLayer |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1406 | - @ref CLIm2ColKernel, @ref CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1407 | - @ref CLRemapKernel / @ref CLRemap |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1408 | - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb |
| 1409 | - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation |
| 1410 | - CLNonLinearFilterKernel / CLNonLinearFilter |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1411 | - New Neon FP16 kernels (Requires armv8.2 CPU) |
Michalis Spyrou | 27e67f0 | 2021-02-16 11:34:39 +0000 | [diff] [blame] | 1412 | - NEAccumulateWeightedFP16Kernel |
| 1413 | - NEBox3x3FP16Kernel |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1414 | - NENonMaximaSuppression3x3FP16Kernel |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1415 | |
| 1416 | v17.02 Sources preview |
| 1417 | - New OpenCL kernels / functions: |
Georgios Pinitas | f47f718 | 2021-01-15 09:29:50 +0000 | [diff] [blame] | 1418 | - CLActivationLayerKernel / @ref CLActivationLayer |
Michalis Spyrou | 473cb01 | 2021-02-23 11:48:12 +0000 | [diff] [blame] | 1419 | - CLChannelCombineKernel / CLChannelCombine |
| 1420 | - CLDerivativeKernel / CLChannelExtract |
| 1421 | - CLFastCornersKernel / CLFastCorners |
| 1422 | - CLMeanStdDevKernel / CLMeanStdDev |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1423 | - New Neon kernels / functions: |
Michalis Spyrou | 27e67f0 | 2021-02-16 11:34:39 +0000 | [diff] [blame] | 1424 | - HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection |
| 1425 | - NENonLinearFilterKernel / NENonLinearFilter |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1426 | - Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events. |
| 1427 | - Switched all the kernels / functions to use tensors instead of images. |
| 1428 | - Updated documentation to include instructions to build the library from sources. |
| 1429 | |
| 1430 | v16.12 Binary preview release |
| 1431 | - Original release |
| 1432 | |
| 1433 | @section S3_how_to_build How to build the library and the examples |
| 1434 | |
| 1435 | @subsection S3_1_build_options Build options |
| 1436 | |
| 1437 | scons 2.3 or above is required to build the library. |
| 1438 | To see the build options available simply run ```scons -h```: |
| 1439 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1440 | debug: Debug (yes|no) |
| 1441 | default: False |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1442 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1443 | asserts: Enable asserts (this flag is forced to 1 for debug=1) (yes|no) |
| 1444 | default: False |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1445 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1446 | logging: Logging (this flag is forced to 1 for debug=1) (yes|no) |
| 1447 | default: False |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1448 | |
Sang-Hoon Park | 50e98bb | 2021-01-14 14:54:14 +0000 | [diff] [blame] | 1449 | arch: Target Architecture (armv7a|arm64-v8a|arm64-v8.2-a|arm64-v8.2-a-sve|arm64-v8.2-a-sve2|x86_32|x86_64|armv8a|armv8.2-a|armv8.2-a-sve|armv8.6-a|armv8.6-a-sve|armv8.6-a-sve2|armv8r64|x86) |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1450 | default: armv7a |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1451 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1452 | estate: Execution State (auto|32|64) |
| 1453 | default: auto |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1454 | |
Georgios Pinitas | 4551403 | 2020-12-30 00:03:09 +0000 | [diff] [blame] | 1455 | os: Target OS (linux|android|macos|tizen|bare_metal) |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1456 | default: linux |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1457 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1458 | build: Build type (native|cross_compile|embed_only) |
| 1459 | default: cross_compile |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1460 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1461 | examples: Build example programs (yes|no) |
| 1462 | default: True |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1463 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1464 | gemm_tuner: Build gemm_tuner programs (yes|no) |
| 1465 | default: True |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1466 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1467 | Werror: Enable/disable the -Werror compilation flag (yes|no) |
| 1468 | default: True |
Anthony Barbier | 20dbb82 | 2017-12-13 21:19:39 +0000 | [diff] [blame] | 1469 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1470 | standalone: Builds the tests as standalone executables, links statically with libgcc, libstdc++ and libarm_compute (yes|no) |
| 1471 | default: False |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1472 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1473 | opencl: Enable OpenCL support (yes|no) |
| 1474 | default: True |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1475 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1476 | neon: Enable Neon support (yes|no) |
| 1477 | default: False |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1478 | |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 1479 | embed_kernels: Embed OpenCL kernels in library binary (yes|no) |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1480 | default: True |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1481 | |
Georgios Pinitas | ea85727 | 2021-01-22 05:47:37 +0000 | [diff] [blame] | 1482 | compress_kernels: Compress embedded OpenCL kernels in library binary. Note embed_kernels should be enabled as well (yes|no) |
| 1483 | default: False |
Georgios Pinitas | ea85727 | 2021-01-22 05:47:37 +0000 | [diff] [blame] | 1484 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1485 | set_soname: Set the library's soname and shlibversion (requires SCons 2.4 or above) (yes|no) |
| 1486 | default: False |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1487 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1488 | tracing: Enable runtime tracing (yes|no) |
| 1489 | default: False |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1490 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1491 | openmp: Enable OpenMP backend (yes|no) |
| 1492 | default: False |
Anthony Barbier | 6a5627a | 2017-09-26 14:42:02 +0100 | [diff] [blame] | 1493 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1494 | cppthreads: Enable C++11 threads backend (yes|no) |
| 1495 | default: True |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1496 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1497 | build_dir: Specify sub-folder for the build ( /path/to/build_dir ) |
| 1498 | default: . |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1499 | |
| 1500 | install_dir: Specify sub-folder for the install ( /path/to/install_dir ) |
| 1501 | default: |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1502 | |
| 1503 | exceptions: Enable/disable C++ exception support (yes|no) |
| 1504 | default: True |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1505 | |
| 1506 | linker_script: Use an external linker script ( /path/to/linker_script ) |
| 1507 | default: |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1508 | |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1509 | custom_options: Custom options that can be used to turn on/off features |
| 1510 | (all|none|comma-separated list of names) |
| 1511 | allowed names: disable_mmla_fp |
| 1512 | default: none |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1513 | |
| 1514 | data_type_support: Enable a list of data types to support |
| 1515 | (all|none|comma-separated list of names) |
| 1516 | allowed names: qasymm8 qasymm8_signed qsymm16 fp16 fp32 |
| 1517 | default: all |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1518 | |
| 1519 | toolchain_prefix: Override the toolchain prefix |
| 1520 | default: |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1521 | |
| 1522 | compiler_prefix: Override the compiler prefix |
| 1523 | default: |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1524 | |
| 1525 | extra_cxx_flags: Extra CXX flags to be appended to the build command |
| 1526 | default: |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1527 | |
| 1528 | extra_link_flags: Extra LD flags to be appended to the build command |
| 1529 | default: |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1530 | |
| 1531 | compiler_cache: Command to prefix to the C and C++ compiler (e.g ccache) |
| 1532 | default: |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1533 | |
| 1534 | specs_file: Specs file to use |
| 1535 | default: rdimon.specs |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1536 | |
| 1537 | benchmark_examples: Build benchmark examples programs (yes|no) |
| 1538 | default: True |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1539 | |
| 1540 | validate_examples: Build validate examples programs (yes|no) |
| 1541 | default: True |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1542 | |
| 1543 | reference_openmp: Build reference validation with openmp (yes|no) |
| 1544 | default: True |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1545 | |
| 1546 | validation_tests: Build validation test programs (yes|no) |
| 1547 | default: True |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1548 | |
| 1549 | benchmark_tests: Build benchmark test programs (yes|no) |
| 1550 | default: True |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1551 | |
| 1552 | test_filter: Pattern to specify the tests' filenames to be compiled |
| 1553 | default: *.cpp |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1554 | |
| 1555 | pmu: Enable PMU counters (yes|no) |
| 1556 | default: False |
Manuel Bottini | e5a9ad8 | 2020-11-18 16:22:16 +0000 | [diff] [blame] | 1557 | |
| 1558 | mali: Enable Mali hardware counters (yes|no) |
| 1559 | default: False |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1560 | |
Michele Di Giorgio | 72610dc | 2020-11-18 15:29:08 +0000 | [diff] [blame] | 1561 | external_tests_dir: Add examples, benchmarks and tests to the tests suite from an external path ( /path/to/external_tests_dir ) |
| 1562 | default: |
Michele Di Giorgio | 72610dc | 2020-11-18 15:29:08 +0000 | [diff] [blame] | 1563 | |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1564 | @b debug / @b asserts: |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1565 | - With debug=1 asserts are enabled, and the library is built with symbols and no optimisations enabled. |
| 1566 | - With debug=0 and asserts=1: Optimisations are enabled and symbols are removed, however all the asserts are still present (This is about 20% slower than the release build) |
| 1567 | - With debug=0 and asserts=0: All optimisations are enabled and no validation is performed. If the application misuses the library, it is likely to crash. (Only use this mode once you are sure your application is working as expected.) |
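The three combinations above can be summarised with a small shell sketch. `build_mode` is a hypothetical helper written for illustration only; it is not part of the library or of the SCons scripts, and it only mirrors the rules listed above (debug=1 takes precedence, then asserts=1).

```shell
# Hypothetical helper (not part of the library): names the mode that a
# debug/asserts pair selects, following the three cases documented above.
build_mode() {
    debug="$1"
    asserts="$2"
    if [ "$debug" = "1" ]; then
        echo "debug"      # asserts forced on, symbols kept, no optimisations
    elif [ "$asserts" = "1" ]; then
        echo "asserts"    # optimised, symbols removed, asserts still present
    else
        echo "release"    # optimised, no validation performed
    fi
}

build_mode 1 0   # prints: debug
build_mode 0 1   # prints: asserts
build_mode 0 0   # prints: release
```

Note that debug=1 wins regardless of the value passed for asserts, which matches the "forced to 1 for debug=1" remark in the option list.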
| 1568 | |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1569 | @b arch: The x86_32 and x86_64 targets can only be used with neon=0 and opencl=1. |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1570 | |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1571 | @b os: Choose the operating system you are targeting: Linux, Android or bare metal. |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1572 | @note bare metal can only be used for Neon (not OpenCL), only static libraries get built and Neon's multi-threading support is disabled. |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1573 | |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1574 | @b build: You can either build directly on your device (native) or cross compile from your desktop machine (cross_compile). In both cases make sure the compiler is available in your path. |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1575 | |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1576 | @note If you want to compile for 32bit on a 64bit Arm device running a 64bit OS, you will have to use build=cross_compile too. |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1577 | |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 1578 | There is also an 'embed_only' option which will generate all the .embed files for the OpenCL kernels. This might be useful if using a different build system to compile the library. |
Anthony Barbier | 2d0ce77 | 2018-02-21 15:35:36 +0000 | [diff] [blame] | 1579 | |
Georgios Pinitas | ea85727 | 2021-01-22 05:47:37 +0000 | [diff] [blame] | 1580 | In addition, the option 'compress_kernels' will compress the embedded OpenCL kernel files using zlib and inject them into the library. This is useful for reducing the binary size. Note that this option is only available for Android when 'embed_kernels' is enabled. |
| 1581 | |
Michele Di Giorgio | eca54a0 | 2021-02-16 15:37:59 +0000 | [diff] [blame] | 1582 | @b Werror: If you are compiling with the same toolchains as the ones used in this guide, there shouldn't be any warnings and you should be able to keep Werror=1. If the library fails to build with a different compiler version because warnings are treated as errors, and you are sure the warnings are not important, you might want to try building with Werror=0 (but please do report the issue on GitHub). |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1583 | |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 1584 | @b opencl / @b neon: Choose which SIMD technology you want to target. (Neon for Arm Cortex-A CPUs or OpenCL for Arm Mali GPUs) |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1585 | |
Manuel Bottini | ceaa0bf | 2021-02-16 15:15:19 +0000 | [diff] [blame] | 1586 | @b embed_kernels: For OpenCL only: set embed_kernels=1 if you want the OpenCL kernels to be built in the library's binaries instead of being read from separate ".cl" / ".cs" files. If embed_kernels is set to 0 then the application can set the path to the folder containing the OpenCL kernel files by calling CLKernelLibrary::init(). By default the path is set to "./cl_kernels". |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1587 | |
| 1588 | @b set_soname: Enable this option to build a versioned library. |
| 1589 | |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1590 | If enabled the library will contain a SONAME and SHLIBVERSION and some symlinks will automatically be created between the objects. |
| 1591 | Example: |
| 1592 | libarm_compute_core.so -> libarm_compute_core.so.1.0.0 |
| 1593 | libarm_compute_core.so.1 -> libarm_compute_core.so.1.0.0 |
| 1594 | libarm_compute_core.so.1.0.0 |
| 1595 | |
| 1596 | @note This option is disabled by default as it requires SCons version 2.4 or above. |
| 1597 | |
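As a rough illustration of the link layout shown in the set_soname example, the following sketch recreates it by hand in a scratch directory. It is only a sketch: the empty file stands in for the real shared object, and `workdir` is a name chosen here, not something the build produces.

```shell
# Sketch only: rebuild the documented symlink layout around a stand-in file.
workdir="$(mktemp -d)"

# Empty placeholder for the real, fully versioned shared object.
: > "$workdir/libarm_compute_core.so.1.0.0"

# Both the unversioned name and the major-version name point at the real file.
ln -s libarm_compute_core.so.1.0.0 "$workdir/libarm_compute_core.so.1"
ln -s libarm_compute_core.so.1.0.0 "$workdir/libarm_compute_core.so"

# Show where each link resolves.
readlink "$workdir/libarm_compute_core.so"
readlink "$workdir/libarm_compute_core.so.1"
```

Both `readlink` calls print the fully versioned file name, which is the object the two symlinks share.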
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1598 | @b extra_cxx_flags: Custom CXX flags which will be appended to the end of the build command. |
| 1599 | |
| 1600 | @b build_dir: Build the library in a subfolder of the "build" folder. (This allows several configurations to be built in parallel.) |
| 1601 | |
| 1602 | @b examples: Enable the build of the examples. |
| 1603 | |
| 1604 | @b validation_tests: Enable the build of the validation suite. |
| 1605 | |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1606 | @b benchmark_tests: Enable the build of the benchmark tests |
| 1607 | |
| 1608 | @b pmu: Enable the PMU cycle counter to measure execution time in benchmark tests. (Your device needs to support it) |
| 1609 | |
Anthony Barbier | 6a5627a | 2017-09-26 14:42:02 +0100 | [diff] [blame] | 1610 | @b mali: Enable the collection of Mali hardware counters to measure execution time in benchmark tests. (Your device needs to have a Mali driver that supports it) |
| 1611 | |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1612 | @b openmp: Build in the OpenMP scheduler for Neon. |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1613 | |
| 1614 | @note This only works when building with g++, not clang++. |
| 1615 | |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1616 | @b cppthreads: Build in the C++11 scheduler for Neon. |
Anthony Barbier | 79c6178 | 2017-06-23 11:48:24 +0100 | [diff] [blame] | 1617 | |
Anthony Barbier | 3762e74 | 2018-03-02 11:49:33 +0000 | [diff] [blame] | 1618 | @sa Scheduler::set |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1619 | |
Michele Di Giorgio | 72610dc | 2020-11-18 15:29:08 +0000 | [diff] [blame] | 1620 | @b external_tests_dir: Add examples, benchmarks and tests to the test suite from an external path ( /path/to/external_tests_dir ) |
| 1621 | |
| 1622 | In order to use this option, the external tests directory must have the following structure: |
| 1623 | |
| 1624 | EXTERNAL_TESTS_DIR: |
| 1625 | └── tests |
| 1626 | ├── benchmark |
| 1627 | │ ├── CL |
| 1628 | │ ├── datasets |
| 1629 | │ ├── fixtures |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1630 | │ └── Neon |
Michele Di Giorgio | 72610dc | 2020-11-18 15:29:08 +0000 | [diff] [blame] | 1631 | └── validation |
| 1632 | ├── CL |
| 1633 | ├── datasets |
| 1634 | ├── fixtures |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1635 | └── Neon |
Michele Di Giorgio | 72610dc | 2020-11-18 15:29:08 +0000 | [diff] [blame] | 1636 | |
| 1637 | Then, build the library with `external_tests_dir=<PATH_TO_EXTERNAL_TESTS_DIR>`. |
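The directory layout above can be created with a few lines of shell. This is a minimal sketch: `EXTERNAL_TESTS_DIR` points at a scratch directory here and would be replaced by a path of your choosing, and the loop only creates empty folders (the test sources themselves are up to you).

```shell
# Assumption: a scratch directory is used; substitute your own path.
EXTERNAL_TESTS_DIR="$(mktemp -d)"

# Create the benchmark and validation trees with the four expected subfolders.
for suite in benchmark validation; do
    for sub in CL datasets fixtures Neon; do
        mkdir -p "$EXTERNAL_TESTS_DIR/tests/$suite/$sub"
    done
done

# List the resulting folders for a quick visual check.
find "$EXTERNAL_TESTS_DIR/tests" -type d | sort
```

The resulting path is the value to pass as `external_tests_dir=` on the scons command line.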
| 1638 | |
Moritz Pflanzer | 07674de | 2017-07-21 09:39:36 +0100 | [diff] [blame] | 1639 | @subsection S3_2_linux Building for Linux |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1640 | |
| 1641 | @subsubsection S3_2_1_library How to build the library ? |
| 1642 | |
| 1643 | For Linux, the library was successfully built and tested using the following Linaro GCC toolchain: |
| 1644 | |
Michele Di Giorgio | 36a551f | 2020-04-23 11:55:29 +0100 | [diff] [blame] | 1645 | - gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf |
| 1646 | - gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1647 | |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1648 | To cross-compile the library in debug mode, with Neon only support, for Linux 32bit: |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1649 | |
| 1650 | scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=linux arch=armv7a |
| 1651 | |
| 1652 | To cross-compile the library in asserts mode, with OpenCL only support, for Linux 64bit: |
| 1653 | |
| 1654 | scons Werror=1 -j8 debug=0 asserts=1 neon=0 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a |
| 1655 | |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1656 | You can also compile the library natively on an Arm device by using <b>build=native</b>: |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1657 | |
| 1658 | scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=arm64-v8a build=native |
| 1659 | scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=armv7a build=native |
| 1660 | |
Sheri Zhang | ac6499a | 2021-02-10 15:32:38 +0000 | [diff] [blame] | 1661 | @note g++ for Arm is mono-arch, therefore if you want to compile for Linux 32bit on a Linux 64bit platform you will have to use a cross compiler. |
Anthony Barbier | 6ff3b19 | 2017-09-04 18:44:23 +0100 | [diff] [blame] | 1662 | |
| 1663 | For example on a 64bit Debian based system you would have to install <b>g++-arm-linux-gnueabihf</b> |
| 1664 | |
| 1665 | apt-get install g++-arm-linux-gnueabihf |
| 1666 | |
| 1667 | Then run |
| 1668 | |
| 1669 | scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=armv7a build=cross_compile |
| 1670 | |
| 1671 | or simply remove the build parameter as build=cross_compile is the default value: |
| 1672 | |
| 1673 | scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=armv7a |
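The choice between build=native and build=cross_compile described above can be sketched as a small helper. `pick_build` is hypothetical and not part of the build scripts. It also assumes that `uname -m` reports "aarch64" on a 64bit Arm Linux host and "armv7l" on a 32bit one, which is common but not guaranteed. The script only prints the scons command and does not run it.

```shell
# Hypothetical helper: choose the build= value from the host machine name
# and the target arch. Any combination that is not an exact match falls
# back to cross_compile (including a 32bit target on a 64bit Arm host).
pick_build() {
    host="$1"
    arch="$2"
    case "$host:$arch" in
        aarch64:arm64-v8a) echo "native" ;;
        armv7l:armv7a)     echo "native" ;;
        *)                 echo "cross_compile" ;;
    esac
}

arch="armv7a"
# Dry run: print the command line instead of invoking scons.
echo "scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=$arch build=$(pick_build "$(uname -m)" "$arch")"
```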
| 1674 | |
@subsubsection S3_2_2_examples How to manually build the examples ?

The examples get automatically built by scons as part of the build process of the library described above. This section just describes how you can build and link your own application against our library.

@note The following command lines assume the arm_compute libraries are present in the current directory or in the system library path. If this is not the case, you can specify the location of the pre-built libraries with the compiler option -L. When building the OpenCL example, the commands below assume that the CL headers are located in the include folder where the command is executed.

To cross compile a Neon example for Linux 32bit:

    arm-linux-gnueabihf-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -L. -larm_compute -larm_compute_core -o neon_convolution

To cross compile a Neon example for Linux 64bit:

    aarch64-linux-gnu-g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -L. -larm_compute -larm_compute_core -o neon_convolution

(notice the only difference with the 32 bit command is that we don't need the -mfpu option and the compiler's name is different)

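Since the 32-bit and 64-bit Neon commands differ only in the compiler name and the -mfpu option, the choice can be captured in one place. The sketch below only prints the command line; `example_build_cmd` is a hypothetical helper, and the printed text can be passed to `eval` from the library's root directory to actually run it.

```shell
#!/bin/sh
# example_build_cmd (hypothetical helper): print the cross-compile command for
# one example. $1 is the target width (32 or 64), $2 is the example name.
example_build_cmd() {
    bits=$1
    name=$2
    case "$bits" in
        32) cxx=arm-linux-gnueabihf-g++; fpu=" -mfpu=neon" ;;  # armv7a needs -mfpu=neon
        64) cxx=aarch64-linux-gnu-g++;   fpu="" ;;             # aarch64 does not
        *)  echo "usage: example_build_cmd 32|64 <example>" >&2; return 1 ;;
    esac
    echo "$cxx examples/$name.cpp utils/Utils.cpp -I. -Iinclude -std=c++14$fpu -L. -larm_compute -larm_compute_core -o $name"
}

# Print both variants for the Neon convolution example.
example_build_cmd 32 neon_convolution
example_build_cmd 64 neon_convolution
```
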
To cross compile an OpenCL example for Linux 32bit:

    arm-linux-gnueabihf-g++ examples/cl_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -L. -larm_compute -larm_compute_core -o cl_convolution -DARM_COMPUTE_CL

To cross compile an OpenCL example for Linux 64bit:

    aarch64-linux-gnu-g++ examples/cl_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -L. -larm_compute -larm_compute_core -o cl_convolution -DARM_COMPUTE_CL

(notice the only difference with the 32 bit command is that we don't need the -mfpu option and the compiler's name is different)

To cross compile the examples with the Graph API, such as graph_lenet.cpp, you need to link the examples against arm_compute_graph.so too.

For example, to cross compile the "graph_lenet" example for Linux 32bit:

    arm-linux-gnueabihf-g++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++14 -mfpu=neon -L. -larm_compute_graph -larm_compute -larm_compute_core -Wl,--allow-shlib-undefined -o graph_lenet

To cross compile the "graph_lenet" example for Linux 64bit:

    aarch64-linux-gnu-g++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++14 -L. -larm_compute_graph -larm_compute -larm_compute_core -Wl,--allow-shlib-undefined -o graph_lenet

(notice the only difference with the 32 bit command is that we don't need the -mfpu option and the compiler's name is different)

@note If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, arm_compute, arm_compute_core

To compile natively (i.e. directly on an Arm device) for Neon for Linux 32bit:

    g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -mfpu=neon -larm_compute -larm_compute_core -o neon_convolution

To compile natively (i.e. directly on an Arm device) for Neon for Linux 64bit:

    g++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute -larm_compute_core -o neon_convolution

(notice the only difference with the 32 bit command is that we don't need the -mfpu option)

To compile natively (i.e. directly on an Arm device) for OpenCL for Linux 32bit or Linux 64bit:

    g++ examples/cl_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute -larm_compute_core -o cl_convolution -DARM_COMPUTE_CL

To natively compile the examples with the Graph API, such as graph_lenet.cpp, you need to link the examples against arm_compute_graph.so too.

For example, to natively compile the "graph_lenet" example for Linux 32bit:

    g++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++14 -mfpu=neon -L. -larm_compute_graph -larm_compute -larm_compute_core -Wl,--allow-shlib-undefined -o graph_lenet

To natively compile the "graph_lenet" example for Linux 64bit:

    g++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++14 -L. -larm_compute_graph -larm_compute -larm_compute_core -Wl,--allow-shlib-undefined -o graph_lenet

(notice the only difference with the 32 bit command is that we don't need the -mfpu option)

@note If compiling using static libraries, this order must be followed when linking: arm_compute_graph_static, arm_compute, arm_compute_core

@note These two commands assume libarm_compute.so is available in your library path; if not, add the path to it using -L (e.g. -Llib/linux-arm64-v8a-neon-cl-asserts/)
@note You might need to export the path to the OpenCL library as well in your LD_LIBRARY_PATH if Compute Library was built with OpenCL enabled.

To run the built executable simply run:

    LD_LIBRARY_PATH=build ./neon_convolution

or

    LD_LIBRARY_PATH=build ./cl_convolution

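When more than one directory has to be searched, for instance the build folder plus the location of the OpenCL library, the search path can be assembled without discarding an existing LD_LIBRARY_PATH. This is a minimal sketch; `prepend_lib_dirs` is a hypothetical helper and /opt/opencl/lib is a placeholder path.

```shell
#!/bin/sh
# prepend_lib_dirs (hypothetical helper): print a library search path made of
# the given directories followed by the current LD_LIBRARY_PATH, if any.
prepend_lib_dirs() {
    result=""
    for dir in "$@"; do
        result="${result:+$result:}$dir"
    done
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        result="${result:+$result:}$LD_LIBRARY_PATH"
    fi
    printf '%s\n' "$result"
}

# Example (placeholder OpenCL location, adjust to your system):
# LD_LIBRARY_PATH=$(prepend_lib_dirs build /opt/opencl/lib) ./cl_convolution
```
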
@note Examples accept different types of arguments; to find out what they are, run the example with \a --help as an argument. If no arguments are specified then random values will be used to execute the graph.

For example:

    LD_LIBRARY_PATH=. ./graph_lenet --help

Below is a list of the common parameters among the graph examples:
@snippet utils/CommonGraphOptions.h Common graph examples parameters

@subsubsection S3_2_3_sve Build for SVE or SVE2

In order to build for SVE or SVE2 you need a compiler that supports them. You can find more information in the following links:
-# GCC: https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/sve-support
-# LLVM: https://developer.arm.com/tools-and-software/open-source-software/developer-tools/llvm-toolchain/sve-support

@note You then need to indicate the toolchain using the scons "toolchain_prefix" parameter.

An example build command with SVE is:

    scons arch=arm64-v8.2-a-sve os=linux build_dir=arm64 -j55 standalone=0 opencl=0 openmp=0 validation_tests=1 neon=1 cppthreads=1 toolchain_prefix=aarch64-none-linux-gnu-

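A quick way to find out whether a toolchain can target SVE is to ask its compiler to accept an SVE architecture option on a trivial program. The sketch below assumes a GCC-style driver named `<prefix>g++` and uses the GCC option `-march=armv8.2-a+sve`; the helper name `supports_sve` is hypothetical, and the exact flags that scons passes to the compiler are not shown here.

```shell
#!/bin/sh
# supports_sve (hypothetical helper): return 0 if "${1}g++" accepts
# -march=armv8.2-a+sve, non-zero otherwise (including when it is missing).
supports_sve() {
    cxx="${1}g++"
    command -v "$cxx" >/dev/null 2>&1 || return 1
    # Syntax-check a trivial program read from stdin with SVE enabled.
    echo 'int main() { return 0; }' |
        "$cxx" -march=armv8.2-a+sve -x c++ -fsyntax-only - >/dev/null 2>&1
}

if supports_sve aarch64-none-linux-gnu-; then
    echo "toolchain accepts -march=armv8.2-a+sve"
else
    echo "toolchain missing or without SVE support"
fi
```

The prefix passed to `supports_sve` is the same string you would give to the scons "toolchain_prefix" parameter.
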
@subsection S3_3_android Building for Android

For Android, the library was successfully built and tested using Google's standalone toolchains:
 - clang++ from NDK r18b for armv7a
 - clang++ from NDK r20b for arm64-v8a
 - clang++ from NDK r20b for arm64-v8.2-a with FP16 support

For NDK r18 or older, here is a guide to <a href="https://developer.android.com/ndk/guides/standalone_toolchain.html">create your Android standalone toolchains from the NDK</a>:
- Download the NDK r18b from here: https://developer.android.com/ndk/downloads/index.html to directory $NDK
- Make sure you have Python 2.7 installed on your machine.
- Generate the 32-bit and/or 64-bit toolchains in your toolchain directory $MY_TOOLCHAINS by running the following commands:

    $NDK/build/tools/make_standalone_toolchain.py --arch arm64 --install-dir $MY_TOOLCHAINS/aarch64-linux-android-ndk-r18b --stl libc++ --api 21
    $NDK/build/tools/make_standalone_toolchain.py --arch arm --install-dir $MY_TOOLCHAINS/arm-linux-android-ndk-r18b --stl libc++ --api 21

For NDK r19 or newer, you can directly <a href="https://developer.android.com/ndk/downloads">download</a> the NDK package for your development platform, without the need to launch the make_standalone_toolchain.py script. You can find all the prebuilt binaries inside $NDK/toolchains/llvm/prebuilt/$OS_ARCH/bin/.
@attention The build script will look for a binary named "aarch64-linux-android-clang++", while the prebuilt binaries will have their API version as a suffix to their filename (e.g. "aarch64-linux-android21-clang++"). You should copy/rename the binary removing this suffix, or - alternatively - create an alias for it.

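One way to provide the unsuffixed name is a symbolic link next to the prebuilt binary. The following is a minimal sketch of that idea, not an official NDK or Compute Library script: `link_unversioned_clang` is a hypothetical helper, and it assumes that `clang` follows the same naming scheme as `clang++`.

```shell
#!/bin/sh
# link_unversioned_clang (hypothetical helper): in directory $1, create
# "<triple>-clang" and "<triple>-clang++" links pointing at the API-suffixed
# binaries "<triple><api>-clang" and "<triple><api>-clang++".
# $2 is the target triple, $3 the API level.
link_unversioned_clang() {
    bin_dir=$1
    triple=$2
    api=$3
    for tool in clang clang++; do
        src="$bin_dir/$triple$api-$tool"
        dst="$bin_dir/$triple-$tool"
        if [ ! -e "$src" ]; then
            echo "error: $src not found" >&2
            return 1
        fi
        # Relative link target; -f replaces a stale link from a previous run.
        ln -sf "$triple$api-$tool" "$dst"
    done
}

# Example ($OS_ARCH depends on your host platform):
# link_unversioned_clang "$NDK/toolchains/llvm/prebuilt/$OS_ARCH/bin" aarch64-linux-android 21
```

A link keeps the original binary untouched, so the NDK can still be used by other projects that expect the suffixed names.
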
@attention We used to use gnustl but as of NDK r17 it is deprecated so we switched to libc++

@note Make sure to add the toolchains to your PATH:

    export PATH=$PATH:$MY_TOOLCHAINS/aarch64-linux-android-ndk-r18b/bin:$MY_TOOLCHAINS/arm-linux-android-ndk-r18b/bin

@subsubsection S3_3_1_library How to build the library ?

To cross-compile the library in debug mode, with Neon only support, for Android 32bit:

    CXX=clang++ CC=clang scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=android arch=armv7a

To cross-compile the library in asserts mode, with OpenCL only support, for Android 64bit:

    CXX=clang++ CC=clang scons Werror=1 -j8 debug=0 asserts=1 neon=0 opencl=1 embed_kernels=1 os=android arch=arm64-v8a

@subsubsection S3_3_2_examples How to manually build the examples ?

The examples get automatically built by scons as part of the build process of the library described above. This section just describes how you can build and link your own application against our library.

@note The following command lines assume the arm_compute libraries are present in the current directory or in the system library path. If this is not the case, you can specify the location of the pre-built libraries with the compiler option -L. When building the OpenCL example, the commands below assume that the CL headers are located in the include folder where the command is executed.

Once you've got your Android standalone toolchain built and added to your path you can do the following:

To cross compile a Neon example:

    #32 bit:
    arm-linux-androideabi-clang++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute-static -larm_compute_core-static -L. -o neon_convolution_arm -static-libstdc++ -pie
    #64 bit:
    aarch64-linux-android-clang++ examples/neon_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute-static -larm_compute_core-static -L. -o neon_convolution_aarch64 -static-libstdc++ -pie

To cross compile an OpenCL example:

    #32 bit:
    arm-linux-androideabi-clang++ examples/cl_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute-static -larm_compute_core-static -L. -o cl_convolution_arm -static-libstdc++ -pie -DARM_COMPUTE_CL
    #64 bit:
    aarch64-linux-android-clang++ examples/cl_convolution.cpp utils/Utils.cpp -I. -Iinclude -std=c++14 -larm_compute-static -larm_compute_core-static -L. -o cl_convolution_aarch64 -static-libstdc++ -pie -DARM_COMPUTE_CL

To cross compile the examples with the Graph API, such as graph_lenet.cpp, you also need to link the library arm_compute_graph.

    #32 bit:
    arm-linux-androideabi-clang++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++14 -Wl,--whole-archive -larm_compute_graph-static -Wl,--no-whole-archive -larm_compute-static -larm_compute_core-static -L. -o graph_lenet_arm -static-libstdc++ -pie -DARM_COMPUTE_CL
    #64 bit:
    aarch64-linux-android-clang++ examples/graph_lenet.cpp utils/Utils.cpp utils/GraphUtils.cpp utils/CommonGraphOptions.cpp -I. -Iinclude -std=c++14 -Wl,--whole-archive -larm_compute_graph-static -Wl,--no-whole-archive -larm_compute-static -larm_compute_core-static -L. -o graph_lenet_aarch64 -static-libstdc++ -pie -DARM_COMPUTE_CL

@note Due to some issues in older versions of the Mali OpenCL DDK (<= r13p0), we recommend linking arm_compute statically on Android.
@note When linked statically the arm_compute_graph library currently needs the --whole-archive linker flag in order to work properly

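With static archives the order of the -l options matters: traditional Unix linkers resolve symbols from left to right, so a library generally has to appear before the libraries it depends on, and --whole-archive applies only to the archives named before the matching --no-whole-archive. The sketch below keeps the flag sequence used by the graph commands above in one place; `static_graph_link_flags` is a hypothetical helper.

```shell
#!/bin/sh
# static_graph_link_flags (hypothetical helper): print the linker flags for a
# static link of a graph example, in the order used by the commands above.
# Only arm_compute_graph is wrapped in --whole-archive.
static_graph_link_flags() {
    printf '%s\n' "-Wl,--whole-archive -larm_compute_graph-static -Wl,--no-whole-archive -larm_compute-static -larm_compute_core-static"
}

# Example: splice the flags into your own link command, for instance
# aarch64-linux-android-clang++ my_app.cpp -I. -Iinclude -std=c++14 $(static_graph_link_flags) -L. -o my_app -static-libstdc++ -pie
static_graph_link_flags
```
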
Then all you need to do is upload the executable and the shared library to the device using ADB:

    adb push neon_convolution_arm /data/local/tmp/
    adb push cl_convolution_arm /data/local/tmp/
    adb push gc_absdiff_arm /data/local/tmp/
    adb shell chmod 777 -R /data/local/tmp/

And finally to run the example:

    adb shell /data/local/tmp/neon_convolution_arm
    adb shell /data/local/tmp/cl_convolution_arm
    adb shell /data/local/tmp/gc_absdiff_arm

For 64bit:

    adb push neon_convolution_aarch64 /data/local/tmp/
    adb push cl_convolution_aarch64 /data/local/tmp/
    adb push gc_absdiff_aarch64 /data/local/tmp/
    adb shell chmod 777 -R /data/local/tmp/

And finally to run the example:

    adb shell /data/local/tmp/neon_convolution_aarch64
    adb shell /data/local/tmp/cl_convolution_aarch64
    adb shell /data/local/tmp/gc_absdiff_aarch64

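The push and run steps above can be wrapped in a small loop when several executables have to be deployed. This is a minimal sketch, with `push_and_run` as a hypothetical helper. Unlike the commands above, it marks only the pushed file as executable (mode 755) instead of changing the whole directory.

```shell
#!/bin/sh
# ADB can be overridden, e.g. ADB="adb -s <serial>" to select a device.
ADB=${ADB:-adb}
REMOTE_DIR=/data/local/tmp

# push_and_run (hypothetical helper): upload each executable given as an
# argument to the device, make it executable, and run it.
push_and_run() {
    for exe in "$@"; do
        remote="$REMOTE_DIR/$(basename "$exe")"
        $ADB push "$exe" "$REMOTE_DIR/" || return 1
        $ADB shell chmod 755 "$remote" || return 1
        $ADB shell "$remote" || return 1
    done
}

# Example:
# push_and_run neon_convolution_aarch64 cl_convolution_aarch64
```

Setting `ADB=echo` before calling `push_and_run` prints the adb arguments instead of executing them, which is a convenient dry run.
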
@note Examples accept different types of arguments; to find out what they are, run the example with \a --help as an argument. If no arguments are specified then random values will be used to execute the graph.

For example:

    adb shell /data/local/tmp/graph_lenet --help

In this case the first argument of LeNet (like all the graph examples) is the target (i.e. 0 to run on Neon, 1 to run on OpenCL if available, 2 to run on OpenCL using the CLTuner), the second argument is the path to the folder containing the npy files for the weights and finally the third argument is the number of batches to run.

@subsection S3_4_macos Building for macOS

The library was successfully natively built for Apple Silicon under macOS 11.1 using clang v12.0.0.

To natively compile the library with accelerated CPU support:

    scons Werror=1 -j8 neon=1 opencl=0 os=macos arch=arm64-v8a build=native

@note Initial support disables feature discovery through HWCAPS and thread scheduling affinity controls

@subsection S3_5_bare_metal Building for bare metal

For bare metal, the library was successfully built using Linaro's latest (gcc-linaro-6.3.1-2017.05) bare metal toolchains:
 - arm-eabi for armv7a
 - aarch64-elf for arm64-v8a

Download the Linaro toolchains for <a href="https://releases.linaro.org/components/toolchain/binaries/6.3-2017.05/arm-eabi/">armv7a</a> and <a href="https://releases.linaro.org/components/toolchain/binaries/6.3-2017.05/aarch64-elf/">arm64-v8a</a>.

@note Make sure to add the toolchains to your PATH: export PATH=$PATH:$MY_TOOLCHAINS/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-elf/bin:$MY_TOOLCHAINS/gcc-linaro-6.3.1-2017.05-x86_64_arm-eabi/bin

@subsubsection S3_5_1_library How to build the library ?

To cross-compile the library with Neon support for baremetal arm64-v8a:

    scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=bare_metal arch=arm64-v8a build=cross_compile cppthreads=0 openmp=0 standalone=1

@subsubsection S3_5_2_examples How to manually build the examples ?

Examples are disabled when building for bare metal. If you want to build the examples you need to provide a custom bootcode depending on the target architecture and link against the Compute Library. More information about bare metal bootcode can be found <a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0527a/index.html">here</a>.

@subsection S3_6_windows_host Building on a Windows host system

Using `scons` directly from the Windows command line is known to cause
problems. The reason seems to be that if `scons` is set up for cross-compilation
it gets confused about Windows style paths (using backslashes). Thus it is
recommended to follow one of the options outlined below.

@subsubsection S3_6_1_ubuntu_on_windows Bash on Ubuntu on Windows

The best and easiest option is to use
<a href="https://msdn.microsoft.com/en-gb/commandline/wsl/about">Ubuntu on Windows</a>.
This feature is still marked as *beta* and thus might not be available.
However, if it is, building the library is as simple as opening a *Bash on
Ubuntu on Windows* shell and following the general guidelines given above.

@subsubsection S3_6_2_cygwin Cygwin

If the Windows Subsystem for Linux is not available, <a href="https://www.cygwin.com/">Cygwin</a>
can be used to install and run `scons`. The Cygwin version must be 3.0.7 or later. In addition
to the default packages installed by Cygwin, `scons` has to be selected in the installer. (`git` might
also be useful but is not strictly required if you already have got the source
code of the library.) Linaro provides pre-built versions of
<a href="http://releases.linaro.org/components/toolchain/binaries/">GCC cross-compilers</a>
that can be used from the Cygwin terminal. When building for Android the
compiler is included in the Android standalone toolchain. After everything has
been set up in the Cygwin terminal the general guide on building the library
can be followed.

@subsection S3_7_cl_requirements OpenCL DDK Requirements

@subsubsection S3_7_1_cl_hard_requirements Hard Requirements

Compute Library requires OpenCL 1.1 and above with support of non uniform workgroup sizes, which is officially supported in the Mali OpenCL DDK r8p0 and above as an extension (respective extension flag is \a -cl-arm-non-uniform-work-group-size).

Enabling 16-bit floating point calculations requires the \a cl_khr_fp16 extension to be supported. All Mali GPUs with compute capabilities have native support for half precision floating points.

@subsubsection S3_7_2_cl_performance_requirements Performance improvements

Integer dot product built-in function extensions (and therefore optimized kernels) are available with Mali OpenCL DDK r22p0 and above for the following GPUs: G71, G76. The relevant extensions are \a cl_arm_integer_dot_product_int8, \a cl_arm_integer_dot_product_accumulate_int8 and \a cl_arm_integer_dot_product_accumulate_int16.

OpenCL kernel level debugging can be simplified with the use of printf; this requires the \a cl_arm_printf extension to be supported.

SVM allocations are supported for all the underlying allocations in Compute Library. To enable this, OpenCL 2.0 or above is required.

@subsection S3_8_cl_tuner OpenCL Tuner

The OpenCL tuner, a.k.a. CLTuner, is a module of Arm Compute Library that can improve the performance of the OpenCL kernels by tuning the Local-Workgroup-Size (LWS).
The optimal LWS for each unique OpenCL kernel configuration is stored in a table. This table can be either imported or exported from/to a file.
The OpenCL tuner runs the same OpenCL kernel for a range of local workgroup sizes and keeps the local workgroup size of the fastest run to use in subsequent calls to the kernel. It supports three modes of tuning with different trade-offs between the time taken to tune and the kernel execution time achieved using the best LWS found:
- Exhaustive mode searches all the supported values of LWS. This mode takes the longest time to tune and is the most likely to find the optimal LWS.
- Normal mode searches a subset of LWS values to yield a good approximation of the optimal LWS. It takes less time to tune than Exhaustive mode.
- Rapid mode takes the shortest time to tune and finds an LWS value that is at least as good as the default LWS value.

The mode affects only the search for the optimal LWS and has no effect when the LWS value is imported from a file.
In order for the performance numbers to be meaningful you must disable the GPU power management and set the GPU to a fixed frequency for the entire duration of the tuning phase.

If you wish to know more about LWS and its important role in improving the GPU cache utilization, we suggest having a look at the presentation "Even Faster CNNs: Exploring the New Class of Winograd Algorithms", available at the following link:

https://www.embedded-vision.com/platinum-members/arm/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-iodice

Tuning a network from scratch can take a long time and considerably affect the execution time for the first run of your network. For this reason, it is recommended to store the CLTuner's results in a file to amortize this time when you re-use either the same network or functions with the same configurations. The tuning is performed only once for each OpenCL kernel.

CLTuner looks for the optimal LWS for each unique OpenCL kernel configuration. Since a function (e.g. Convolution Layer, Pooling Layer, Fully Connected Layer ...) can be called multiple times but with different parameters, we associate an "id" (called "config_id") to each kernel to distinguish the unique configurations.

Example: two unique Matrix Multiply configurations
| 1964 | @code{.cpp} |
| 1965 | TensorShape a0 = TensorShape(32,32); |
| 1966 | TensorShape b0 = TensorShape(32,32); |
| 1967 | TensorShape c0 = TensorShape(32,32); |
| 1968 | TensorShape a1 = TensorShape(64,64); |
| 1969 | TensorShape b1 = TensorShape(64,64); |
| 1970 | TensorShape c1 = TensorShape(64,64); |
| 1971 | |
| 1972 | Tensor a0_tensor; |
| 1973 | Tensor b0_tensor; |
| 1974 | Tensor c0_tensor; |
| 1975 | Tensor a1_tensor; |
| 1976 | Tensor b1_tensor; |
| 1977 | Tensor c1_tensor; |
| 1978 | |
| 1979 | a0_tensor.allocator()->init(TensorInfo(a0, 1, DataType::F32)); |
| 1980 | b0_tensor.allocator()->init(TensorInfo(b0, 1, DataType::F32)); |
| 1981 | c0_tensor.allocator()->init(TensorInfo(c0, 1, DataType::F32)); |
| 1982 | a1_tensor.allocator()->init(TensorInfo(a1, 1, DataType::F32)); |
| 1983 | b1_tensor.allocator()->init(TensorInfo(b1, 1, DataType::F32)); |
| 1984 | c1_tensor.allocator()->init(TensorInfo(c1 1, DataType::F32)); |
| 1985 | |
| 1986 | CLGEMM gemm0; |
| 1987 | CLGEMM gemm1; |
| 1988 | |
| 1989 | // Configuration 0 |
| 1990 | gemm0.configure(&a0, &b0, nullptr, &c0, 1.0f, 0.0f); |
| 1991 | |
| 1992 | // Configuration 1 |
| 1993 | gemm1.configure(&a1, &b1, nullptr, &c1, 1.0f, 0.0f); |
| 1994 | @endcode |
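
The role of the config_id can be pictured as the key of a lookup table in which each unique configuration is tuned once and re-used afterwards. The block below is a minimal sketch of that idea in plain C++, not the library's implementation: `LwsTable` and `get_or_tune()` are hypothetical names, and the ids used with it (for example "gemm_f32_32_32") are made up rather than the strings the library generates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a local workgroup size (x, y, z).
using Lws = std::array<unsigned int, 3>;

// Hypothetical cache of tuned results, keyed by config_id.
class LwsTable
{
public:
    // Returns the LWS stored for config_id; tune() is only called
    // the first time a given config_id is requested.
    Lws get_or_tune(const std::string &config_id, const std::function<Lws()> &tune)
    {
        assert(tune != nullptr);
        auto it = _table.find(config_id);
        if(it == _table.end())
        {
            it = _table.emplace(config_id, tune()).first;
            ++_tuning_runs;
        }
        return it->second;
    }

    // Number of times a tuning run was actually triggered.
    std::size_t tuning_runs() const
    {
        return _tuning_runs;
    }

    // Number of unique configurations seen so far.
    std::size_t size() const
    {
        return _table.size();
    }

private:
    std::map<std::string, Lws> _table{};
    std::size_t                _tuning_runs{ 0 };
};
```

With two distinct ids, as in the two GEMM configurations above, the table ends up with two entries; asking again for an id that is already present returns the stored LWS without another tuning run.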

@subsubsection S3_8_1_cl_tuner_how_to How to use it

All the graph examples in the Compute Library's "examples" folder and the arm_compute_benchmark accept an argument to enable the OpenCL tuner and an argument to export/import the LWS values to/from a file:

    #Enable CL tuner
    ./graph_mobilenet --enable-tuner --target=CL
    ./arm_compute_benchmark --enable-tuner

    #Export/Import to/from a file
    ./graph_mobilenet --enable-tuner --target=CL --tuner-file=acl_tuner.csv
    ./arm_compute_benchmark --enable-tuner --tuner-file=acl_tuner.csv

If you are importing the CLTuner's results from a file, the newly tuned LWS values will be appended to it.

Whether you are benchmarking the graph examples or the test cases in the arm_compute_benchmark, remember to:

-# Disable the power management
-# Keep the GPU frequency constant
-# Run the network multiple times (e.g. 10).

If you are not using the graph API or the benchmark infrastructure, you will need to manually pass a CLTuner object to CLScheduler before configuring any function.

@code{.cpp}
CLTuner tuner;

// Setup Scheduler
CLScheduler::get().default_init(&tuner);
@endcode

After the first run, the CLTuner's results can be exported to a file using the method "save_to_file()".
- tuner.save_to_file("results.csv");

This file can also be imported using the method "load_from_file()".
- tuner.load_from_file("results.csv");
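
To make the export/import round trip more concrete, here is a minimal sketch of saving a table of tuned values and merging it back in. It is independent of the library: `save_table()` and `load_table()` are hypothetical helpers, the ids are made up, and the one-entry-per-line "config_id;x;y;z" layout is an assumption for illustration and not necessarily the format that save_to_file() writes.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for an LWS value and a table of tuned results.
using Lws      = std::array<unsigned int, 3>;
using LwsTable = std::map<std::string, Lws>;

// Writes one "config_id;x;y;z" line per entry (assumed, illustrative format).
void save_table(const LwsTable &table, std::ostream &os)
{
    for(const auto &entry : table)
    {
        os << entry.first << ';' << entry.second[0] << ';' << entry.second[1] << ';' << entry.second[2] << '\n';
    }
}

// Reads lines written by save_table() and merges them into table.
// Entries already present in table are kept; malformed lines are skipped.
void load_table(LwsTable &table, std::istream &is)
{
    std::string line;
    while(std::getline(is, line))
    {
        std::istringstream fields(line);
        std::string        id;
        Lws                lws{};
        char               sep0 = 0;
        char               sep1 = 0;
        if(std::getline(fields, id, ';') && (fields >> lws[0] >> sep0 >> lws[1] >> sep1 >> lws[2]) && sep0 == ';' && sep1 == ';')
        {
            table.emplace(id, lws);
        }
    }
}
```

Both helpers take generic streams, so the same code works with std::ofstream and std::ifstream for a real file or with std::stringstream in a test. Importing a table, adding newly tuned entries and saving it again yields a file that contains both the old and the new values.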
*/
} // namespace arm_compute