| /// |
| /// Copyright (c) 2017-2023 Arm Limited. |
| /// |
| /// SPDX-License-Identifier: MIT |
| /// |
| /// Permission is hereby granted, free of charge, to any person obtaining a copy |
| /// of this software and associated documentation files (the "Software"), to |
| /// deal in the Software without restriction, including without limitation the |
| /// rights to use, copy, modify, merge, publish, distribute, sublicense, and/or |
| /// sell copies of the Software, and to permit persons to whom the Software is |
| /// furnished to do so, subject to the following conditions: |
| /// |
| /// The above copyright notice and this permission notice shall be included in all |
| /// copies or substantial portions of the Software. |
| /// |
| /// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
| /// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
| /// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
| /// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
| /// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
| /// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE |
| /// SOFTWARE. |
| /// |
| namespace arm_compute |
| { |
| /** @page versions_changelogs Release Versions and Changelog |
| |
| @tableofcontents |
| |
| @section S2_1_versions Release versions |
| |
| All releases are numbered vYY.MM, where YY is the last two digits of the year and MM is the month number. |
| If there is more than one release in a month, an extra sequential number is appended at the end: |
| |
| v17.03 (First release of March 2017) |
| v17.03.1 (Second release of March 2017) |
| v17.04 (First release of April 2017) |
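| |
| The numbering scheme above can be checked mechanically. The C++ sketch below is only an illustration: `ReleaseVersion` and `parse_release_version()` are hypothetical names, not part of the Compute Library API. It assumes a tag is exactly `v`, a two-digit year, a dot, a two-digit month and an optional `.N` suffix, where N = 1 marks the second release of that month. |
| |
```cpp
#include <cctype>
#include <string>

// Hypothetical helper type, not part of the Compute Library API.
struct ReleaseVersion
{
    int year;     // full year, e.g. 2017 for "v17.03"
    int month;    // 1..12
    int sequence; // 0 = first release of the month, 1 = second, ...
};

// Returns true if s is non-empty and made only of decimal digits.
static bool is_digits(const std::string &s)
{
    if(s.empty())
    {
        return false;
    }
    for(char c : s)
    {
        if(!std::isdigit(static_cast<unsigned char>(c)))
        {
            return false;
        }
    }
    return true;
}

// Parses "vYY.MM" or "vYY.MM.N" into `out`; returns false for anything else.
bool parse_release_version(const std::string &tag, ReleaseVersion &out)
{
    // Minimum form is "vYY.MM": six characters with the dot at index 3.
    if(tag.size() < 6 || tag[0] != 'v' || tag[3] != '.')
    {
        return false;
    }
    const std::string yy = tag.substr(1, 2);
    const std::string mm = tag.substr(4, 2);
    if(!is_digits(yy) || !is_digits(mm))
    {
        return false;
    }
    const int month = std::stoi(mm);
    if(month < 1 || month > 12)
    {
        return false;
    }

    int sequence = 0;
    if(tag.size() > 6)
    {
        // An extra ".N" suffix marks a later release in the same month.
        const std::string n = tag.substr(7);
        if(tag[6] != '.' || !is_digits(n) || n.size() > 3)
        {
            return false;
        }
        sequence = std::stoi(n);
    }
    out = ReleaseVersion{ 2000 + std::stoi(yy), month, sequence };
    return true;
}
```
| Comparing (year, month, sequence) lexicographically then orders releases chronologically, e.g. v17.03 < v17.03.1 < v17.04. |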
| |
| @note We aim to publish one major public release with new features per quarter. All releases in between will only contain bug fixes. |
| @note Starting from release 22.05, the 'master' branch is no longer used; it has been replaced by 'main'. Please update your clone jobs accordingly. |
| |
| @section S2_2_changelog Changelog |
| |
| v23.11 Public major release |
| - New features |
| - Add support for input data type U64/S64 in CLCast and NECast. |
| - Add support for output data type S64 in NEArgMinMaxLayer and CLArgMinMaxLayer |
| - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface: |
| - @ref experimental::dynamic_fusion::GpuCkwResize |
| - Update OpenCL™ API headers to v2023.04.17. |
| - Performance optimizations: |
| - Optimize @ref cpu::CpuReshape |
| - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface with support for FP16/FP32 only: |
| - @ref experimental::dynamic_fusion::GpuCkwPool2d |
| |
| v23.08 Public major release |
| - Deprecate the legacy 'libarm_compute_core' library. This library is an artifact of Compute Library's legacy library architecture and no longer serves any purpose. |
| Users must no longer link their applications to this library and instead link only to the main `libarm_compute` library for core functionality. |
| - New features |
| - Rewrite CLArgMinMaxLayer for axis 0 and enable S64 output. |
| - Add multi-sketch support for dynamic fusion. |
|     - Split arm_compute/core/Types.h and utils/Utils.h into smaller headers to reduce the amount of unused code pulled in by each inclusion. |
| - Add Fused Activation to CLMatMul. |
| - Implement FP32/FP16 @ref opencl::kernels::ClMatMulNativeMMULKernel using the MMUL extension. |
| - Use MatMul in fully connected layer with dynamic weights when supported. |
| - Optimize CPU depthwise convolution with channel multiplier. |
| - Add support in CpuCastKernel for conversion of S64/U64 to F32. |
| - Add new OpenCL™ kernels: |
| - @ref opencl::kernels::ClMatMulNativeMMULKernel support for FP32 and FP16, with batch support |
| - Enable transposed convolution with non-square kernels on CPU and GPU. |
| - Add support for input data type U64/S64 in CLCast. |
|     - Add new Compute Kernel Writer (CKW) subproject that offers a C++ interface to generate tile-based OpenCL code just in time. |
| - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface with support for FP16/FP32 only: |
| - @ref experimental::dynamic_fusion::GpuCkwActivation |
| - @ref experimental::dynamic_fusion::GpuCkwCast |
| - @ref experimental::dynamic_fusion::GpuCkwDirectConv2d |
| - @ref experimental::dynamic_fusion::GpuCkwElementwiseBinary |
| - @ref experimental::dynamic_fusion::GpuCkwStore |
| - Various optimizations and bug fixes. |
| |
| v23.05.1 Public patch release |
| - Enable CMake and Bazel option to build multi_isa without FP16 support. |
| - Fix compilation error in NEReorderLayer (aarch64 only). |
| - Disable invalid (false-negative) validation test with CPU scale layer on FP16. |
| - Various bug fixes |
| |
| v23.05 Public major release |
| - New features: |
| - Add new Arm® Neon™ kernels / functions: |
| - @ref NEMatMul for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support. |
| - NEReorderLayer (aarch64 only) |
| - Add new OpenCL™ kernels / functions: |
| - @ref CLMatMul support for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support. |
|     - Add support for multi-dimensional indices in both the Arm® Neon™ and OpenCL™ implementations of the Gather Layer. |
| - Add support for dynamic weights in @ref CLFullyConnectedLayer and @ref NEFullyConnectedLayer for all data types. |
|     - Add support for cropping in the Arm® Neon™ and OpenCL™ implementations of the BatchToSpace Layer for all data types. |
| - Add support for quantized data types for the ElementwiseUnary Operators for Arm® Neon™. |
| - Implement RSQRT for quantized data types on OpenCL™. |
| - Add FP16 depthwise convolution kernels for SME2. |
| - Performance optimizations: |
| - Improve CLTuner exhaustive mode tuning time. |
| - Deprecate dynamic block shape in @ref NEBatchToSpaceLayer and @ref CLBatchToSpaceLayer. |
| - Various optimizations and bug fixes. |
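| |
| As a point of reference for what "batch support" means in the MatMul entries above, the C++ sketch below multiplies `batch` independent pairs of row-major FP32 matrices with a naive triple loop. It is not the NEMatMul or CLMatMul implementation or API: the function `batched_matmul()` and its `adj_x`/`adj_y` parameters (which select the transpose of the left or right operand, borrowing the adj_x/adj_y naming from the v22.11 notes) are hypothetical. |
| |
```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reference (naive) batched matrix multiplication, row-major storage.
// lhs holds `batch` matrices of shape M x K (stored K x M when adj_x is true).
// rhs holds `batch` matrices of shape K x N (stored N x K when adj_y is true).
// adj_x / adj_y are hypothetical flags meaning "use the transpose of this operand".
// Returns `batch` matrices of shape M x N.
std::vector<float> batched_matmul(const std::vector<float> &lhs, const std::vector<float> &rhs,
                                  std::size_t batch, std::size_t M, std::size_t N, std::size_t K,
                                  bool adj_x = false, bool adj_y = false)
{
    if(lhs.size() != batch * M * K || rhs.size() != batch * K * N)
    {
        throw std::invalid_argument("operand sizes do not match batch/M/N/K");
    }
    std::vector<float> dst(batch * M * N, 0.0f);
    for(std::size_t b = 0; b < batch; ++b)
    {
        // Each batch entry is an independent matrix product.
        const float *a = lhs.data() + b * M * K;
        const float *w = rhs.data() + b * K * N;
        float       *d = dst.data() + b * M * N;
        for(std::size_t m = 0; m < M; ++m)
        {
            for(std::size_t n = 0; n < N; ++n)
            {
                float acc = 0.0f;
                for(std::size_t k = 0; k < K; ++k)
                {
                    // Element (m, k) of op(lhs) and element (k, n) of op(rhs).
                    const float x = adj_x ? a[k * M + m] : a[m * K + k];
                    const float y = adj_y ? w[n * K + k] : w[k * N + n];
                    acc += x * y;
                }
                d[m * N + n] = acc;
            }
        }
    }
    return dst;
}
```
| Optimized kernels typically tile, vectorize and parallelize these loops; a reference version like this is mainly useful for checking results on small inputs. |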
| |
| v23.02.1 Public patch release |
| - Allow mismatching data layouts between the source tensor and weights for \link cpu::CpuGemmDirectConv2d CpuGemmDirectConv2d \endlink with fixed format kernels. |
| - Fixes for experimental CPU only Bazel and CMake builds. |
| |
| v23.02 Public major release |
| - New features: |
| - Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator. |
| - Add the following operators to the experimental dynamic fusion API: |
| - GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub. |
| - Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling. |
| - Add new CPU operator AddMulAdd for float and quantized types. |
| - Add new flag @ref ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings. |
| - Add experimental support for CPU only Bazel and CMake builds. |
| - Performance optimizations: |
| - Optimize CPU base-e exponential functions for FP32. |
| - Optimize CPU StridedSlice by copying first dimension elements in bulk where possible. |
| - Optimize CPU quantized Subtraction by reusing the quantized Addition kernel. |
| - Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain. |
| - Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain. |
| - Update the heuristic for CLDepthwiseConvolutionNative kernel. |
| - Add new optimized OpenCL kernel to compute indirect convolution: |
| - \link opencl::kernels::ClIndirectConv2dKernel ClIndirectConv2dKernel \endlink |
| - Add new optimized OpenCL kernel to compute transposed convolution: |
| - \link opencl::kernels::ClTransposedConvolutionKernel ClTransposedConvolutionKernel \endlink |
| - Update recommended/minimum NDK version to r20b. |
| - Various optimizations and bug fixes. |
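| |
| The ReduceMean entry above can work in the integer domain because averaging commutes with an affine mapping. Assuming the usual asymmetric quantization real = scale * (q - offset), and that input and output share the same scale and offset, mean(real) = scale * (mean(q) - offset), so the quantized result is simply the rounded mean of the quantized inputs. The C++ sketch below shows only that special case; `quantized_mean_same_qinfo()` is a hypothetical helper, not the library's kernel, and a different output quantization would still need a requantization step. |
| |
```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Mean of QASYMM8 values computed entirely in the integer domain.
// Assumes input and output share the same scale and offset, so
// real = scale * (q - offset) on both sides and averaging commutes with
// the affine mapping: mean(real) = scale * (mean(q) - offset).
// Hypothetical helper for illustration only.
std::uint8_t quantized_mean_same_qinfo(const std::vector<std::uint8_t> &values)
{
    if(values.empty())
    {
        throw std::invalid_argument("cannot take the mean of an empty vector");
    }
    std::int64_t sum = 0; // wide accumulator avoids overflow
    for(std::uint8_t v : values)
    {
        sum += v;
    }
    const std::int64_t n = static_cast<std::int64_t>(values.size());
    // Round to nearest with halves rounded up (sum is non-negative).
    return static_cast<std::uint8_t>((sum + n / 2) / n);
}
```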
| |
| v22.11 Public major release |
| - New features: |
| - Add new experimental dynamic fusion API. |
| - Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32. |
| - Add CPU MeanStdDevNorm for QASYMM8. |
| - Add CPU and GPU GELU activation function for FP32 and FP16. |
| - Add CPU swish activation function for FP32 and FP16. |
| - Performance optimizations: |
| - Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8. |
| - Optimize CPU activation functions using LUT-based implementation: |
| - Sigmoid function for QASYMM8 and QASYMM8_SIGNED. |
| - Hard swish function for QASYMM8_SIGNED. |
| - Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic. |
| - Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D. |
| - Optimize GPU depthwise convolution kernel and heuristic. |
| - Optimize GPU Conv2d heuristic. |
| - Optimize CPU MeanStdDevNorm for FP16. |
| - Optimize CPU tanh activation function for FP16 using rational approximation. |
| - Improve GPU GeMMLowp start-up time. |
| - Various optimizations and bug fixes. |
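| |
| The LUT-based activations listed above exploit the fact that an 8-bit quantized tensor can only hold 256 distinct values, so the activation can be evaluated once per possible input and then applied as a table lookup. The C++ sketch below illustrates the idea for sigmoid on QASYMM8. It assumes the asymmetric mapping real = scale * (q - offset); `QuantInfo`, `make_sigmoid_lut()` and `apply_lut()` are hypothetical names, and this is not the library's own kernel. |
| |
```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Asymmetric 8-bit quantization parameters: real = scale * (q - offset).
struct QuantInfo
{
    float        scale;
    std::int32_t offset;
};

using Lut256 = std::array<std::uint8_t, 256>;

// Evaluates sigmoid once for each of the 256 possible QASYMM8 inputs and
// stores the requantized QASYMM8 result.
Lut256 make_sigmoid_lut(QuantInfo in, QuantInfo out)
{
    Lut256 lut{};
    for(int q = 0; q < 256; ++q)
    {
        const float x = in.scale * static_cast<float>(q - in.offset); // dequantize
        const float y = 1.0f / (1.0f + std::exp(-x));                 // sigmoid in float
        const long  r = std::lround(y / out.scale) + out.offset;      // requantize
        // Saturate to the representable range [0, 255].
        lut[static_cast<std::size_t>(q)] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
    }
    return lut;
}

// Applying the activation is then one table lookup per element.
void apply_lut(const Lut256 &lut, const std::uint8_t *src, std::uint8_t *dst, std::size_t n)
{
    for(std::size_t i = 0; i < n; ++i)
    {
        dst[i] = lut[src[i]];
    }
}
```
| The table costs 256 bytes and is built once per set of quantization parameters, after which the per-element cost no longer depends on how expensive the activation function is. |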
| |
| v22.08 Public major release |
| - Various bug fixes. |
| - Disable unsafe FP optimizations causing accuracy issues in: |
| - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink |
|     - \link opencl::kernels::ClDirectConv3dKernel ClDirectConv3dKernel \endlink |
| - @ref CLDepthwiseConvolutionLayerNativeKernel |
| - Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add. |
| - Optimize the gemm_reshaped_rhs_nly_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615. |
| - Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel. |
| - Expand the GPUTarget list with the missing Mali™ GPU product names: G57, G68, G78AE, G610, G510, G310. |
| - Extend the direct convolution 2d interface to configure the block size. |
| - Update ClConv2D heuristic to use direct convolution. |
| - Use official Khronos® OpenCL extensions: |
| - Add cl_khr_integer_dot_product extension support. |
|     - Add support for OpenCL 3.0 non-uniform workgroups. |
| - CPU performance optimizations: |
|     - Add LUT-based implementations of the Hard Swish and Leaky ReLU activation functions for the aarch64 build. |
|     - Optimize the Add layer by treating the input tensors as 1D arrays. |
| - Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights. |
| - Add new winograd convolution kernels implementation and update the ACL \link arm_compute::cpu::CpuWinogradConv2d CpuWinogradConv2d\endlink operator. |
| - Add experimental support for native builds for Windows® on Arm™. |
| - Build flag interpretation change: arch=armv8.6-a now translates to the -march=armv8.6-a CXX flag instead of -march=armv8.2-a plus explicit selection of feature extensions. |
| - Build flag change: toolchain_prefix, compiler_prefix: |
| - Use empty string "" to suppress any prefixes. |
| - Use "auto" to use default (auto) prefixes chosen by the build script. This is the default behavior when unspecified. |
| - Any other string will be used as custom prefixes to the compiler and the rest of toolchain tools. |
|     - The default behavior when the prefix is unspecified is unchanged, but its signifier has changed from the empty string "" to "auto". |
| - The armv7a Android build will no longer be tested or maintained. |
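| |
| The toolchain_prefix / compiler_prefix rules above can be summarized as a small decision function. The C++ sketch below only models the documented behavior; `resolve_prefix()` and `tool_name()` are hypothetical helpers, and `auto_prefix` is a placeholder for whatever prefix the build script chooses by itself, which this sketch does not attempt to reproduce. |
| |
```cpp
#include <string>

// Models the documented rule for toolchain_prefix / compiler_prefix.
// `auto_prefix` is a placeholder for the prefix the build script would pick
// on its own; leaving `requested` unspecified is the same as passing "auto".
std::string resolve_prefix(const std::string &auto_prefix, const std::string &requested = "auto")
{
    if(requested == "auto")
    {
        return auto_prefix; // default behavior: prefix chosen by the build script
    }
    return requested; // "" suppresses any prefix; any other string is used as-is
}

// Builds a tool name such as "<prefix>g++" from a resolved prefix.
std::string tool_name(const std::string &prefix, const std::string &tool)
{
    return prefix + tool;
}
```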
| |
| v22.05 Public major release |
| - Various bug fixes. |
| - Various optimizations. |
| - Add support for NDK r23b. |
| - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details. |
| - New Arm® Neon™ kernels / functions: |
|     - \link cpu::kernels::CpuPool3dKernel CpuPool3dKernel \endlink |
| - New OpenCL kernels / functions: |
|     - \link opencl::kernels::ClPool3dKernel ClPool3dKernel \endlink |
| - Improve the start-up times for the following OpenCL kernels: |
| - \link opencl::kernels::ClWinogradInputTransformKernel ClWinogradInputTransformKernel \endlink |
| - \link opencl::kernels::ClWinogradOutputTransformKernel ClWinogradOutputTransformKernel \endlink |
| - \link opencl::kernels::ClWinogradFilterTransformKernel ClWinogradFilterTransformKernel \endlink |
| - \link opencl::kernels::ClHeightConcatenateKernel ClHeightConcatenateKernel \endlink |
| - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int): |
| - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink |
| - \link cpu::kernels::CpuDepthwiseConv2dNativeKernel CpuDepthwiseConv2dNativeKernel \endlink |
| - \link cpu::kernels::CpuGemmMatrixAdditionKernel CpuGemmMatrixAdditionKernel \endlink |
| - \link cpu::kernels::CpuGemmMatrixMultiplyKernel CpuGemmMatrixMultiplyKernel \endlink |
| - @ref NEFuseBatchNormalizationKernel |
| - @ref NEL2NormalizeLayerKernel |
| |
| v22.02 Public major release |
| - Various bug fixes. |
| - Various optimizations. |
| - Update the A510 arm_gemm CPU kernels. |
| - Inclusive language adjustment. Please refer to @ref S5_0_inc_lang for details. |
| - Improve the start-up time for the following OpenCL kernels: |
| - @ref CLScale |
| - @ref CLGEMM |
| - @ref CLDepthwiseConvolutionLayer |
| - \link opencl::kernels::ClIm2ColKernel ClIm2ColKernel \endlink |
| - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink |
| - Remove functions: |
| - CLRemap |
| - NERemap |
| - Remove padding from OpenCL kernels: |
| - \link opencl::kernels::ClDirectConv2dKernel ClDirectConv2dKernel \endlink |
| - Remove padding from Cpu kernels: |
| - \link cpu::kernels::CpuDirectConv2dKernel CpuDirectConv2dKernel \endlink |
| - Decouple the implementation of the following Cpu kernels into various data types (fp32, fp16, int): |
| - \link cpu::kernels::CpuActivationKernel CpuActivationKernel \endlink |
| - \link cpu::kernels::CpuAddKernel CpuAddKernel \endlink |
| - \link cpu::kernels::CpuElementwiseKernel CpuElementwiseKernel \endlink |
| - \link cpu::CpuSoftmaxGeneric CpuSoftmaxKernel \endlink |
| - @ref NEBoundingBoxTransformKernel |
| - @ref NECropKernel |
| - @ref NEComputeAllAnchorsKernel |
| - @ref NEInstanceNormalizationLayerKernel |
| - NEMaxUnpoolingLayerKernel |
| - @ref NEMeanStdDevNormalizationKernel |
| - @ref NERangeKernel |
| - @ref NEROIAlignLayerKernel |
| - @ref NESelectKernel |
| |
| v21.11 Public major release |
| - Various bug fixes. |
| - Various optimizations: |
| - Improve performance of bilinear and nearest neighbor Scale on both CPU and GPU for FP32, FP16, Int8, Uint8 data types |
| - Improve performance of Softmax on GPU for Uint8/Int8 |
| - New OpenCL kernels / functions: |
| - @ref CLConv3D |
| - New Arm® Neon™ kernels / functions: |
| - @ref NEConv3D |
| - Support configurable builds restricted to a selected subset of operators |
| - Support MobileBert on Neon™ backend |
| - Improve operator/function logging |
| - Remove padding from OpenCL kernels: |
| - ClPool2dKernel |
| - ClScaleKernel |
| - ClGemmMatrixMultiplyReshapedKernel |
| - Remove padding from Cpu kernels: |
| - CpuPool2dKernel |
| - Remove Y padding from OpenCL kernels: |
| - ClGemmMatrixMultiplyKernel |
| - ClGemmReshapedRHSMatrixKernel |
| - Remove legacy GeMM kernels in gemm_v1.cl |
| |
| v21.08 Public major release |
| - Various bug fixes. |
| - Various optimizations: |
| - Improve LWS (Local-Workgroup-Size) heuristic in OpenCL for GeMM, Direct Convolution and Winograd Transformations when OpenCL tuner is not used |
| - Improve QASYMM8/QSYMM8 performance on OpenCL for various Arm® Mali™ GPU architectures |
| - Add dynamic weights support in Fully connected layer (CPU/GPU) |
| - Various performance optimizations for floating-point data types (CPU/GPU) |
| - Add a reduced core library build arm_compute_core_v2 |
| - Expose Operator API |
| - Support fat binary build for armv8.2-a via the fat_binary build flag |
| - Add CPU discovery capabilities |
| - Add data type f16 support for: |
| - CLRemapKernel |
| - Port the following functions to stateless API: |
| - @ref CLConvolutionLayer |
| - @ref CLFlattenLayer |
| - @ref CLFullyConnectedLayer |
| - @ref CLGEMM |
| - @ref CLGEMMConvolutionLayer |
| - @ref CLGEMMLowpMatrixMultiplyCore |
| - @ref CLWinogradConvolutionLayer |
| - @ref NEConvolutionLayer |
| - @ref NEFlattenLayer |
| - @ref NEFullyConnectedLayer |
| - @ref NEGEMM |
| - @ref NEGEMMConv2d |
| - @ref NEGEMMConvolutionLayer |
| - @ref NEGEMMLowpMatrixMultiplyCore |
| - @ref NEWinogradConvolutionLayer |
| - Remove the following functions: |
| - CLWinogradInputTransform |
| - Remove CLCoreRuntimeContext |
| - Remove ICPPSimpleKernel |
| - Rename file arm_compute/runtime/CL/functions/CLElementWiseUnaryLayer.h to arm_compute/runtime/CL/functions/CLElementwiseUnaryLayer.h |
| |
| v21.05 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Various documentation updates: |
| - Add supported operators and corresponding Android NNAPI operators. |
|     - Reorganize the documentation into a user guide and a contributor guide. |
| - Add support for a global allocator for OpenCL tensors |
| - Add experimental support for [CLVK](https://github.com/kpet/clvk). |
| - Add data type S32 support for: |
| - @ref opencl::kernels::ClArithmeticKernel |
| - Add data type QASYMM8 support for: |
| - @ref CLROIPoolingLayer |
| - @ref CLROIPoolingLayerKernel |
| - @ref NEROIPoolingLayer |
| - @ref NEROIPoolingLayerKernel |
| - Add per-channel quantization support for: |
| - @ref CLDeconvolutionLayer |
| - @ref CLDirectDeconvolutionLayer |
| - @ref NEConvolutionLayer |
| - @ref NEDeconvolutionLayer |
| - Remove padding from OpenCL kernels: |
| - @ref CLL2NormalizeLayerKernel |
| - CLDepthwiseConvolutionLayer3x3NHWCKernel |
| - @ref CLNormalizationLayerKernel |
| - @ref CLNormalizePlanarYUVLayerKernel |
| - @ref opencl::kernels::ClMulKernel |
| - @ref CLReductionOperationKernel |
| - @ref CLROIPoolingLayerKernel |
| - Remove computer vision support from Arm® Neon™ backend |
| - Remove the following functions: |
| - NEAbsoluteDifference |
| - NEAccumulate |
| - NEBox3x3 |
| - NECannyEdge |
| - NEChannelCombine |
| - NEChannelExtract |
| - NEColorConvert |
| - NEConvolution |
| - NEDerivative |
| - NEDilate |
| - NEEqualizeHistogram |
| - NEErode |
| - NEFastCorners |
| - NEGaussian3x3 |
| - NEGaussian5x5 |
| - NEGaussianPyramid |
| - NEHOGDescriptor |
| - NEHOGDetector |
| - NEHOGGradient |
| - NEHOGMultiDetection |
| - NEHarrisCorners |
| - NEHistogram |
| - NEIntegralImage |
| - NELaplacianPyramid |
| - NELaplacianReconstruct |
| - NEMagnitude |
| - NEMeanStdDev |
| - NEMedian3x3 |
| - NEMinMaxLocation |
| - NENonLinearFilter |
| - NEOpticalFlow |
| - NEPhase |
| - NEScharr3x3 |
| - NESobel3x3 |
| - NESobel5x5 |
| - NESobel7x7 |
| - NETableLookup |
| - NEThreshold |
| - NEWarpAffine |
|         - NEWarpPerspective |
| - Remove all GLES kernels / functions / tests / examples |
| - Remove computer vision support from CL backend |
| - Remove the following functions: |
| - CLAbsoluteDifference |
| - CLAccumulate |
| - CLBox3x3 |
| - CLCannyEdge |
| - CLChannelCombine |
| - CLChannelExtract |
| - CLColorConvert |
| - CLConvolution |
| - CLDerivative |
| - CLDilate |
| - CLEqualizeHistogram |
| - CLErode |
| - CLFastCorners |
| - CLGaussian3x3 |
| - CLGaussian5x5 |
| - CLGaussianPyramid |
| - CLHOGDescriptor |
| - CLHOGDetector |
| - CLHOGGradient |
| - CLHOGMultiDetection |
| - CLHarrisCorners |
| - CLHistogram |
| - CLIntegralImage |
| - CLLaplacianPyramid |
| - CLLaplacianReconstruct |
| - CLMagnitude |
| - CLMeanStdDev |
| - CLMedian3x3 |
| - CLMinMaxLocation |
| - CLNonLinearFilter |
| - CLOpticalFlow |
| - CLPhase |
| - CLScharr3x3 |
| - CLSobel3x3 |
| - CLSobel5x5 |
| - CLSobel7x7 |
| - CLTableLookup |
| - CLThreshold |
| - CLWarpAffine |
| - CLWarpPerspective |
| |
| v21.02 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Upgrade C++ standard to C++14 |
| - Add macOS support |
| - Add Armv8-R AArch64 architecture support |
| - Add SVE/SVE2 support for: |
| - NEScaleKernel |
| - @ref NEActivationLayer |
| - @ref NEArithmeticAddition |
| - @ref NEBatchNormalizationLayerKernel |
| - @ref cpu::kernels::CpuLogits1DSoftmaxKernel |
| - @ref cpu::kernels::CpuLogits1DMaxKernel |
| - @ref cpu::kernels::CpuElementwiseUnaryKernel |
| - Remove padding from OpenCL kernels: |
| - CLDirectConvolutionLayerKernel |
| - @ref CLArgMinMaxLayerKernel |
| - @ref CLPadLayerKernel |
| - @ref CLROIAlignLayerKernel |
| - @ref CLRangeKernel |
| - CLScaleKernel |
| - @ref CLSelectKernel |
| - @ref CLBitwiseKernel |
| - @ref opencl::kernels::ClFloorKernel |
| - CLTransposeKernel |
| - Deprecate functions in CLTuner: |
| - add_lws_to_table |
| - import_lws_table |
| - lws_table |
| - Remove functions: |
| - NELocallyConnectedLayer / CLLocallyConnectedLayer |
| - NEIm2Col |
| - NECol2Im |
| - NEGEMMInterleave4x4 |
| - NEGEMMTranspose1xW |
| - NEComputeAllAnchors / CLComputeAllAnchors |
| - NEGEMMAssemblyDispatch |
| - NEUpsampleLayer / CLUpsampleLayer |
| - Remove kernels: |
| - NEGEMMMatrixVectorMultiplyKernel |
| - NELocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedMatrixMultiplyKernel |
| - NEUpsampleLayerKernel / CLUpsampleLayerKernel |
| - Extend OpenCL tuner with workgroup batch size support |
|     - Experimental extension for the OpenCL tuner to tune the batches of work groups distributed to compute units |
| - Add functionality to load the OpenCL GEMM heuristics at runtime |
| - The GEMM heuristic file (MLGO) can be used to update the default GEMM heuristics available for OpenCL |
| - Note: there might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation |
| - Note: data-type decoupling is in progress and experimental. Warnings about unused symbols might be raised |
| |
| v20.11 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Performance regressions can be noted when executing Depthwise Convolution on Arm® Neon™ with a depth multiplier > 1 for quantized data types. |
|   This is planned to be resolved in the 21.02 release. |
| - Added new data type QASYMM8_SIGNED support for @ref NEROIAlignLayer. |
| - Added new data type S32 support for: |
| - NEArithmeticSubtraction |
| - NEArithmeticSubtractionKernel |
| - @ref NEPixelWiseMultiplication |
| - NEPixelWiseMultiplicationKernel |
| - NEElementwiseDivision |
| - NEDivisionOperationKernel |
| - Interface change |
|     - The softmax axis now has the same meaning as in other major frameworks: axis defines the dimension |
|       on which Softmax/LogSoftmax is performed. E.g. for an input of shape 4x5x6 and axis=1, softmax is applied to 4x6=24 vectors of size 5. |
|       The supported value range of axis is [-rank, rank). |
| This change applies to the following functions: |
| - @ref NESoftmaxLayer |
| - @ref NELogSoftmaxLayer |
| - @ref CLSoftmaxLayer |
| - @ref CLLogSoftmaxLayer |
| - GCSoftmaxLayer |
| - New OpenCL kernels / functions: |
| - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel |
| - @ref CLLogicalNot |
| - @ref CLLogicalAnd |
| - @ref CLLogicalOr |
| - New Arm® Neon™ kernels / functions: |
| - @ref NELogicalNot |
| - @ref NELogicalAnd |
| - @ref NELogicalOr |
| - Removed padding from Arm® Neon™ kernels: |
| - NEComplexPixelWiseMultiplicationKernel |
| - NENonMaximaSuppression3x3Kernel |
| - NERemapKernel |
| - NEGEMMInterleave4x4Kernel |
| - NEDirectConvolutionLayerKernel |
| - NEScaleKernel |
| - NELocallyConnectedMatrixMultiplyKernel |
| - NEGEMMLowpOffsetContributionKernel |
| - NEGEMMTranspose1xWKernel |
| - NEPoolingLayerKernel |
| - NEConvolutionKernel |
| - NEDepthwiseConvolutionLayerNativeKernel |
| - NEGEMMLowpMatrixMultiplyKernel |
| - NEGEMMMatrixMultiplyKernel |
| - NEDirectConvolutionLayerOutputStageKernel |
| - @ref NEReductionOperationKernel |
| - NEGEMMLowpMatrixAReductionKernel |
| - NEGEMMLowpMatrixBReductionKernel |
| - Removed padding from OpenCL kernels: |
| - CLBatchConcatenateLayerKernel |
| - CLElementwiseOperationKernel |
| - @ref CLBatchNormalizationLayerKernel |
| - CLPoolingLayerKernel |
| - CLWinogradInputTransformKernel |
| - CLGEMMLowpMatrixMultiplyNativeKernel |
| - CLGEMMLowpMatrixAReductionKernel |
| - CLGEMMLowpMatrixBReductionKernel |
| - CLGEMMLowpOffsetContributionOutputStageKernel |
| - CLGEMMLowpOffsetContributionKernel |
| - CLWinogradOutputTransformKernel |
| - CLGEMMLowpMatrixMultiplyReshapedKernel |
| - @ref CLFuseBatchNormalizationKernel |
| - @ref CLDepthwiseConvolutionLayerNativeKernel |
| - CLDepthConvertLayerKernel |
| - CLCopyKernel |
| - CLDepthwiseConvolutionLayer3x3NHWCKernel |
| - CLActivationLayerKernel |
| - CLWinogradFilterTransformKernel |
| - CLWidthConcatenateLayerKernel |
| - CLWidthConcatenate4TensorsKernel |
| - CLWidthConcatenate2TensorsKernel |
| - CLLogits1DMaxShiftExpSumKernel |
| - CLLogits1DNormKernel |
| - CLHeightConcatenateLayerKernel |
| - CLGEMMMatrixMultiplyKernel |
| - CLGEMMLowpQuantizeDownInt32ScaleKernel |
| - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel |
| - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel |
| - CLDepthConcatenateLayerKernel |
| - CLGEMMLowpQuantizeDownInt32ScaleByFixedPointKernel |
| - Removed OpenCL kernels / functions: |
| - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel |
| - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel |
| - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel |
| - Deprecated OpenCL kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): |
| - CLLocallyConnectedLayer |
| - CLLocallyConnectedMatrixMultiplyKernel |
| - CLAbsoluteDifference |
| - CLAbsoluteDifferenceKernel |
| - CLAccumulate |
| - CLAccumulateKernel |
| - CLAccumulateSquared |
| - CLAccumulateSquaredKernel |
| - CLAccumulateWeighted |
| - CLAccumulateWeightedKernel |
| - CLAccumulateWeightedFP16Kernel |
| - CLBox3x3 |
| - CLBox3x3Kernel |
| - CLBox3x3FP16Kernel |
| - CLCannyEdge |
| - CLChannelCombine |
| - CLChannelCombineKernel |
| - CLChannelExtract |
| - CLChannelExtractKernel |
| - CLColorConvert |
| - CLColorConvertKernel |
| - CLConvolution3x3 |
| - CLConvolutionRectangle |
| - CLConvolutionRectangleKernel |
| - CLConvolutionSquare |
| - CLConvolutionKernel |
| - CLDerivative |
| - CLDerivativeKernel |
| - CLDilate |
| - CLDilateKernel |
| - CLEqualizeHistogram |
| - CLErode |
| - CLErodeKernel |
| - CLFastCorners |
| - CLFastCornersKernel |
| - CLGaussian3x3 |
| - CLGaussian3x3Kernel |
| - CLGaussian5x5 |
| - CLGaussian5x5HorKernel |
| - CLGaussian5x5VertKernel |
| - CLGaussianPyramid |
| - CLGaussianPyramidHalf |
| - CLGaussianPyramidOrb |
| - CLHarrisCorners |
| - CLHarrisScoreKernel |
| - CLHarrisScoreFP16Kernel |
| - CLHistogram |
| - CLHistogramKernel |
| - CLHOGOrientationBinningKernel |
| - CLHOGBlockNormalizationKernel |
| - CLHOGDetectorKernel |
| - CLHOGNonMaximaSuppressionKernel |
| - CLHOGDescriptor |
| - CLHOGDetector |
| - CLHOGGradient |
| - CLHOGMultiDetection |
| - CLIntegralImage |
| - CLIntegralImageKernel |
| - CLLaplacianReconstruct |
| - CLLaplacianPyramid |
| - CLMagnitude |
| - CLMagnitudePhaseKernel |
| - CLMedian3x3 |
| - CLMedian3x3Kernel |
| - CLMinMaxLocation |
| - CLMinMaxLocationKernel |
| - CLNonLinearFilter |
| - CLNonLinearFilterKernel |
| - CLNonMaximaSuppression3x3 |
| - CLNonMaximaSuppression3x3FP16Kernel |
| - CLNonMaximaSuppression3x3Kernel |
| - CLOpticalFlow |
| - CLPhase |
| - CLRemap |
| - CLRemapKernel |
| - CLScharr3x3 |
| - CLScharr3x3Kernel |
| - CLSobel3x3 |
| - CLSobel3x3Kernel |
| - CLSobel5x5 |
| - CLSobel5x5HorKernel |
| - CLSobel5x5VertKernel |
| - CLSobel7x7 |
| - CLSobel7x7HorKernel |
| - CLSobel7x7VertKernel |
| - CLThreshold |
| - CLThresholdKernel |
| - CLWarpAffine |
| - CLWarpAffineKernel |
| - CLWarpPerspective |
| - CLWarpPerspectiveKernel |
| - Deprecated Arm® Neon™ kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): |
| - NELocallyConnectedLayer |
| - NELocallyConnectedMatrixMultiplyKernel |
| - NEAbsoluteDifference |
| - NEAbsoluteDifferenceKernel |
| - NEAccumulate |
| - NEAccumulateKernel |
| - NEAccumulateSquared |
| - NEAccumulateSquaredKernel |
| - NEAccumulateWeighted |
| - NEAccumulateWeightedKernel |
| - NEAccumulateWeightedFP16Kernel |
| - NEBox3x3 |
| - NEBox3x3Kernel |
| - NEBox3x3FP16Kernel |
| - NECannyEdge |
| - NEChannelCombine |
| - NEChannelCombineKernel |
| - NEChannelExtract |
| - NEChannelExtractKernel |
| - NEColorConvert |
| - NEColorConvertKernel |
| - NEConvolution3x3 |
| - NEConvolutionRectangle |
| - NEConvolutionRectangleKernel |
| - NEConvolutionSquare |
| - NEConvolutionKernel |
| - NEDerivative |
| - NEDerivativeKernel |
| - NEDilate |
| - NEDilateKernel |
| - NEEqualizeHistogram |
| - NEErode |
| - NEErodeKernel |
| - NEFastCorners |
| - NEFastCornersKernel |
| - NEGaussian3x3 |
| - NEGaussian3x3Kernel |
| - NEGaussian5x5 |
| - NEGaussian5x5HorKernel |
| - NEGaussian5x5VertKernel |
| - NEGaussianPyramid |
| - NEGaussianPyramidHalf |
| - NEGaussianPyramidOrb |
| - NEHarrisCorners |
| - NEHarrisScoreKernel |
| - NEHarrisScoreFP16Kernel |
| - NEHistogram |
| - NEHistogramKernel |
| - NEHOGOrientationBinningKernel |
| - NEHOGBlockNormalizationKernel |
| - NEHOGDetectorKernel |
| - NEHOGNonMaximaSuppressionKernel |
| - NEHOGDescriptor |
| - NEHOGDetector |
| - NEHOGGradient |
| - NEHOGMultiDetection |
| - NEIntegralImage |
| - NEIntegralImageKernel |
| - NELaplacianReconstruct |
| - NELaplacianPyramid |
| - NEMagnitude |
| - NEMagnitudePhaseKernel |
| - NEMedian3x3 |
| - NEMedian3x3Kernel |
| - NEMinMaxLocation |
| - NEMinMaxLocationKernel |
| - NENonLinearFilter |
| - NENonLinearFilterKernel |
| - NENonMaximaSuppression3x3 |
| - NENonMaximaSuppression3x3FP16Kernel |
| - NENonMaximaSuppression3x3Kernel |
| - NEOpticalFlow |
| - NEPhase |
| - NERemap |
| - NERemapKernel |
| - NEScharr3x3 |
| - NEScharr3x3Kernel |
| - NESobel3x3 |
| - NESobel3x3Kernel |
| - NESobel5x5 |
| - NESobel5x5HorKernel |
| - NESobel5x5VertKernel |
| - NESobel7x7 |
| - NESobel7x7HorKernel |
| - NESobel7x7VertKernel |
| - NEThreshold |
| - NEThresholdKernel |
| - NEWarpAffine |
| - NEWarpAffineKernel |
| - NEWarpPerspective |
| - NEWarpPerspectiveKernel |
| - Deprecated GLES kernels / functions (If a kernel is used only by the function that is being deprecated, the kernel is deprecated together): |
| - GCAbsoluteDifference |
| - GCActivationLayer |
| - GCArithmeticAddition |
| - GCBatchNormalizationLayer |
| - GCConcatenateLayer |
| - GCConvolutionLayer |
| - GCDepthwiseConvolutionLayer |
| - GCDirectConvolutionLayer |
| - GCDropoutLayer |
| - GCFillBorder |
| - GCFullyConnectedLayer |
| - GCGEMM |
| - GCGEMMInterleave4x4 |
| - GCGEMMTranspose1xW |
| - GCNormalizationLayer |
| - GCNormalizePlanarYUVLayer |
| - GCPixelWiseMultiplication |
| - GCPoolingLayer |
| - GCScale |
| - GCSoftmaxLayer |
| - GCTensorShift |
| - GCTranspose |
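| |
| The v20.11 softmax axis semantics can be illustrated by computing how a shape is split into the 1D vectors that Softmax/LogSoftmax normalizes independently. In the C++ sketch below, `SoftmaxPartition` and `softmax_partition()` are hypothetical names, not library API, and a negative axis is assumed to count back from the last dimension. For a shape of 4x5x6 and axis=1 it yields 24 vectors of size 5, matching the example in the v20.11 notes. |
| |
```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// How an axis argument partitions a tensor for Softmax/LogSoftmax: the
// softmax is computed independently over each 1D vector taken along `axis`.
// Hypothetical helper, not part of the library API.
struct SoftmaxPartition
{
    std::size_t num_vectors; // product of all the other dimensions
    std::size_t vector_size; // size of the dimension selected by `axis`
};

SoftmaxPartition softmax_partition(const std::vector<std::size_t> &shape, int axis)
{
    const int rank = static_cast<int>(shape.size());
    if(axis < -rank || axis >= rank)
    {
        throw std::out_of_range("axis must be in [-rank, rank)");
    }
    // Assumption: negative axes count back from the last dimension.
    const std::size_t a = static_cast<std::size_t>(axis < 0 ? axis + rank : axis);

    SoftmaxPartition p{ 1, shape[a] };
    for(std::size_t i = 0; i < shape.size(); ++i)
    {
        if(i != a)
        {
            p.num_vectors *= shape[i];
        }
    }
    return p;
}
```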
| |
| |
| v20.08 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Added new data type QASYMM8_SIGNED support for: |
| - @ref CLArgMinMaxLayer |
| - @ref CLArgMinMaxLayerKernel |
| - Added new data type U8 support for: |
| - @ref NECropKernel |
| - CLCropKernel |
| - Added align_corner support for nearest neighbor interpolation in: |
| - NEScaleKernel |
| - CLScaleKernel |
| - New OpenCL kernels / functions: |
| - @ref CLMaxUnpoolingLayerKernel |
| - New Arm® Neon™ kernels / functions: |
| - NEMaxUnpoolingLayerKernel |
| - New graph example: |
| - graph_yolov3_output_detector |
| - GEMMTuner improvements: |
| - Added FP16 support |
| - Output JSON files for easier integration |
| - Enabled tuning for export_to_cl_image_rhs option for RHS tensors |
| - More robust script for running benchmarks |
| - Removed padding from: |
| - NEPixelWiseMultiplicationKernel |
| - NEHeightConcatenateLayerKernel |
| - NEThresholdKernel |
| - NEBatchConcatenateLayerKernel |
| - NETransposeKernel |
| - @ref NEBatchNormalizationLayerKernel |
| - NEArithmeticSubtractionKernel |
| - @ref NEBoundingBoxTransformKernel |
| - NELogits1DMaxKernel |
| - NELogits1DSoftmaxKernel |
| - @ref NEROIPoolingLayerKernel |
| - @ref NEROIAlignLayerKernel |
| - NEYOLOLayerKernel |
| - NEUpsampleLayerKernel |
| - NEFloorKernel |
| - NEWidthConcatenateLayerKernel |
| - NEDepthConcatenateLayerKernel |
| - @ref NENormalizationLayerKernel |
| - @ref NEL2NormalizeLayerKernel |
| - NEFillArrayKernel |
| - NEDepthConvertLayerKernel |
| - @ref NERangeKernel |
| - @ref NEPriorBoxLayer |
| - Removed OpenCL kernels / functions: |
| - CLGEMMLowpQuantizeDownInt32ToUint8Scale |
| - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat |
| - Removed Arm® Neon™ kernels / functions: |
| - NEGEMMLowpQuantizeDownInt32ToUint8Scale |
| - NEGEMMMatrixAccumulateBiasesKernel |
| - Deprecated functions / interfaces: |
| - Non-descriptor based interfaces for NEThreshold, CLThreshold |
| - Non-descriptor based interfaces for @ref NEScale, @ref CLScale and GCScale |
| - In @ref NESoftmaxLayer, @ref NELogSoftmaxLayer, @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer: |
| The default "axis" value for @ref CLSoftmaxLayer, @ref CLLogSoftmaxLayer and GCSoftmaxLayer is changed from 1 to 0. |
| Only axis 0 is supported. |
| The default "axis" value for @ref NESoftmaxLayer, @ref NELogSoftmaxLayer is changed from 1 to 0. |
| Only axis 0 is supported. |
| - The support for quantized data types has been removed from @ref CLLogSoftmaxLayer due to implementation complexity. |
| - Removed padding requirement for the input (e.g. LHS of GEMM) and output in CLGEMMMatrixMultiplyNativeKernel, CLGEMMMatrixMultiplyReshapedKernel, CLGEMMMatrixMultiplyReshapedOnlyRHSKernel and CLIm2ColKernel (NHWC only) |
| - This change allows @ref CLGEMMConvolutionLayer to be used without extra padding for the input and output. |
| - Only the weights/bias of @ref CLGEMMConvolutionLayer may still require padding for the computation. |
| - On Arm® Mali™ Midgard GPUs only, @ref CLGEMMConvolutionLayer may still require padding because it calls CLGEMMMatrixMultiplyKernel, which currently requires padding. |
| - Added support for exporting the OpenCL buffer object to the OpenCL image object in CLGEMMMatrixMultiplyReshapedKernel and CLGEMMMatrixMultiplyReshapedOnlyRHSKernel. |
| - This allows the OpenCL buffer used for the reshaped RHS matrix to be exported to an OpenCL image object. |
| - The padding requirement for the OpenCL image object is taken into account in CLGEMMReshapeRHSMatrixKernel. |
| - The reshaped RHS matrix stores the weights when GEMM is used to accelerate CLGEMMConvolutionLayer. |
| |
| v20.05 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Updated recommended NDK version to r18b. |
| - Updated recommended gcc version to Linaro 6.3.1. |
| - Added Bfloat16 type support |
| - Added Bfloat16 support in: |
| - NEWeightsReshapeKernel |
| - NEConvolutionLayerReshapeWeights |
| - NEIm2ColKernel |
| - NEIm2Col |
| - NEDepthConvertLayerKernel |
| - @ref NEDepthConvertLayer |
| - @ref NEGEMMConvolutionLayer |
| - NEGEMMAssemblyDispatch |
| - Added new data type QASYMM8_SIGNED support for: |
| - @ref CLDirectConvolutionLayer |
| - @ref CLDeconvolutionLayer |
| - @ref CLDirectDeconvolutionLayer |
| - @ref CLGEMMDeconvolutionLayer |
| - CLGEMMLowpMatrixMultiplyReshapedKernel |
| - CLGEMMLowpQuantizeDownInt32ScaleKernel |
| - CLGEMMLowpQuantizeDownInt32ScaleByFloatKernel |
| - @ref CLReductionOperation |
| - @ref CLReduceMean |
| - @ref NEScale |
| - NEScaleKernel |
| - NEUpsampleLayer |
| - @ref NECast |
| - @ref NEReductionOperation |
| - @ref NEReduceMean |
| - @ref NEArgMinMaxLayer |
| - @ref NEDeconvolutionLayer |
| - NEGEMMLowpQuantizeDownInt32ScaleKernel |
| - @ref CPPBoxWithNonMaximaSuppressionLimit |
| - @ref CPPDetectionPostProcessLayer |
| - @ref CPPPermuteKernel |
| - @ref CPPPermute |
| - @ref CPPTopKVKernel |
| - @ref CPPTopKV |
| - @ref CPPUpsample |
| - @ref CPPUpsampleKernel |
| - New OpenCL kernels / functions: |
| - @ref CLQLSTMLayer |
| - @ref CLQLSTMLayerNormalizationKernel |
| - New Arm® Neon™ kernels / functions: |
| - @ref NEQLSTMLayer |
| - @ref NEQLSTMLayerNormalizationKernel |
| - Added HARD_SWISH support in: |
| - CLActivationLayerKernel |
| - NEActivationLayerKernel |
| - Deprecated OpenCL kernels / functions: |
| - CLGEMMLowpQuantizeDownInt32ToUint8Scale |
| - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFloat |
| - Deprecated Arm® Neon™ kernels / functions: |
| - NEGEMMLowpQuantizeDownInt32ToUint8Scale |
| - Removed CPP kernels / functions: |
| - CPPFlipWeightsKernel |
| - Removed PoolingLayerInfo constructors without Data Layout. |
| - Removed CLDepthwiseConvolutionLayer3x3 |
| - Removed NEDepthwiseConvolutionLayerOptimized |
| - Added support for Winograd 3x3 and 4x4 on Arm® Neon™ FP16: |
| - @ref NEWinogradConvolutionLayer |
| - CpuWinogradConv2dTransformInputKernel |
| - CpuWinogradConv2dTransformOutputKernel |
| - CpuWinogradConv2dTransformWeightsKernel |
| - Added CLCompileContext |
| - Added Arm® Neon™ GEMM kernel with 2D window support |
| |
| v20.02.1 Maintenance release |
| - Added Android-NN build script. |
| |
| v20.02 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Added new data type QASYMM8_SIGNED support for: |
| - @ref CLDepthwiseConvolutionLayer |
| - CLDepthwiseConvolutionLayer3x3 |
| - @ref CLGEMMConvolutionLayer |
| - CLGEMMLowpMatrixMultiplyCore |
| - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel |
| - CLGEMMLowpMatrixMultiplyNativeKernel |
| - @ref NEActivationLayer |
| - NEComparisonOperationKernel |
| - @ref NEConvolutionLayer |
| - @ref NEDepthwiseConvolutionLayer |
| - NEDepthwiseConvolutionLayer3x3Kernel |
| - NEDirectConvolutionLayerOutputStageKernel |
| - @ref NEElementwiseComparison |
| - @ref NEElementwiseMax |
| - @ref NEElementwiseMin |
| - @ref NEElementwiseSquaredDiff |
| - @ref NEFullyConnectedLayer |
| - NEGEMMMatrixVectorMultiplyKernel |
| - @ref NEPixelWiseMultiplication |
| - @ref NEPoolingLayer |
| - @ref NEPReluLayer |
| - Added support for QSYMM8_PER_CHANNEL in: |
| - NEDepthwiseConvolutionLayer3x3Kernel |
| - Added support for split sizes in: |
| - @ref CLSplit |
| - @ref NESplit |
| - New OpenCL kernels / functions: |
| - @ref CLFill |
| - CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint |
| - New Arm® Neon™ kernels / functions: |
| - @ref NEFill |
| - NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToInt8ScaleByFixedPoint |
| - Deprecated functions / interfaces: |
| - CLDepthwiseConvolutionLayer3x3 |
| - NEDepthwiseConvolutionLayerOptimized |
| - PoolingLayerInfo constructors without Data Layout. |
| - Added support for quantization with a multiplier greater than 1 on Arm® Neon™ and CL. |
| - Added support for quantized inputs of type QASYMM8_SIGNED and QASYMM8 to @ref CLQuantizationLayer. |
| - Added the ability to build bootcode for bare metal. |
| - Added support for generating synthetic QASYMM8 graphs. |
| - Added support for F16 datatype in VGG16. |
| - Removed pre-built binaries for GLES. |
| |
| v19.11.1 Public maintenance release |
| - Fix offset calculation in NEReductionOperationKernel. |
| - Fix data layout in NEScaleKernel for NHWC. |
| - Retain the data layout from the configuration step to avoid side effects. |
| - Perform sqrt in double domain for L2 pooling. |
| - Fix output shape calculation for Reduce Mean. |
| - Restrict cases where optimized NEPadLayer runs. |
| |
| v19.11 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Updated recommended NDK version to r17c. |
| - Deprecated OpenCL kernels / functions: |
| - CLDepthwiseConvolutionLayerReshapeWeightsGenericKernel |
| - CLDepthwiseIm2ColKernel |
| - CLDepthwiseSeparableConvolutionLayer |
| - CLDepthwiseVectorToTensorKernel |
| - CLDirectConvolutionLayerOutputStageKernel |
| - Deprecated Arm® Neon™ kernels / functions: |
| - NEDepthwiseWeightsReshapeKernel |
| - NEDepthwiseIm2ColKernel |
| - NEDepthwiseSeparableConvolutionLayer |
| - NEDepthwiseVectorToTensorKernel |
| - NEDepthwiseConvolutionLayer3x3 |
| - New OpenCL kernels / functions: |
| - @ref CLInstanceNormalizationLayerKernel / @ref CLInstanceNormalizationLayer |
| - @ref CLDepthwiseConvolutionLayerNativeKernel to replace the old generic depthwise convolution (see Deprecated OpenCL kernels / functions) |
| - @ref CLLogSoftmaxLayer |
| - New Arm® Neon™ kernels / functions: |
| - @ref NEBoundingBoxTransformKernel / @ref NEBoundingBoxTransform |
| - @ref NEComputeAllAnchorsKernel / NEComputeAllAnchors |
| - @ref NEDetectionPostProcessLayer |
| - @ref NEGenerateProposalsLayer |
| - @ref NEInstanceNormalizationLayerKernel / @ref NEInstanceNormalizationLayer |
| - @ref NELogSoftmaxLayer |
| - @ref NEROIAlignLayerKernel / @ref NEROIAlignLayer |
| - Added QASYMM8 support for: |
| - @ref CLGenerateProposalsLayer |
| - @ref CLROIAlignLayer |
| - @ref CPPBoxWithNonMaximaSuppressionLimit |
| - Added QASYMM16 support for: |
| - @ref CLBoundingBoxTransform |
| - Added FP16 support for: |
| - CLGEMMMatrixMultiplyReshapedKernel |
| - Added new data type QASYMM8_PER_CHANNEL support for: |
| - CLDequantizationLayer |
| - @ref NEDequantizationLayer |
| - Added new data type QSYMM8_PER_CHANNEL support for: |
| - @ref CLConvolutionLayer |
| - @ref NEConvolutionLayer |
| - @ref CLDepthwiseConvolutionLayer |
| - @ref NEDepthwiseConvolutionLayer |
| - Added FP16 mixed-precision support for: |
| - CLGEMMMatrixMultiplyReshapedKernel |
| - CLPoolingLayerKernel |
| - Added FP32 and FP16 ELU activation for: |
| - @ref CLActivationLayer |
| - @ref NEActivationLayer |
| - Added asymmetric padding support for: |
| - @ref CLDirectDeconvolutionLayer |
| - @ref CLGEMMDeconvolutionLayer |
| - @ref NEDeconvolutionLayer |
| - Added SYMMETRIC and REFLECT modes for @ref CLPadLayerKernel / @ref CLPadLayer. |
| - Replaced the calls to NECopyKernel and NEMemsetKernel with @ref NEPadLayer in @ref NEGenerateProposalsLayer. |
| - Replaced the calls to CLCopyKernel and CLMemsetKernel with @ref CLPadLayer in @ref CLGenerateProposalsLayer. |
| - Improved performance for CL Inception V3 - FP16. |
| - Improved accuracy for CL Inception V3 - FP16 by enabling FP32 accumulator (mixed-precision). |
| - Improved Arm® Neon™ performance by enabling the fusion of batch normalization with convolution and depthwise convolution layers. |
| - Improved Arm® Neon™ performance for MobileNet-SSD by speeding up the output detection stage. |
| - Optimized @ref CLPadLayer. |
| - Optimized CL generic depthwise convolution layer by introducing @ref CLDepthwiseConvolutionLayerNativeKernel. |
| - Reduced memory consumption by implementing weight sharing. |
| |
| v19.08.1 Public maintenance release |
| - Fix offset calculation in NEReductionOperationKernel. |
| - Fix data layout in NEScaleKernel for NHWC. |
| - Retain the data layout from the configuration step to avoid side effects. |
| - Perform sqrt in double domain for L2 pooling. |
| - Fix output shape calculation for Reduce Mean. |
| - Fix broadcasting in CLPixelWiseMultiplication with 5D tensors. |
| |
| v19.08 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Deprecated Arm® Neon™ functions: |
| - NEDepthConcatenateLayer |
| - NEWidthConcatenateLayer |
| - Deprecated OpenCL kernels / functions: |
| - CLDepthConcatenateLayer |
| - CLGEMMInterleave4x4Kernel / CLGEMMInterleave4x4 |
| - CLGEMMTranspose1xWKernel / CLGEMMTranspose1xW |
| - CLWidthConcatenateLayer |
| - New Arm® Neon™ kernels / functions: |
| - @ref NEAbsLayer |
| - @ref NECast |
| - @ref NEElementwisePower |
| - @ref NELogLayer |
| - @ref NELSTMLayerQuantized |
| - @ref NENegLayer |
| - @ref NEPReluLayer |
| - @ref NESinLayer |
| - NEBatchConcatenateLayerKernel |
| - @ref NEDepthToSpaceLayerKernel / @ref NEDepthToSpaceLayer |
| - NEDepthwiseConvolutionLayerNativeKernel |
| - NEGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel |
| - @ref NEMeanStdDevNormalizationKernel / @ref NEMeanStdDevNormalizationLayer |
| - @ref NESpaceToDepthLayerKernel / @ref NESpaceToDepthLayer |
| - New OpenCL kernels / functions: |
| - @ref CLAbsLayer |
| - @ref CLElementwisePower |
| - @ref CLLogLayer |
| - @ref CLLSTMLayerQuantized |
| - @ref CLNegLayer |
| - @ref CLPReluLayer |
| - @ref CLSinLayer |
| - CLBatchConcatenateLayerKernel |
| - @ref CLDepthToSpaceLayerKernel / @ref CLDepthToSpaceLayer |
| - CLGEMMLowpMatrixMultiplyNativeKernel |
| - CLGEMMLowpQuantizeDownInt32ToInt16ScaleByFixedPointKernel |
| - CLGEMMMatrixMultiplyNativeKernel |
| - CLMeanStdDevNormalizationKernel / CLMeanStdDevNormalizationLayer |
| - @ref CLSpaceToDepthLayerKernel / @ref CLSpaceToDepthLayer |
| - New examples: |
| - neon_opticalflow |
| - cl_cache |
| - neon_permute |
| - Added support for FP16 in @ref NEDeconvolutionLayer |
| - Added support for FP16 in @ref CLDeconvolutionLayer |
| - Added support for REDUCE_MIN and REDUCE_MAX in @ref ReductionOperation |
| - Enabled the fusion of batch normalization with convolution and depthwise convolution layers for FP32 in the graph API (OpenCL only) |
| - Added support for fusing activation function and broadcast addition with the matrix multiplication for FP32 (OpenCL only) |
| - Refactored the depthwise convolution layer kernel on Arm® Neon™ for generic cases |
| - Added an optimized depthwise convolution layer kernel for 5x5 filters (Neon™ only) |
| - Added support for enabling the OpenCL kernel cache. Added an example showing how to load prebuilt OpenCL kernels from a binary cache file |
| - Altered @ref QuantizationInfo interface to support per-channel quantization. |
| - CLDepthwiseConvolutionLayer3x3 will be integrated into @ref CLDepthwiseConvolutionLayer to accommodate future optimizations. |
| - NEDepthwiseConvolutionLayerOptimized will be integrated into @ref NEDepthwiseConvolutionLayer to accommodate future optimizations. |
| - Removed inner_border_right and inner_border_top parameters from @ref CLDeconvolutionLayer interface |
| - Removed inner_border_right and inner_border_top parameters from @ref NEDeconvolutionLayer interface |
| - Optimized the Arm® Neon™ assembly kernel for GEMMLowp. The new implementation fuses the output stage and quantization with the matrix multiplication kernel |
| |
| v19.05 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - New Arm® Neon™ kernels / functions: |
| - @ref NEBatchToSpaceLayerKernel / @ref NEBatchToSpaceLayer |
| - NEComplexPixelWiseMultiplicationKernel / @ref NEComplexPixelWiseMultiplication |
| - @ref NECropKernel / @ref NECropResize |
| - NEDepthwiseConvolutionAssemblyDispatch |
| - @ref NEFFTDigitReverseKernel |
| - @ref NEFFTRadixStageKernel |
| - @ref NEFFTScaleKernel |
| - NEGEMMLowpOffsetContributionOutputStageKernel |
| - NEHeightConcatenateLayerKernel |
| - @ref NESpaceToBatchLayerKernel / @ref NESpaceToBatchLayer |
| - @ref NEFFT1D |
| - @ref NEFFT2D |
| - @ref NEFFTConvolutionLayer |
| - New OpenCL kernels / functions: |
| - CLComplexPixelWiseMultiplicationKernel / @ref CLComplexPixelWiseMultiplication |
| - CLCropKernel / @ref CLCropResize |
| - @ref CLDeconvolutionReshapeOutputKernel |
| - @ref CLFFTDigitReverseKernel |
| - @ref CLFFTRadixStageKernel |
| - @ref CLFFTScaleKernel |
| - CLGEMMLowpMatrixMultiplyReshapedOnlyRHSKernel |
| - CLGEMMMatrixMultiplyReshapedOnlyRHSKernel |
| - CLHeightConcatenateLayerKernel |
| - @ref CLDirectDeconvolutionLayer |
| - @ref CLFFT1D |
| - @ref CLFFT2D |
| - @ref CLFFTConvolutionLayer |
| - @ref CLGEMMDeconvolutionLayer |
| - New OpenGLES kernels / functions: |
| - GCConcatenateLayer |
| - Deprecated functions/interfaces |
| - GCDepthConcatenateLayer |
| - NEWidthConcatenateLayer |
| - NEDepthConcatenateLayer |
| - CLWidthConcatenateLayer |
| - CLDepthConcatenateLayer |
| - CLGEMMInterleave4x4 |
| - CLGEMMTranspose1xW |
| - Support different quantization info in CLConcatLayer. |
| - Add checks for cases where differing input/output quantization info is not supported. |
| - Tensors have different quantization information. |
| - Add FP16 support checks. |
| - Fix output quantization in CLDepthwiseConv3x3 when activation is fused. |
| - New graph examples: |
| - graph_convolution |
| - graph_fully_connected |
| - graph_depthwise_convolution |
| - Deepspeech v0.4.1 |
| - Add support for QASYMM8 in NEArithmeticSubtractionKernel. |
| - Add support for QASYMM8 in NEPixelWiseMultiplicationKernel. |
| - Add support for QASYMM8 in NEDeconvolution. |
| - Add support for DequantizationLayer for Neon/CL. |
| - Add support for dilation in CLDepthwiseConvolution. |
| - Fuse offset contribution with the output stage when using NEGEMMLowpMatrixMultiplyCore. |
| - Optimize CLDeconvolution. |
| - Add StackLayer to the graph API. |
| - Add support for "reflect" padding mode in NEPad. |
| - Add Winograd 7x7 NHWC on OpenCL. |
| - Rework CL ML layers to run exclusively on CL. |
| - Support different quantization info in PoolingLayer. |
| - Implement and test import memory interfaces. |
| - Added new tests and removed old ones. |
| - Various clang-tidy fixes. |
| |
| v19.02 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - New Arm® Neon™ kernels / functions: |
| - @ref NETileKernel / @ref NETile |
| - @ref NEFuseBatchNormalizationKernel / @ref NEFuseBatchNormalization |
| - NEElementwiseOperationKernel |
| - @ref NEElementwiseMax |
| - @ref NEElementwiseMin |
| - @ref NEElementwiseSquaredDiff |
| - @ref NESelectKernel / @ref NESelect |
| - @ref NESplit |
| - @ref NESlice |
| - @ref NEUnstack |
| - @ref NEStridedSliceKernel / @ref NEStridedSlice |
| - NEElementwiseUnaryKernel |
| - @ref NERsqrtLayer |
| - @ref NEExpLayer |
| - @ref NEReverseKernel / @ref NEReverse |
| - @ref NEArgMinMaxLayer |
| - @ref NEStackLayerKernel / @ref NEStackLayer |
| - @ref NERangeKernel / @ref NERange |
| - @ref NEPadLayer |
| - NEMemsetKernel |
| - @ref NEGatherKernel / @ref NEGather |
| - @ref NEElementwiseComparison |
| - @ref NEElementwiseComparisonStatic |
| - NEComparisonOperationKernel |
| - @ref NEElementwiseDivision |
| - New OpenCL kernels / functions: |
| - @ref CLSelectKernel / @ref CLSelect |
| - @ref CLTileKernel / @ref CLTile |
| - @ref CLComparisonKernel / @ref CLComparison |
| - @ref CLArgMinMaxLayer |
| - @ref CLElementwiseMax |
| - @ref CLElementwiseMin |
| - @ref CLElementwiseSquaredDiff |
| - @ref CLStackLayerKernel / @ref CLStackLayer |
| - @ref CLReverse / @ref CLReverseKernel |
| - @ref CLRsqrtLayer |
| - @ref CLExpLayer |
| - CLElementWiseUnaryLayerKernel |
| - CLGEMMReshapeLHSMatrixKernel |
| - CLGEMMReshapeRHSMatrixKernel |
| - CLGEMMMatrixMultiplyReshapedKernel |
| - @ref CLRangeKernel / @ref CLRange |
| - @ref CLUnstack |
| - @ref CLGatherKernel / @ref CLGather |
| - CLGEMMLowpMatrixMultiplyReshapedKernel |
| - New CPP kernels / functions: |
| - @ref CPPDetectionOutputLayer |
| - @ref CPPTopKV / @ref CPPTopKVKernel |
| - Added new examples: |
| - graph_ssd_mobilenet.cpp |
| - graph_mobilenet_v2.cpp |
| - graph_resnet12.cpp |
| - graph_srcnn955.cpp |
| - graph_vgg_vdsr.cpp |
| - graph_inception_resnet_v1.cpp |
| - Added 4D tensor support to: |
| - @ref NESoftmaxLayer |
| - Fused activation in @ref CLWinogradConvolutionLayer |
| - Extended @ref NEPermute to support more cases |
| - Added Neon™/SVE GEMM Hybrid kernels |
| - Added u8 and s8 hybrid assembly kernels |
| - Introduced GEMM strategy name in NEGEMMAssemblyWrapper |
| - Improved @ref CLTuner |
| - Fused the bias addition within @ref CLGEMM |
| - Added support for QASYMM8 LOGISTIC activation in @ref NEActivationLayer |
| - Added NHWC data layout support to: |
| - @ref NEScale for F16 |
| - @ref CLNormalizationLayer IN_MAP_2D for FP32/FP16 |
| - @ref NEL2NormalizeLayer for FP32/FP16 |
| - @ref NENormalizationLayer IN_MAP_2D for FP32/FP16 |
| - @ref CLROIAlignLayer |
| - @ref CLGenerateProposalsLayer |
| - Added QASYMM8 support to the following kernels: |
| - NEArithmeticAdditionKernel |
| - @ref NEScale |
| - Added new tests and improved validation and benchmarking suites. |
| - Deprecated functions/interfaces |
| - Usage of inner_border_right and inner_border_top has been deprecated in @ref CLDeconvolutionLayer and @ref NEDeconvolutionLayer |
| |
| v18.11 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - New Arm® Neon™ kernels / functions: |
| - @ref NEChannelShuffleLayer / @ref NEChannelShuffleLayerKernel |
| - @ref NEReduceMean |
| - @ref NEReorgLayer / @ref NEReorgLayerKernel |
| - @ref NEPriorBoxLayer / @ref NEPriorBoxLayerKernel |
| - NEUpsampleLayer / NEUpsampleLayerKernel |
| - NEYOLOLayer / NEYOLOLayerKernel |
| - New OpenCL kernels / functions: |
| - @ref CLBatchToSpaceLayer / @ref CLBatchToSpaceLayerKernel |
| - @ref CLBoundingBoxTransform / @ref CLBoundingBoxTransformKernel |
| - @ref CLComputeAllAnchorsKernel |
| - @ref CLGenerateProposalsLayer |
| - @ref CLNormalizePlanarYUVLayer / @ref CLNormalizePlanarYUVLayerKernel |
| - @ref CLReorgLayer / @ref CLReorgLayerKernel |
| - @ref CLSpaceToBatchLayer / @ref CLSpaceToBatchLayerKernel |
| - @ref CLPadLayer |
| - @ref CLReduceMean |
| - @ref CLPriorBoxLayer / @ref CLPriorBoxLayerKernel |
| - @ref CLROIAlignLayer / @ref CLROIAlignLayerKernel |
| - @ref CLSlice |
| - @ref CLSplit |
| - @ref CLStridedSlice / @ref CLStridedSliceKernel |
| - CLUpsampleLayer / CLUpsampleLayerKernel |
| - CLYOLOLayer / CLYOLOLayerKernel |
| - New CPP kernels / functions: |
| - @ref CPPBoxWithNonMaximaSuppressionLimit / @ref CPPBoxWithNonMaximaSuppressionLimitKernel |
| - Added the validate method in: |
| - @ref NEDepthConvertLayer |
| - @ref NEFloor / @ref CLFloor |
| - NEGEMMMatrixAdditionKernel |
| - @ref NEReshapeLayer / @ref CLReshapeLayer |
| - @ref CLScale |
| - Added new examples: |
| - graph_shufflenet.cpp |
| - graph_yolov3.cpp |
| - Added documentation on how to add a new function or kernel. |
| - Improved Doxygen documentation by adding a list of the existing functions. |
| - Added 4D tensor support to: |
| - CLWidthConcatenateLayer |
| - CLFlattenLayer |
| - @ref CLSoftmaxLayer |
| - Added dot product support for non-unit stride in CLDepthwiseConvolutionLayer3x3NHWCKernel |
| - Added SVE support |
| - Fused batch normalization into convolution layer weights in @ref CLFuseBatchNormalization |
| - Fused activation in CLDepthwiseConvolutionLayer3x3NCHWKernel, CLDepthwiseConvolutionLayer3x3NHWCKernel and @ref NEGEMMConvolutionLayer |
| - Added NHWC data layout support to: |
| - @ref CLChannelShuffleLayer |
| - @ref CLDeconvolutionLayer |
| - @ref CLL2NormalizeLayer |
| - Added QASYMM8 support to the following kernels: |
| - CLScaleKernel |
| - NEDepthwiseConvolutionLayer3x3Kernel |
| - CLPixelWiseMultiplicationKernel |
| - Added FP16 support to the following kernels: |
| - CLDepthwiseConvolutionLayer3x3NHWCKernel |
| - NEDepthwiseConvolutionLayer3x3Kernel |
| - @ref CLNormalizePlanarYUVLayerKernel |
| - @ref CLWinogradConvolutionLayer (5x5 kernel) |
| - More tests added to both validation and benchmarking suites. |
| |
| v18.08 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Updated recommended NDK version to r17b. |
| - Removed support for QS8/QS16 data types. |
| - Added support for grouped convolution in @ref CLConvolutionLayer. |
| - Added NHWC data layout support to: |
| - NEDepthConcatenateLayer / CLDepthConcatenateLayer |
| - @ref NEWinogradConvolutionLayer / @ref CLWinogradConvolutionLayer |
| - @ref CLDepthwiseConvolutionLayer |
| - @ref CLDirectConvolutionLayer |
| - @ref CLConvolutionLayer |
| - @ref CLScale |
| - CLIm2ColKernel |
| - New Arm® Neon™ kernels / functions: |
| - @ref NERNNLayer |
| - New OpenCL kernels / functions: |
| - @ref CLArithmeticDivision |
| - Introduced prepare() stage support in the graph API for GLES. |
| - Added support for memory reuse when trying to allocate smaller CLTensors. |
| - Enabled NHWC execution on graph examples. |
| - Added JPEG accessor for validation purposes. |
| - Added validate methods to some kernels / functions. |
| |
| v18.05 Public major release |
| - Various bug fixes. |
| - Various optimisations. |
| - Major redesign in the interface for the Neon™ kernels implemented in assembly. |
| - Removed arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore / arm_compute::NEHGEMMAArch64FP16Kernel |
| - Added NEGEMMAssemblyWrapper and AssemblyKernelGlue which are used to execute assembly kernels in Neon™ functions. |
| - Minor changes to the CPUInfo type to make it compatible with the new assembly gemm interface. |
| - Moved Neon™ assembly kernels to the folder src/core/Neon/kernels/arm_gemm. |
| - Improved doxygen documentation. |
| - Improved memory management for layer transitions. |
| - Added support for NHWC data layout in tensors. |
| - Added NHWC data layout support to: |
| - @ref NEGEMMConvolutionLayer |
| - @ref NEDirectConvolutionLayer |
| - @ref NEPoolingLayer / @ref CLPoolingLayer |
| - @ref NEBatchNormalizationLayer / @ref CLBatchNormalizationLayer |
| - @ref NEDepthwiseConvolutionLayer |
| - @ref NEScale |
| - NEIm2Col |
| - Added support for dilated convolutions in @ref NEConvolutionLayer and @ref CLConvolutionLayer. |
| - New OpenCL kernels / functions: |
| - @ref CLChannelShuffleLayer / @ref CLChannelShuffleLayerKernel |
| - CLConvertFullyConnectedWeightsKernel / @ref CLConvertFullyConnectedWeights |
| - @ref CLCopy / CLCopyKernel |
| - @ref CLLSTMLayer |
| - @ref CLRNNLayer |
| - CLWidthConcatenateLayer / CLWidthConcatenateLayerKernel |
| - CLWinogradFilterTransformKernel / @ref CLWinogradConvolutionLayer |
| - CLWinogradInputTransformKernel / CLWinogradInputTransform |
| - New Arm® Neon™ kernels / functions: |
| - NEConvertFullyConnectedWeightsKernel / @ref NEConvertFullyConnectedWeights. |
| - Created the validate method in @ref CLDepthwiseConvolutionLayer. |
| - Beta and gamma are no longer mandatory arguments in @ref NEBatchNormalizationLayer and @ref CLBatchNormalizationLayer. |
| - Added depth multiplier support in @ref NEDepthwiseConvolutionLayer and @ref CLDepthwiseConvolutionLayer. |
| - Added broadcast multiply support in @ref NEPixelWiseMultiplication / NEPixelWiseMultiplicationKernel. |
| - Ported the MobileNet example to the NHWC data layout. |
| - Enabled Winograd method in @ref CLConvolutionLayer. |
| - Renamed NEWinogradLayer to @ref NEWinogradConvolutionLayer. |
| - Updated @ref NEWinogradConvolutionLayer to use highly optimised assembly kernels in src/core/Neon/kernels/arm_gemm. |
| - Added memory manager support in GLES functions. |
| - Major refactoring of the graph API. |
| - Added GLES backend in the graph API. |
| - Added support for the memory manager in the graph API. |
| - Enabled Winograd Convolution method in the graph API. |
| - Added support for grouped convolutions in the graph API. |
| - Replaced NEDeconvolutionLayerUpsampleKernel with NEScaleKernel in @ref NEDeconvolutionLayer. |
| - Added fast maths flag in @ref CLConvolutionLayer. |
| - Added new tests and benchmarks in validation and benchmark frameworks |
| - Merged the activation layer with the convolution layer (Neon™, CL, GLES) |
| - Added support for OpenCL 2.0 SVM |
| - Added support for importing memory into OpenCL tensors. |
| - Added the prepare() method to perform any one-off pre-processing before running the function. |
| - Added new examples: |
| - graph_inception_v4.cpp |
| - graph_resnext50.cpp |
| - Added memory measurement instrument for CL. |
| |
| v18.03 Public maintenance release |
| - Various bug fixes. |
| - Fixed a bug in @ref NEActivationLayer |
| - Fixed a bug in @ref CLTuner when using batches. |
| - Updated recommended NDK version to r16b (and fixed warnings). |
| - Fixed a bug in the validation code. |
| - Added Inception v4 graph example. |
| - Renamed NEWinogradLayer.cpp to @ref NEWinogradConvolutionLayer |
| |
| v18.02 Public major release |
| - Various Arm® Neon™ / OpenCL / GLES optimisations. |
| - Various bug fixes. |
| - Changed the default number of threads on big.LITTLE systems. |
| - Refactored examples and added: |
| - graph_mobilenet_qassym8 |
| - graph_resnet |
| - graph_squeezenet_v1_1 |
| - Renamed @ref CLConvolutionLayer to @ref CLGEMMConvolutionLayer and created a new @ref CLConvolutionLayer to select the fastest convolution method. |
| - Renamed @ref NEConvolutionLayer to @ref NEGEMMConvolutionLayer and created a new @ref NEConvolutionLayer to select the fastest convolution method. |
| - Added in-place support to: |
| - @ref CLActivationLayer |
| - @ref CLBatchNormalizationLayer |
| - Added QASYMM8 support to: |
| - @ref CLActivationLayer |
| - @ref CLDepthwiseConvolutionLayer |
| - @ref NEDepthwiseConvolutionLayer |
| - @ref NESoftmaxLayer |
| - Added FP16 support to: |
| - CLDepthwiseConvolutionLayer3x3 |
| - @ref CLDepthwiseConvolutionLayer |
| - Added broadcasting support to NEArithmeticAddition / @ref CLArithmeticAddition / @ref CLPixelWiseMultiplication |
| - Added fused batch normalization and activation to @ref CLBatchNormalizationLayer and @ref NEBatchNormalizationLayer |
| - Added support for non-square pooling to @ref NEPoolingLayer and @ref CLPoolingLayer |
| - New OpenCL kernels / functions: |
| - CLDirectConvolutionLayerOutputStageKernel |
| - New Arm® Neon™ kernels / functions |
| - Added name() method to all kernels. |
| - Added support for Winograd 5x5. |
| - NEPermuteKernel / @ref NEPermute |
| - CpuWinogradConv2dTransformInputKernel / NEWinogradLayer |
| - CpuWinogradConv2dTransformOutputKernel / NEWinogradLayer |
| - CpuWinogradConv2dTransformWeightsKernel / NEWinogradLayer |
| - Renamed NEWinogradLayerKernel to NEWinogradLayerBatchedGEMMKernel |
| - New GLES kernels / functions: |
| - GCTensorShiftKernel / GCTensorShift |
| |
| v18.01 Public maintenance release |
| - Various bug fixes |
| - Added some of the missing validate() methods |
| - Added @ref CLDeconvolutionLayerUpsampleKernel / @ref CLDeconvolutionLayer / @ref CLDeconvolutionLayerUpsample |
| - Added CLPermuteKernel / @ref CLPermute |
| - Added a method to clear the program cache in the CL kernel library. |
| - Added GCArithmeticAdditionKernel / GCArithmeticAddition |
| - Added GCDepthwiseConvolutionLayer3x3Kernel / GCDepthwiseConvolutionLayer3x3 |
| - Added GCNormalizePlanarYUVLayerKernel / GCNormalizePlanarYUVLayer |
| - Added GCScaleKernel / GCScale |
| - Added GCWeightsReshapeKernel / GCConvolutionLayer |
| - Added FP16 support to the following GLES compute kernels: |
| - GCCol2ImKernel |
| - GCGEMMInterleave4x4Kernel |
| - GCGEMMTranspose1xWKernel |
| - GCIm2ColKernel |
| - Refactored Arm® Neon™ Winograd (NEWinogradLayerKernel) |
| - Added NEDirectConvolutionLayerOutputStageKernel |
| - Added QASYMM8 support to the following Arm® Neon™ kernels: |
| - NEDepthwiseConvolutionLayer3x3Kernel |
| - @ref NEFillBorderKernel |
| - NEPoolingLayerKernel |
| - Added new examples: |
| - graph_cl_mobilenet_qasymm8.cpp |
| - graph_inception_v3.cpp |
| - gc_dc.cpp |
| - More tests added to both validation and benchmarking suites. |
| |
| v17.12 Public major release |
| - Most machine learning functions on OpenCL support the new data type QASYMM8 |
| - Introduced logging interface |
| - Introduced OpenCL timer |
| - Reworked GEMMLowp interface |
| - Added new Arm® Neon™ assembly kernels for GEMMLowp, SGEMM and HGEMM |
| - Added validation method for most Machine Learning kernels / functions |
| - Added new graph examples such as googlenet, mobilenet, squeezenet, vgg16 and vgg19 |
| - Added sgemm example for OpenCL |
| - Added absolute difference example for GLES compute |
| - Added new tests and benchmarks in validation and benchmark frameworks |
| - Added new kernels / functions for GLES compute |
| |
| - New OpenGL ES kernels / functions |
| - GCAbsoluteDifferenceKernel / GCAbsoluteDifference |
| - GCActivationLayerKernel / GCActivationLayer |
| - GCBatchNormalizationLayerKernel / GCBatchNormalizationLayer |
| - GCCol2ImKernel |
| - GCDepthConcatenateLayerKernel / GCDepthConcatenateLayer |
| - GCDirectConvolutionLayerKernel / GCDirectConvolutionLayer |
| - GCDropoutLayerKernel / GCDropoutLayer |
| - GCFillBorderKernel / GCFillBorder |
| - GCGEMMInterleave4x4Kernel / GCGEMMInterleave4x4 |
| - GCGEMMMatrixAccumulateBiasesKernel / GCGEMMMatrixAdditionKernel / GCGEMMMatrixMultiplyKernel / GCGEMM |
| - GCGEMMTranspose1xWKernel / GCGEMMTranspose1xW |
| - GCIm2ColKernel |
| - GCNormalizationLayerKernel / GCNormalizationLayer |
| - GCPixelWiseMultiplicationKernel / GCPixelWiseMultiplication |
| - GCPoolingLayerKernel / GCPoolingLayer |
| - GCLogits1DMaxKernel / GCLogits1DShiftExpSumKernel / GCLogits1DNormKernel / GCSoftmaxLayer |
| - GCTransposeKernel / GCTranspose |
| |
| - New Arm® Neon™ kernels / functions |
| - arm_compute::NEGEMMLowpAArch64A53Kernel / arm_compute::NEGEMMLowpAArch64Kernel / arm_compute::NEGEMMLowpAArch64V8P4Kernel / arm_compute::NEGEMMInterleavedBlockedKernel / arm_compute::NEGEMMLowpAssemblyMatrixMultiplyCore |
| - arm_compute::NEHGEMMAArch64FP16Kernel |
| - NEDepthwiseConvolutionLayer3x3Kernel / NEDepthwiseIm2ColKernel / NEGEMMMatrixVectorMultiplyKernel / NEDepthwiseVectorToTensorKernel / @ref NEDepthwiseConvolutionLayer |
| - NEGEMMLowpOffsetContributionKernel / NEGEMMLowpMatrixAReductionKernel / NEGEMMLowpMatrixBReductionKernel / NEGEMMLowpMatrixMultiplyCore |
| - NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint |
| - NEWinogradLayer / NEWinogradLayerKernel |
| |
| - New OpenCL kernels / functions |
| - CLGEMMLowpOffsetContributionKernel / CLGEMMLowpMatrixAReductionKernel / CLGEMMLowpMatrixBReductionKernel / CLGEMMLowpMatrixMultiplyCore |
| - CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel / CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint |
| |
| - New graph nodes for Arm® Neon™ and OpenCL |
| - graph::BranchLayer |
| - graph::DepthConvertLayer |
| - graph::DepthwiseConvolutionLayer |
| - graph::DequantizationLayer |
| - graph::FlattenLayer |
| - graph::QuantizationLayer |
| - graph::ReshapeLayer |
| |
| v17.10 Public maintenance release |
| - Bug fixes: |
| - Check the maximum local workgroup size supported by OpenCL devices |
| - Minor documentation updates (fixed instructions to build the examples) |
| - Introduced a graph::GraphContext |
| - Added a few new Graph nodes, support for branches and grouping. |
| - Automatically enable cl_printf in debug builds |
| - Fixed bare metal builds for armv7a |
| - Added AlexNet and cartoon effect examples |
| - Fixed library builds: libraries are no longer built as supersets of each other. (This means applications using the runtime part of the library now need to link against both libarm_compute_core and libarm_compute.) |
| |
| v17.09 Public major release |
| - Experimental Graph support: initial implementation of a simple stream API to easily chain machine learning layers. |
| - Memory Manager (@ref BlobLifetimeManager, @ref BlobMemoryPool, @ref ILifetimeManager, @ref IMemoryGroup, @ref IMemoryManager, @ref IMemoryPool, @ref IPoolManager, @ref MemoryManagerOnDemand, @ref PoolManager) |
| - New validation and benchmark frameworks (the Boost and Google frameworks were replaced by an in-house framework). |
| - Most machine learning functions support both 8-bit and 16-bit fixed point (QS8, QS16) for both Arm® Neon™ and OpenCL. |
| - New Arm® Neon™ kernels / functions: |
| - arm_compute::NEGEMMAssemblyBaseKernel arm_compute::NEGEMMAArch64Kernel |
| - NEDequantizationLayerKernel / @ref NEDequantizationLayer |
| - NEFloorKernel / @ref NEFloor |
| - @ref NEL2NormalizeLayerKernel / @ref NEL2NormalizeLayer |
| - NEQuantizationLayerKernel NEMinMaxLayerKernel / @ref NEQuantizationLayer |
| - @ref NEROIPoolingLayerKernel / @ref NEROIPoolingLayer |
| - @ref NEReductionOperationKernel / @ref NEReductionOperation |
| - NEReshapeLayerKernel / @ref NEReshapeLayer |
| |
| - New OpenCL kernels / functions: |
| - CLDepthwiseConvolutionLayer3x3NCHWKernel CLDepthwiseConvolutionLayer3x3NHWCKernel CLDepthwiseIm2ColKernel CLDepthwiseVectorToTensorKernel CLDepthwiseWeightsReshapeKernel / CLDepthwiseConvolutionLayer3x3 @ref CLDepthwiseConvolutionLayer CLDepthwiseSeparableConvolutionLayer |
| - CLDequantizationLayerKernel / CLDequantizationLayer |
| - CLDirectConvolutionLayerKernel / @ref CLDirectConvolutionLayer |
| - CLFlattenLayer |
| - CLFloorKernel / @ref CLFloor |
| - CLGEMMTranspose1xW |
| - CLGEMMMatrixVectorMultiplyKernel |
| - @ref CLL2NormalizeLayerKernel / @ref CLL2NormalizeLayer |
| - CLQuantizationLayerKernel CLMinMaxLayerKernel / @ref CLQuantizationLayer |
| - @ref CLROIPoolingLayerKernel / @ref CLROIPoolingLayer |
| - @ref CLReductionOperationKernel / @ref CLReductionOperation |
| - CLReshapeLayerKernel / @ref CLReshapeLayer |
| |
| v17.06 Public major release |
| - Various bug fixes |
| - Added 8-bit fixed point (QS8) support to various Arm® Neon™ machine learning kernels. |
| - Added unit tests and benchmarks (AlexNet, LeNet) |
| - Added support for sub tensors. |
| - Added infrastructure to provide GPU-specific optimisations for some OpenCL kernels. |
| - Added @ref OMPScheduler (OpenMP) scheduler for Arm® Neon™ |
| - Added @ref SingleThreadScheduler scheduler for Arm® Neon™ (for bare metal) |
| - Users can specify their own scheduler by implementing the @ref IScheduler interface. |
| - New OpenCL kernels / functions: |
| - @ref CLBatchNormalizationLayerKernel / @ref CLBatchNormalizationLayer |
| - CLDepthConcatenateLayerKernel / CLDepthConcatenateLayer |
| - CLHOGOrientationBinningKernel CLHOGBlockNormalizationKernel, CLHOGDetectorKernel / CLHOGDescriptor CLHOGDetector CLHOGGradient CLHOGMultiDetection |
| - CLLocallyConnectedMatrixMultiplyKernel / CLLocallyConnectedLayer |
| - CLWeightsReshapeKernel / CLConvolutionLayerReshapeWeights |
| - New C++ kernels: |
| - CPPDetectionWindowNonMaximaSuppressionKernel |
| - New Arm® Neon™ kernels / functions: |
| - @ref NEBatchNormalizationLayerKernel / @ref NEBatchNormalizationLayer |
| - NEDepthConcatenateLayerKernel / NEDepthConcatenateLayer |
| - NEDirectConvolutionLayerKernel / @ref NEDirectConvolutionLayer |
| - NELocallyConnectedMatrixMultiplyKernel / NELocallyConnectedLayer |
| - NEWeightsReshapeKernel / NEConvolutionLayerReshapeWeights |
| |
| v17.05 Public bug fixes release |
| - Various bug fixes |
| - The remaining functions have been ported to use accurate padding. |
| - The library no longer links against OpenCL (it uses dlopen / dlsym at runtime instead to determine whether OpenCL is available). |
| - Added a "free" method to the allocator. |
| - The minimum g++ version required for armv7 Linux changed from 4.8 to 4.9 |
| |
| v17.04 Public bug fixes release |
| |
| The following functions have been ported to use the new accurate padding: |
| - CLColorConvertKernel |
| - CLEdgeNonMaxSuppressionKernel |
| - CLEdgeTraceKernel |
| - CLGaussianPyramidHorKernel |
| - CLGaussianPyramidVertKernel |
| - CLGradientKernel |
| - NEChannelCombineKernel |
| - NEFillArrayKernel |
| - NEGaussianPyramidHorKernel |
| - NEGaussianPyramidVertKernel |
| - NEHarrisScoreFP16Kernel |
| - NEHarrisScoreKernel |
| - NEHOGDetectorKernel |
| - NELogits1DMaxKernel |
| - NELogits1DShiftExpSumKernel |
| - NELogits1DNormKernel |
| - NENonMaximaSuppression3x3FP16Kernel |
| - NENonMaximaSuppression3x3Kernel |
| |
| v17.03.1 First major public release of the sources |
| - Renamed the library to arm_compute |
| - New CPP target introduced for C++ kernels shared between Arm® Neon™ and CL functions. |
| - Introduced a new padding calculation interface and ported most kernels / functions to use it. |
| - New OpenCL kernels / functions: |
| - CLGEMMLowpMatrixMultiplyKernel / CLGEMMLowp |
| - New Arm® Neon™ kernels / functions: |
| - @ref NENormalizationLayerKernel / @ref NENormalizationLayer |
| - NETransposeKernel / @ref NETranspose |
| - NELogits1DMaxKernel, NELogits1DShiftExpSumKernel, NELogits1DNormKernel / @ref NESoftmaxLayer |
| - NEIm2ColKernel, NECol2ImKernel, NEConvolutionLayerWeightsReshapeKernel / @ref NEConvolutionLayer |
| - NEGEMMMatrixAccumulateBiasesKernel / @ref NEFullyConnectedLayer |
| - NEGEMMLowpMatrixMultiplyKernel / NEGEMMLowp |
| |
| v17.03 Sources preview |
| - New OpenCL kernels / functions: |
| - CLGradientKernel, CLEdgeNonMaxSuppressionKernel, CLEdgeTraceKernel / CLCannyEdge |
| - GEMM refactoring + FP16 support: CLGEMMInterleave4x4Kernel, CLGEMMTranspose1xWKernel, CLGEMMMatrixMultiplyKernel, CLGEMMMatrixAdditionKernel / @ref CLGEMM |
| - CLGEMMMatrixAccumulateBiasesKernel / @ref CLFullyConnectedLayer |
| - CLTransposeKernel / @ref CLTranspose |
| - CLLKTrackerInitKernel, CLLKTrackerStage0Kernel, CLLKTrackerStage1Kernel, CLLKTrackerFinalizeKernel / CLOpticalFlow |
| - @ref CLNormalizationLayerKernel / @ref CLNormalizationLayer |
| - CLLaplacianPyramid, CLLaplacianReconstruct |
| - New Arm® Neon™ kernels / functions: |
| - NEActivationLayerKernel / @ref NEActivationLayer |
| - GEMM refactoring + FP16 support (Requires armv8.2 CPU): NEGEMMInterleave4x4Kernel, NEGEMMTranspose1xWKernel, NEGEMMMatrixMultiplyKernel, NEGEMMMatrixAdditionKernel / @ref NEGEMM |
| - NEPoolingLayerKernel / @ref NEPoolingLayer |
| |
| v17.02.1 Sources preview |
| - New OpenCL kernels / functions: |
| - CLLogits1DMaxKernel, CLLogits1DShiftExpSumKernel, CLLogits1DNormKernel / @ref CLSoftmaxLayer |
| - CLPoolingLayerKernel / @ref CLPoolingLayer |
| - CLIm2ColKernel, CLCol2ImKernel, CLConvolutionLayerWeightsReshapeKernel / CLConvolutionLayer |
| - CLRemapKernel / CLRemap |
| - CLGaussianPyramidHorKernel, CLGaussianPyramidVertKernel / CLGaussianPyramid, CLGaussianPyramidHalf, CLGaussianPyramidOrb |
| - CLMinMaxKernel, CLMinMaxLocationKernel / CLMinMaxLocation |
| - CLNonLinearFilterKernel / CLNonLinearFilter |
| - New Arm® Neon™ FP16 kernels (Requires armv8.2 CPU) |
| - NEAccumulateWeightedFP16Kernel |
| - NEBox3x3FP16Kernel |
| - NENonMaximaSuppression3x3FP16Kernel |
| |
| v17.02 Sources preview |
| - New OpenCL kernels / functions: |
| - CLActivationLayerKernel / @ref CLActivationLayer |
| - CLChannelCombineKernel / CLChannelCombine |
| - CLDerivativeKernel / CLChannelExtract |
| - CLFastCornersKernel / CLFastCorners |
| - CLMeanStdDevKernel / CLMeanStdDev |
| - New Arm® Neon™ kernels / functions: |
| - HOG / SVM: NEHOGOrientationBinningKernel, NEHOGBlockNormalizationKernel, NEHOGDetectorKernel, NEHOGNonMaximaSuppressionKernel / NEHOGDescriptor, NEHOGDetector, NEHOGGradient, NEHOGMultiDetection |
| - NENonLinearFilterKernel / NENonLinearFilter |
| - Introduced a CLScheduler to manage the default context and command queue used by the runtime library and create synchronisation events. |
| - Switched all the kernels / functions to use tensors instead of images. |
| - Updated documentation to include instructions to build the library from sources. |
| |
| v16.12 Binary preview release |
| - Original release |
| |
| */ |
| } // namespace arm_compute |