The ML Inference Advisor (MLIA) helps AI developers design and optimize neural network models for efficient inference on Arm® targets (see supported targets). MLIA provides insights into how an ML model will perform on Arm hardware early in the model development cycle. By passing a model file and specifying an Arm hardware target, users get an overview of possible areas of improvement and actionable advice. The advice can cover operator compatibility, performance analysis and model optimization (e.g. pruning and clustering). With the ML Inference Advisor, we aim to make the Arm ML IP accessible to developers at all levels of abstraction, with differing knowledge of hardware optimization and machine learning.
This product conforms to Arm's inclusive language policy and, to the best of our knowledge, does not contain any non-inclusive language.
If you find something that concerns you, email terms@arm.com.
Release notes can be found in MLIA releases.
If you need support, want to report an issue, give us feedback or simply ask a question about MLIA, send an email to mlia@arm.com.
Alternatively, use the AI and ML forum to get support by marking your post with the MLIA tag.
Information on reporting security issues can be found in Reporting vulnerabilities.
ML Inference Advisor is licensed under Apache License 2.0.
It is recommended to use a virtual environment for MLIA installation, and a typical setup requires:
MLIA can be installed with pip using the following command:
pip install mlia
It is highly recommended to create a new virtual environment for the installation.
After the installation, you can check that MLIA is installed correctly by opening your terminal, activating the virtual environment and typing the following command that should print the help text:
mlia --help
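The same check can be scripted, for example as part of an environment setup test. The following is a minimal sketch and not part of MLIA: `mlia_help_text` is a hypothetical helper that looks for the `mlia` executable on the `PATH` of the active environment and, if found, runs `mlia --help`.

```python
import shutil
import subprocess
from typing import Optional


def mlia_help_text(executable: str = "mlia") -> Optional[str]:
    """Return the output of `<executable> --help`, or None if it is not on PATH.

    Hypothetical helper for illustration; it only relies on the documented
    `mlia --help` invocation.
    """
    path = shutil.which(executable)
    if path is None:
        # The executable is not visible; the virtual environment is
        # probably not activated or the installation failed.
        return None
    result = subprocess.run(
        [path, "--help"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    text = mlia_help_text()
    print("mlia not found on PATH" if text is None else text)
```

A return value of `None` usually means the virtual environment is not active in the current shell.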
The ML Inference Advisor works with sub-commands, i.e. in general a command would look like this:
mlia [sub-command] [arguments]
The following sub-commands are available: check and optimize.
Detailed help about the different sub-commands can be shown like this:
mlia [sub-command] --help
The following sections go into further detail regarding the usage of MLIA.
This section gives an overview of the available sub-commands for MLIA.
Lists the model's operators with information about their compatibility with the specified target.
Examples:
# List operator compatibility with Ethos-U55 with 256 MAC
mlia check ~/models/mobilenet_v1_1.0_224_quant.tflite --target-profile ethos-u55-256

# List operator compatibility with Cortex-A
mlia check ~/models/mobilenet_v1_1.0_224_quant.tflite --target-profile cortex-a

# Get help and further information
mlia check --help
Estimates the model's performance on the specified target and prints out statistics.
Examples:
# Use default parameters
mlia check ~/models/mobilenet_v1_1.0_224_quant.tflite \
    --target-profile ethos-u55-256 \
    --performance

# Explicitly specify the target profile and backend(s) to use
# with --backend option
mlia check ~/models/ds_cnn_large_fully_quantized_int8.tflite \
    --target-profile ethos-u65-512 \
    --performance \
    --backend "vela" \
    --backend "corstone-300"

# Get help and further information
mlia check --help
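When the same model has to be checked against several target profiles, the command line can be assembled programmatically. The following is a minimal sketch, not part of MLIA: `build_check_command` is a hypothetical helper that uses only the flags shown in the examples above (`--target-profile`, `--performance`, `--backend`).

```python
from typing import List, Sequence


def build_check_command(
    model: str,
    target_profile: str,
    performance: bool = False,
    backends: Sequence[str] = (),
) -> List[str]:
    """Build the argument list for an `mlia check` invocation.

    Hypothetical helper: it only arranges the documented flags and does
    not validate them against MLIA itself.
    """
    cmd = ["mlia", "check", model, "--target-profile", target_profile]
    if performance:
        cmd.append("--performance")
    for backend in backends:
        # --backend may be given several times, once per backend
        cmd += ["--backend", backend]
    return cmd


if __name__ == "__main__":
    for profile in ("ethos-u55-256", "ethos-u65-512"):
        print(" ".join(build_check_command("model.tflite", profile, performance=True)))
```

The resulting list can be passed to `subprocess.run`. Note that a leading `~` in the model path is not expanded in that case, so expand it first, for example with `os.path.expanduser`.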
This sub-command applies optimizations to a Keras model (.h5 or SavedModel) or a TensorFlow Lite model and shows the performance improvements compared to the original unoptimized model.
There are currently three optimization techniques available to apply: pruning, clustering and rewrite.
More information about these techniques can be found online in the TensorFlow documentation, e.g. in the TensorFlow model optimization guides.
Note: A Keras model (.h5 or SavedModel) is required as input to perform pruning and clustering. A TensorFlow Lite model is required as input to perform a rewrite.
Examples:
# Custom optimization parameters: pruning=0.6, clustering=16
mlia optimize ~/models/ds_cnn_l.h5 \
    --target-profile ethos-u55-256 \
    --pruning \
    --pruning-target 0.6 \
    --clustering \
    --clustering-target 16

# Get help and further information
mlia optimize --help

# An example for using rewrite
mlia optimize ~/models/ds_cnn_large_fp32.tflite \
    --target-profile ethos-u55-256 \
    --rewrite \
    --dataset input.tfrec \
    --rewrite-target fully-connected \
    --rewrite-start MobileNet/avg_pool/AvgPool \
    --rewrite-end MobileNet/fc1/BiasAdd
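The input-format note above (Keras for pruning and clustering, TensorFlow Lite for rewrite) can be turned into a small pre-flight check before calling `mlia optimize`. This is only a sketch under stated assumptions: `supported_optimizations` is a hypothetical helper, it infers the model format from the file extension alone, and it treats a path without an extension as a SavedModel directory.

```python
from pathlib import Path
from typing import Set


def supported_optimizations(model_path: str) -> Set[str]:
    """Return the optimization techniques applicable to the given model file.

    Assumption: the format is inferred from the extension only.
    A path without an extension is assumed to be a SavedModel directory.
    """
    suffix = Path(model_path).suffix.lower()
    if suffix == ".tflite":
        # A TensorFlow Lite model is required for a rewrite
        return {"rewrite"}
    if suffix in (".h5", ""):
        # A Keras model (.h5 or SavedModel) is required for pruning and clustering
        return {"pruning", "clustering"}
    raise ValueError(f"Unrecognized model format: {model_path}")


if __name__ == "__main__":
    print(supported_optimizations("ds_cnn_l.h5"))
    print(supported_optimizations("ds_cnn_large_fp32.tflite"))
```

A stricter check would inspect the file contents; the extension test is just the simplest way to illustrate the rule.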
Training parameters for rewrites can be specified.
There are a number of predefined profiles:
| Name | Batch Size | LR | Show Progress | Steps | LR Schedule | Num Procs | Num Threads | Checkpoints | Augmentations |
| :----------: | :--------: | :--: | :-----------: | :---: | :---------: | :-------: | :---------: | :---------: | :-------------: |
| optimization | 32 | 1e-3 | True | 48000 | "cosine" | 1 | 0 | None | "gaussian" |

| Name | Batch Size | LR | Show Progress | Steps | LR Schedule | Num Procs | Num Threads | Checkpoints | Augmentations - gaussian_strength | Augmentations - mixup_strength |
| :------------------------------: | :--------: | :--: | :-----------: | :---: | :---------: | :-------: | :---------: | :---------: | :-------------------------------: | :----------------------------: |
| optimization_custom_augmentation | 32 | 1e-3 | True | 48000 | "cosine" | 1 | 0 | None | 0.1 | 0.1 |
The augmentations consist of 2 parameters: mixup strength and gaussian strength.
Augmentations can be selected from a number of pre-defined profiles (see the table below), or each parameter can be set individually (see optimization_custom_augmentation above for an example):
Name | MixUp Strength | Gaussian Strength |
---|---|---|
"none" | None | None |
"gaussian" | None | 1.0 |
"mixup" | 1.0 | None |
"mixout" | 1.6 | None |
"mix_gaussian_large" | 2.0 | 1.0 |
"mix_gaussian_small" | 1.6 | 0.3 |
# An example for using optimization profiles
mlia optimize ~/models/ds_cnn_large_fp32.tflite \
    --target-profile ethos-u55-256 \
    --optimization-profile optimization \
    --rewrite \
    --dataset input.tfrec \
    --rewrite-target fully-connected \
    --rewrite-start MobileNet/avg_pool/AvgPool \
    --rewrite-end MobileNet/fc1/BiasAdd
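The relationship between a named augmentation preset and the two underlying parameters can be illustrated with a simple lookup. The following sketch is not MLIA's implementation: `resolve_augmentations` is a hypothetical function, the preset values are copied from the augmentation table above, and the key names `mixup_strength` and `gaussian_strength` follow the column names of the optimization_custom_augmentation profile.

```python
from typing import Dict, Optional, Union

Strengths = Dict[str, Optional[float]]

# Values copied from the augmentation table: (mixup strength, gaussian strength)
AUGMENTATION_PRESETS = {
    "none": (None, None),
    "gaussian": (None, 1.0),
    "mixup": (1.0, None),
    "mixout": (1.6, None),
    "mix_gaussian_large": (2.0, 1.0),
    "mix_gaussian_small": (1.6, 0.3),
}


def resolve_augmentations(value: Union[str, Strengths]) -> Strengths:
    """Resolve a preset name or explicit strengths to both parameters.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        mixup, gaussian = AUGMENTATION_PRESETS[value]
        return {"mixup_strength": mixup, "gaussian_strength": gaussian}
    # Explicit parameters: a strength that is not given stays disabled (None)
    return {
        "mixup_strength": value.get("mixup_strength"),
        "gaussian_strength": value.get("gaussian_strength"),
    }


if __name__ == "__main__":
    print(resolve_augmentations("mix_gaussian_small"))
    print(resolve_augmentations({"gaussian_strength": 0.1, "mixup_strength": 0.1}))
```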
A custom optimization profile is passed as the path to a configuration file, which needs to conform to the TOML file format. Each optimization in MLIA has a pre-defined set of parameters which need to be present in the config file. When using the built-in optimization profiles, the appropriate TOML file is copied to mlia-output and can be used to understand which parameters apply to each optimization.
Example:
# for custom profiles
mlia ops --optimization-profile ~/my_custom_optimization_profile.toml
The targets currently supported are described in the sections below. All sub-commands require a target profile as input parameter. That target profile can be either a name of a built-in target profile or a custom file. MLIA saves the target profile that was used for a run in the output directory.
Support for the above sub-commands on different targets is provided via backends that need to be installed separately; see the Backend installation section.
There are a number of predefined profiles for Ethos-U with the following attributes:
+---------------+-----+-----------------------------+----------------+
| Profile name  | MAC | System config               | Memory mode    |
+===============+=====+=============================+================+
| ethos-u55-256 | 256 | Ethos_U55_High_End_Embedded | Shared_Sram    |
+---------------+-----+-----------------------------+----------------+
| ethos-u55-128 | 128 | Ethos_U55_High_End_Embedded | Shared_Sram    |
+---------------+-----+-----------------------------+----------------+
| ethos-u65-512 | 512 | Ethos_U65_High_End          | Dedicated_Sram |
+---------------+-----+-----------------------------+----------------+
| ethos-u65-256 | 256 | Ethos_U65_High_End          | Dedicated_Sram |
+---------------+-----+-----------------------------+----------------+
Example:
mlia check ~/model.tflite --target-profile ethos-u65-512 --performance
Ethos-U is supported by these backends: Corstone-300, Corstone-310 and Vela.
The profile cortex-a can be used to get information about supported operators for Cortex-A CPUs when using the Arm NN TensorFlow Lite Delegate. More details can be found in the section for the corresponding backend.
The target profile tosa can be used for TOSA compatibility checks of your model. It requires the TOSA Checker backend. Please note that TOSA is currently only available for x86 architecture.
For more information, see TOSA Checker's:
A custom target profile is passed as the path to a configuration file, which needs to conform to the TOML file format. Each target in MLIA has a pre-defined set of parameters which need to be present in the config file. When using the built-in target profiles, the appropriate TOML file is copied to mlia-output and can be used to understand which parameters apply to each target.
Example:
# for custom profiles
mlia ops --target-profile ~/my_custom_profile.toml sample_model.tflite
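Because a custom profile must contain the same set of parameters as the built-in one, it can help to compare a custom TOML file against the copy of a built-in profile found in mlia-output. The sketch below is an illustration only: `missing_parameters` and `load_toml` are hypothetical helpers, the key names in the usage example are placeholders rather than MLIA's actual schema, and `tomllib` requires Python 3.11 or newer. The same approach applies to custom optimization profiles.

```python
from typing import Any, Dict, List


def load_toml(path: str) -> Dict[str, Any]:
    """Parse a TOML file into a dictionary (needs Python 3.11+ for tomllib)."""
    import tomllib

    with open(path, "rb") as handle:
        return tomllib.load(handle)


def missing_parameters(
    reference: Dict[str, Any], candidate: Dict[str, Any], prefix: str = ""
) -> List[str]:
    """List dotted names of keys present in `reference` but absent in `candidate`."""
    missing: List[str] = []
    for key, value in reference.items():
        name = f"{prefix}{key}"
        if key not in candidate:
            missing.append(name)
        elif isinstance(value, dict) and isinstance(candidate[key], dict):
            # Descend into nested TOML tables
            missing.extend(missing_parameters(value, candidate[key], f"{name}."))
    return missing


if __name__ == "__main__":
    # Placeholder keys, NOT the real MLIA profile schema
    built_in = {"name": "example", "settings": {"first": 1, "second": 2}}
    custom = {"name": "mine", "settings": {"first": 1}}
    print(missing_parameters(built_in, custom))
```

With real files, both dictionaries would come from `load_toml`, using the built-in profile copied to mlia-output as the reference.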
The ML Inference Advisor is designed to use backends to provide different metrics for different target hardware. Some backends come pre-installed, but others can be added and managed using the command mlia-backend, which can list, install and uninstall backends.
Examples:
# List backends installed and available for installation
mlia-backend list

# Install Corstone-300 backend for Ethos-U
mlia-backend install Corstone-300 --path ~/FVP_Corstone_SSE-300/

# Uninstall the Corstone-300 backend
mlia-backend uninstall Corstone-300

# Get help and further information
mlia-backend --help
Note: Some, but not all, backends can be downloaded automatically if no path is provided.
This section lists the available backends. Not all backends work on every platform, so the following table shows compatibility information:
+---------------------------------+------------------------+----------------+------------------+
| Backend                         | Linux                  | Windows        | Python           |
+=================================+========================+================+==================+
| Arm NN TensorFlow Lite Delegate | x86_64 and AArch64     | Windows 10     | Python>=3.8      |
+---------------------------------+------------------------+----------------+------------------+
| Corstone-300                    | x86_64 and AArch64     | Not compatible | Python>=3.8      |
+---------------------------------+------------------------+----------------+------------------+
| Corstone-310                    | x86_64 and AArch64     | Not compatible | Python>=3.8      |
+---------------------------------+------------------------+----------------+------------------+
| TOSA checker                    | x86_64 (manylinux2014) | Not compatible | 3.7<=Python<=3.9 |
+---------------------------------+------------------------+----------------+------------------+
| Vela                            | x86_64 and AArch64     | Windows 10     | Python~=3.7      |
+---------------------------------+------------------------+----------------+------------------+
This backend provides general information about the compatibility of operators with the Arm NN TensorFlow Lite Delegate for Cortex-A. It comes pre-installed.
For version 23.05 the classic delegate is used.
For more information see:
Corstone-300 is a backend that provides performance metrics for systems based on Cortex-M55 and Ethos-U. It is only available on the Linux platform.
Examples:
# Download and install Corstone-300 automatically
mlia-backend install Corstone-300

# Point to a local version of Corstone-300 installed using its installation script
mlia-backend install Corstone-300 --path YOUR_LOCAL_PATH_TO_CORSTONE_300
For further information about Corstone-300 please refer to: https://developer.arm.com/Processors/Corstone-300
Corstone-310 is a backend that provides performance metrics for systems based on Cortex-M85 and Ethos-U.
The TOSA Checker backend provides operator compatibility checks against the TOSA specification. Please note that TOSA is currently only available for x86 architecture.
Install it into the same environment as MLIA using this command:
mlia-backend install tosa-checker
Additional resources:
The Vela backend provides performance metrics for Ethos-U based systems. It comes pre-installed.
Additional resources: