This document describes the process of setting up and running the Arm® Ethos™-U NPU Noise Reduction example.
Use case code is stored in the following directory: source/use_case/noise_reduction.
Instead of tackling the full "noisy audio in, clean audio out" problem, a simpler formulation is used. The audio spectrum is divided into frequency bands (22 in the original paper, RNNoise: Learning Noise Suppression), based on a perceptual scale such as the Mel or Bark scale, and the energy in each band is calculated. Dividing the bands this way weights the result towards what is important to the human ear.
When we have a noisy audio clip, the model takes the energy levels of these different bands as input. The model then tries to predict a value (called a gain), to apply to each frequency band. It is expected that applying this gain to each band brings the audio back to what a "clean" audio sample would have been like. It is like a 22-band equalizer, where we quickly adjust the level of each band so that the noise is removed. However, the signal, or speech, still passes through.
In addition to the 22 band energy values, further features are calculated and included in the input. This gives 42 feature inputs in total (22 + 6 + 6 + 1 + 6 + 1 = 42), and the model produces 22 outputs (one gain value per band).
Note: The model also has a second output that predicts if speech is occurring in the given sample.
The pre-processing works in a windowed fashion on 20 ms of the audio clip at a time, with a stride of 10 ms. For example, one second of audio gives 1000 ms / 10 ms = 100 windows of features and, therefore, an input shape of 100x42 to the model. The output shape of the model is then 100x22, representing the gain values to apply to each of the 100 windows.
These output gain values can then be applied to each corresponding window of the noisy audio clip, producing a cleaner output.
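A quick sketch of this shape arithmetic, illustrative only and following the document's 1000 ms / 10 ms = 100 window count:

```python
# Framing arithmetic for the pre-processing described above: 20 ms windows
# with a 10 ms stride, 42 features in, 22 gain values out, per window.
WINDOW_MS = 20
STRIDE_MS = 10
NUM_FEATURES = 42
NUM_BANDS = 22

def model_io_shapes(clip_ms: int):
    """Return (input_shape, output_shape) for a clip of the given length."""
    windows = clip_ms // STRIDE_MS          # e.g. 1000 ms / 10 ms = 100
    return (windows, NUM_FEATURES), (windows, NUM_BANDS)

in_shape, out_shape = model_io_shapes(1000)   # ((100, 42), (100, 22))
```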
For more information, please refer to the original paper: A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement.
After each inference, the model output is passed to the post-processing code, which uses the gain values produced by the model to generate audio with the noise removed.
To verify the outputs of the model after post-processing, you will have to manually use an offline script to convert the post-processed outputs into a WAV file. This script takes a dump file as input and saves the denoised WAV file to disk. The following is an example of how to call the script from the command line after running the use case and selecting to dump memory contents:
```
python scripts/py/rnnoise_dump_extractor.py --dump_file <path_to_dump_file.bin> --output_dir <path_to_output_folder>
```
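For illustration only, a minimal sketch of such a conversion is shown below. It is NOT the bundled rnnoise_dump_extractor.py; it assumes the dump layout described later in this document (a 20-byte header followed by little-endian INT16 mono samples) and an assumed sample rate of 48 kHz. Consult the real script for the authoritative format and options.

```python
# Illustrative sketch, NOT the bundled rnnoise_dump_extractor.py script.
# Assumptions: the dump is a 20-byte header followed by raw little-endian
# INT16 mono samples, and the audio was processed at 48 kHz.
import wave

HEADER_BYTES = 20       # per the "dump header info (20 bytes)" log line
SAMPLE_RATE = 48000     # ASSUMED; match this to your build configuration

def dump_to_wav(dump_path: str, wav_path: str) -> None:
    """Strip the dump header and wrap the raw samples in a WAV container."""
    with open(dump_path, "rb") as f:
        samples = f.read()[HEADER_BYTES:]
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(samples)
```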
The application for this use case has been written to dump the post-processed output to the address pointed to by the CMake parameter noise_reduction_MEM_DUMP_BASE_ADDR. The default value is set to 0x80000000.
The Fixed Virtual Platform supports dumping of memory contents to a file. This can be done by specifying command-line arguments when starting the FVP executable. For example, the following invocation:

```
$ FVP_Corstone_SSE-300_Ethos-U55 -a ./bin/ethos-u-noise_reduction.axf \
    --dump cpu0=output.bin@Memory:0x80000000,0x100000
```

dumps 1 MiB worth of data from address 0x80000000 to the file output.bin.
The Noise Reduction application uses the memory address specified by
noise_reduction_MEM_DUMP_BASE_ADDR as a buffer to store post-processed results from all inferences. The maximum size of this buffer is set by the parameter
noise_reduction_MEM_DUMP_LEN, which defaults to 1 MiB.
Logging information is generated for every inference run performed. Each line corresponds to the post-processed result of that inference being written to a certain location in memory.
```
INFO - Audio Clip dump header info (20 bytes) written to 0x80000000
INFO - Inference 1/136
INFO - Copied 960 bytes to 0x80000014
...
INFO - Inference 136/136
INFO - Copied 960 bytes to 0x8001fa54
```
In the preceding output, we can see that the dump starts at the default address of 0x80000000, where some header information is written. After the first inference, 960 bytes (480 INT16 values) are written to the first address after the header, 0x80000014. Each subsequent inference then writes another 960 bytes to the next address, and so on, until all inferences are complete.
When consolidating all inference outputs for an entire audio clip, the application output should report:
```
INFO - Output memory dump of 130580 bytes written at address 0x80000000
```
The application output log states that there are 130580 bytes of valid data ready to be read from 0x80000000. If the FVP was started with the --dump option, then the output file is created when the FVP instance exits.
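The addresses and total size in the preceding logs can be cross-checked with a few lines of arithmetic (an illustration only; the constants below are taken from the example log above):

```python
# Cross-checking the dump layout reported in the log: a 20-byte header at
# the base address, then 960 bytes (480 INT16 samples) per inference.
BASE_ADDR = 0x80000000
HEADER_BYTES = 20
BYTES_PER_INFERENCE = 960          # 480 samples * 2 bytes each
NUM_INFERENCES = 136

first_write = BASE_ADDR + HEADER_BYTES                                 # 0x80000014
last_write = first_write + (NUM_INFERENCES - 1) * BYTES_PER_INFERENCE  # 0x8001fa54
total_bytes = HEADER_BYTES + NUM_INFERENCES * BYTES_PER_INFERENCE      # 130580
```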
In addition to the build options already specified in the main documentation, the noise reduction use case adds the following:
noise_reduction_MODEL_TFLITE_PATH - The path to the NN model file in TFLite format. The model is processed and included in the application axf file. The default value points to one of the delivered set of models. Note that the parameter ETHOS_U_NPU_ENABLED must be aligned with the chosen model. Therefore:
If ETHOS_U_NPU_ENABLED is set to 1, then we assume that the NN model is optimized. The model naturally falls back to the Arm® Cortex®-M CPU if an unoptimized model is supplied.
If ETHOS_U_NPU_ENABLED is set to 0, then we assume that the NN model is unoptimized. In this case, supplying an optimized model results in a runtime error.
noise_reduction_FILE_PATH: The path to the directory containing WAV files, or a path to a single WAV file, to be used in the application. The default value points to the
resources/noise_reduction/samples folder containing the delivered set of audio clips.
noise_reduction_AUDIO_RATE: The input data sampling rate. Each audio file from
noise_reduction_FILE_PATH is preprocessed during the build to match the NN model input requirements. The default value is
noise_reduction_AUDIO_MONO: If set to
ON, then the audio data is converted to mono. The default value is
noise_reduction_AUDIO_OFFSET: The offset, in seconds, from which to begin loading the audio data. The default value is set to
noise_reduction_AUDIO_DURATION: The length of the audio data to be used in the application in seconds. The default is
0, meaning that the whole audio file is used.
noise_reduction_AUDIO_MIN_SAMPLES: Minimum number of samples required by the network model. If the audio clip is shorter than this number, then it is padded with zeros. The default value is
noise_reduction_ACTIVATION_BUF_SZ: The intermediate, or activation, buffer size reserved for the neural network model. By default, it is set to 2MiB.
To build ONLY the noise_reduction example application, add -DUSE_CASE_BUILD=noise_reduction to the cmake command line, as specified in the Building section.
Note: This section describes the process for configuring the build for
MPS3: SSE-300. To configure a different target platform, please see the Building section.
To only build the
noise_reduction example, create a build directory, and then navigate inside. For example:
```
mkdir build_noise_reduction && cd build_noise_reduction
```
On Linux, when providing only the mandatory arguments for CMake configuration, use the following command to build the Noise Reduction application to run on the Ethos-U55 Fast Model:
```
cmake ../ -DUSE_CASE_BUILD=noise_reduction
```
To configure a build that can be debugged using Arm DS, we specify the build type as
Debug and use the
Arm Compiler toolchain file:
```
cmake .. \
    -DCMAKE_TOOLCHAIN_FILE=scripts/cmake/toolchains/bare-metal-armclang.cmake \
    -DCMAKE_BUILD_TYPE=Debug \
    -DUSE_CASE_BUILD=noise_reduction
```
For more notes, please refer to:
Note: If you are rebuilding with changed parameter values, it is highly advised that you clean the build directory and rerun the CMake command.
If the CMake command is successful, then build the application as follows:
Note: To see compilation and link details, add
The build results are placed under the
build/bin folder. For example:
```
bin
├── ethos-u-noise_reduction.axf
├── ethos-u-noise_reduction.htm
├── ethos-u-noise_reduction.map
├── images-noise_reduction.txt
└── sectors
    └── noise_reduction
        ├── dram.bin
        └── itcm.bin
```
Based on the preceding output, the files contain the following information:
ethos-u-noise_reduction.axf: The built application binary for the noise reduction use case.
ethos-u-noise_reduction.map: Information from building the application (for example, the libraries used, what was optimized, and the location of objects).
ethos-u-noise_reduction.htm: A human readable file containing the call graph of application functions.
sectors/: This folder contains the built application, which is split into files for loading into different FPGA memory regions.
images-noise_reduction.txt: Tells the FPGA which memory regions to use for loading the binaries.
To run with inputs different from the ones supplied, the parameter noise_reduction_FILE_PATH can be pointed to a WAV file, or a directory containing WAV files. Once you have a directory with WAV files, run the following command:
```
cmake .. \
    -DUSE_CASE_BUILD=noise_reduction \
    -Dnoise_reduction_FILE_PATH=/path/to/custom/wav_files
```
The application performs inference using the model pointed to by the CMake parameter noise_reduction_MODEL_TFLITE_PATH.
Note: If you want to run the model using the Ethos-U NPU, ensure that your custom model has been run through the Vela compiler successfully before continuing.
For further information: Optimize model with Vela compiler.
```
cmake .. \
    -Dnoise_reduction_MODEL_TFLITE_PATH=<path/to/custom_model_after_vela.tflite> \
    -DUSE_CASE_BUILD=noise_reduction
```
Note: Changing the neural network model often also requires the pre-processing implementation to be changed. Please refer to: How the default neural network model works.
Note: Before re-running the CMake command, clean the build directory.
The .tflite model file pointed to by noise_reduction_MODEL_TFLITE_PATH is converted to C++ files during the CMake configuration stage. It is then compiled into the application for performing inference with.
To see which model path was used, inspect the configuration stage log:
```
-- User option noise_reduction_MODEL_TFLITE_PATH is set to <path/to/custom_model_after_vela.tflite>
...
-- Using <path/to/custom_model_after_vela.tflite>
++ Converting custom_model_after_vela.tflite to custom_model_after_vela.tflite.cc
-- Generating labels file from <path/to/labels_custom_model.txt>
-- writing to <path/to/build/generated/src/Labels.cc>
...
```
After compiling, your custom model replaces the default one in the application.
The FVP is available publicly from Arm Ecosystem FVP downloads.
For the Ethos-U evaluation, please download the MPS3 based version of the Arm® Corstone™-300 model that contains Cortex-M55 and offers a choice of the Ethos-U55 and Ethos-U65 processors.
To install the FVP:
Unpack the archive.
Run the install script in the extracted package:
Once the building step has completed, the application binary
ethos-u-noise_reduction.axf can be found in the
build/bin folder. Assuming the install location of the FVP was set to
~/FVP_install_location, start the simulation with the following command:
A log output then appears on the terminal:
```
telnetterminal0: Listening for serial connection on port 5000
telnetterminal1: Listening for serial connection on port 5001
telnetterminal2: Listening for serial connection on port 5002
telnetterminal5: Listening for serial connection on port 5003
```
This also launches a telnet window with the standard output of the sample application. It also includes error log entries containing information about the pre-built application version, the TensorFlow Lite Micro library version used, and the data type, as well as the input and output tensor sizes of the model compiled into the executable binary.
After the application has started, if noise_reduction_FILE_PATH points to a single file (or a folder containing a single input file), then the inference starts immediately. If multiple inputs are chosen, then a menu is displayed and the application waits for user input from the telnet terminal:
```
User input required
Enter option number from:
1. Run noise reduction on the next WAV
2. Run noise reduction on a WAV at chosen index
3. Run noise reduction on all WAVs
4. Show NN model info
5. List audio clips
Choice:
```
“Run noise reduction on the next WAV”: Runs processing and inference on the next WAV file in line.
Note: Depending on the size of the input WAV file, multiple inferences can be invoked.
“Run noise reduction on a WAV at chosen index”: Runs processing and inference on the WAV file corresponding to the chosen index.
Note: The chosen index must be in the range of the WAV files supplied during the application build. By default, the pre-built application has three files, with indexes from 0 to 2.
“Run noise reduction on all WAVs”: Triggers sequential processing and inference executions on all baked-in WAV files.
“Show NN model info”: Prints information about the model data type, including the input and output tensor sizes. For example:
```
INFO - Model info:
INFO - Model INPUT tensors:
INFO - tensor type is INT8
INFO - tensor occupies 42 bytes with dimensions
INFO - 0: 1
INFO - 1: 1
INFO - 2: 42
INFO - Quant dimension: 0
INFO - Scale = 0.221501
INFO - ZeroPoint = 14
INFO - tensor type is INT8
INFO - tensor occupies 24 bytes with dimensions
INFO - 0: 1
INFO - 1: 24
INFO - Quant dimension: 0
INFO - Scale = 0.007843
INFO - ZeroPoint = -1
INFO - tensor type is INT8
INFO - tensor occupies 48 bytes with dimensions
INFO - 0: 1
INFO - 1: 48
INFO - Quant dimension: 0
INFO - Scale = 0.047942
INFO - ZeroPoint = -128
INFO - tensor type is INT8
INFO - tensor occupies 96 bytes with dimensions
INFO - 0: 1
INFO - 1: 96
INFO - Quant dimension: 0
INFO - Scale = 0.007843
INFO - ZeroPoint = -1
INFO - Model OUTPUT tensors:
INFO - tensor type is INT8
INFO - tensor occupies 96 bytes with dimensions
INFO - 0: 1
INFO - 1: 1
INFO - 2: 96
INFO - Quant dimension: 0
INFO - Scale = 0.007843
INFO - ZeroPoint = -1
INFO - tensor type is INT8
INFO - tensor occupies 22 bytes with dimensions
INFO - 0: 1
INFO - 1: 1
INFO - 2: 22
INFO - Quant dimension: 0
INFO - Scale = 0.003906
INFO - ZeroPoint = -128
INFO - tensor type is INT8
INFO - tensor occupies 48 bytes with dimensions
INFO - 0: 1
INFO - 1: 1
INFO - 2: 48
INFO - Quant dimension: 0
INFO - Scale = 0.047942
INFO - ZeroPoint = -128
INFO - tensor type is INT8
INFO - tensor occupies 24 bytes with dimensions
INFO - 0: 1
INFO - 1: 1
INFO - 2: 24
INFO - Quant dimension: 0
INFO - Scale = 0.007843
INFO - ZeroPoint = -1
INFO - tensor type is INT8
INFO - tensor occupies 1 bytes with dimensions
INFO - 0: 1
INFO - 1: 1
INFO - 2: 1
INFO - Quant dimension: 0
INFO - Scale = 0.003906
INFO - ZeroPoint = -128
INFO - Activation buffer (a.k.a tensor arena) size used: 1940
INFO - Number of operators: 1
INFO - Operator 0: ethos-u
INFO - Use of Arm uNPU is enabled
```
“List audio clips”: Prints a list of audio clip index and filename pairs. The original filenames are embedded in the application. For example:
```
INFO - List of Files:
INFO - 0 => p232_113.wav
INFO - 1 => p232_208.wav
INFO - 2 => p257_031.wav
```
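The Scale and ZeroPoint values printed by the model info option are standard TensorFlow Lite affine quantization parameters: a real value is recovered as scale * (quantized - zero_point). A small sketch using the gain output tensor's parameters from the example log:

```python
# Standard TensorFlow Lite affine dequantization:
#   real_value = scale * (quantized_value - zero_point)

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an INT8 quantized value back to a real value."""
    return scale * (q - zero_point)

# Using the gain output tensor's parameters from the log above
# (Scale = 0.003906, ZeroPoint = -128), the INT8 range [-128, 127]
# maps onto roughly [0.0, 1.0], which suits per-band gain values:
lo = dequantize(-128, 0.003906, -128)   # 0.0
hi = dequantize(127, 0.003906, -128)    # ~0.996
```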
Selecting the first option runs inference on the first file.
The following example illustrates an application output:
```
INFO - Audio Clip dump header info (20 bytes) written to 0x80000000
INFO - Inference 1/136
INFO - Copied 960 bytes to 0x80000014
INFO - Inference 2/136
INFO - Copied 960 bytes to 0x800003d4
...
INFO - Inference 136/136
INFO - Copied 960 bytes to 0x8001fa54
INFO - Output memory dump of 130580 bytes written at address 0x80000000
INFO - Final results:
INFO - Profile for Inference:
INFO - NPU AXI0_RD_DATA_BEAT_RECEIVED beats: 530
INFO - NPU AXI0_WR_DATA_BEAT_WRITTEN beats: 376
INFO - NPU AXI1_RD_DATA_BEAT_RECEIVED beats: 13911
INFO - NPU ACTIVE cycles: 103870
INFO - NPU IDLE cycles: 643
INFO - NPU TOTAL cycles: 104514
```
Note: When running Fast Model, each inference can take several seconds on most systems.
Each inference dumps the post-processed output to memory. For further information, please refer to: Dumping post processed results for all inferences.
The profiling section of the log shows the Ethos-U NPU PMU report for the inference:
104514: The total number of NPU cycles.
103870: How many NPU cycles were used for computation.
643: How many cycles the NPU was idle for.
530: The number of AXI beats with read transactions from the AXI0 bus.
Note: AXI0 is the bus where the Ethos-U NPU reads from and writes to the computation buffers (the activation buffer, or tensor arena).
376: The number of AXI beats with write transactions to the AXI0 bus.
13911: The number of AXI beats with read transactions from the AXI1 bus.
Note: AXI1 is the bus from which the Ethos-U NPU reads the model; it is read-only.
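From the cycle counters above, the NPU utilization for this example inference can be derived directly:

```python
# NPU cycle counters reported by the PMU for the example inference above.
active_cycles = 103_870   # cycles spent computing
idle_cycles = 643         # cycles spent waiting
total_cycles = 104_514    # total cycles for the inference

# Fraction of total cycles during which the NPU was doing useful work.
utilization = active_cycles / total_cycles   # ~0.994, i.e. ~99.4% utilized
```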
For FPGA platforms, the CPU cycle count can also be enabled. However, for FVP, do not use the CPU cycle counters as the CPU model is not cycle-approximate or cycle-accurate.