MLBEDSW-3249: Vela config file examples

 - Added sample vela.ini config file
 - Changed vela config format, split into system config and memory mode
 - Removed unused CPU cycle performance estimation
 - Added new CLI options for --memory-mode and --verbose-config
 - Changed CLI option --config to take multiple files
 - Removed CLI option --global-memory-clock-scales
 - Changed error helper functions to raise a VelaError exception
 - Refactored to create a new is_spilling_enabled function

Signed-off-by: Tim Hall <tim.hall@arm.com>
Change-Id: I27c41577e37a3859edb9524cd99784be10ef0a0d

commit: 1bd531dec0b4eb745fb8856d14c1aba2b8a73026
author: Tim Hall <tim.hall@arm.com> Sun Nov 01 20:59:36 2020 +0000
committer: Tim Hall <tim.hall@arm.com> Fri Nov 20 12:55:47 2020 +0000
tree: a0265a0accd2395277fe88be27164d09541abc7f
parent: c8a73868d40cf63380f634baeb51aa7aa993fc0c
diff --git a/OPTIONS.md b/OPTIONS.md
index f02b91e..baf6c5a 100644
--- a/OPTIONS.md
+++ b/OPTIONS.md

@@ -2,13 +2,13 @@
 
 This file contains a more verbose and detailed description of the Vela
 Compiler's CLI options than the built-in help strings.  It also defines and
-describes the Vela system configuration file format.
+describes Vela's configuration file format.
 
 ## Command Line Interface
 
 ### Network (required)
 
-Filename of the network model to compile. The file has to be a `.tflite` file.  
+Filename of the network model to compile.  The file has to be a `.tflite` file.  
 **Type: POSIX path**  
 **Default: N/A**  
 
@@ -18,7 +18,7 @@
 
 ### Help
 
-Displays the help strings of all CLI options. Can be used without the required
+Displays the help strings of all CLI options.  Can be used without the required
 Network argument.  
 **Type: N/A**  
 **Default: N/A**  
@@ -29,7 +29,7 @@
 
 ### Version
 
-Displays the version of the installed Vela Compiler. Can be used without the
+Displays the version of the installed Vela Compiler.  Can be used without the
 required Network argument.  
 **Type: N/A**  
 **Default: N/A**  
@@ -75,19 +75,21 @@
 
 ### Config
 
-Specifies the path to the config file. The file has to be a `.ini` file. The
-format is described further in a the Config section below.  
+Specifies the path to the Vela configuration file.  The format of the file is a
+Python ConfigParser `.ini` file.  This option can be specified multiple times to
+allow multiple files to be searched for the required system config and memory
+mode.  More details can be found in the Configuration File section below.  
 **Type: POSIX path**  
 **Default: use default configuration**  
 
 ```bash
-vela network.tflite --config custom_config.ini
+vela network.tflite --config my_vela_cfg1.ini --config my_vela_cfg2.ini --system-config My_Sys_Cfg --memory-mode My_Mem_Mode
 ```
 
 ### Cascading
 
-Controls the packing of multiple passes into cascades. This allows for lower
-memory usage. If the network's intermediate feature maps are too large for the
+Controls the packing of multiple passes into cascades.  This allows for lower
+memory usage.  If the network's intermediate feature maps are too large for the
 system's SRAM this optimisation is required.  
 **Type: Boolean**  
 **Default: True**  
@@ -109,7 +111,7 @@
 
 Force a specific block configuration in the format HxWxC, where H, W, and C are
 positive integers specifying height, width, and channels (depth), respectively.
-The default behaviour is Vela searching for an optimal block configuration. An
+The default behaviour is Vela searching for an optimal block configuration.  An
 exception will be raised if the chosen block configuration is incompatible.  
 **Type: String**  
 **Default: N/A**  
@@ -121,7 +123,7 @@
 ### Timing
 
 Measure time taken for different compiler steps, e.g. model reading and
-scheduling. Prints the results to standard out.  
+scheduling.  Prints the results to standard out.  
 **Type: Set True**  
 **Default: False**  
 
@@ -131,9 +133,9 @@
 
 ### Accelerator Configuration
 
-Choose which hardware accelerator configuration to compile for. Format is
+Choose which hardware accelerator configuration to compile for.  Format is
 accelerator name followed by a hyphen, followed by the number of MACs in the
-configuration.
+configuration.  
 **Type: String**  
 **Default: ethos-u55-256**  
 **Choices: [ethos-u55-32, ethos-u55-64, ethos-u55-128, ethos-u55-256]**  
@@ -144,13 +146,24 @@
 
 ### System Config
 
-Selects the system configuration to use as specified in the System Configuration
-File (see section below).  
+Selects the system configuration to use as specified in the Vela configuration
+file (see section below).  
 **Type: String**  
 **Default: Use internal default config**  
 
 ```bash
-vela network.tflite --system-config MySysConfig
+vela network.tflite --config my_vela_cfg.ini --system-config My_Sys_Cfg
+```
+
+### Memory Mode
+
+Selects the memory mode to use as specified in the Vela configuration file (see
+section below).  
+**Type: String**  
+**Default: Use internal default config**  
+
+```bash
+vela network.tflite --config my_vela_cfg.ini --memory-mode My_Mem_Mode
 ```
 
 ### Tensor Allocator
@@ -167,9 +180,9 @@
 
 ### Ifm Streaming
 
-Controls scheduler IFM streaming search. Vela's scheduler will choose between
-IFM Streaming and Weight Streaming for optimal memory usage. Disabling this will
-cause Vela to always choose Weight Streaming.  
+Controls scheduler IFM streaming search.  Vela's scheduler will choose between
+IFM Streaming and Weight Streaming for optimal memory usage.  Disabling this
+will cause Vela to always choose Weight Streaming.  
 **Type: Boolean**  
 **Default: True**  
 
@@ -179,8 +192,8 @@
 
 ### Block Config Limit
 
-Limit the block config search space. This will result in faster compilation
-times but may impact the performance of the output network. Use 0 for unlimited
+Limit the block config search space.  This will result in faster compilation
+times but may impact the performance of the output network.  Use 0 for unlimited
 search.  
 **Type: Integer**  
 **Default: 16**  
@@ -190,22 +203,10 @@
 vela network.tflite --block-config-limit 0
 ```
 
-### Global Memory Clock Scale
-
-Performs an additional scaling of the individual memory clock scales specified
-by the system configuration. Used to globally adjust the bandwidth of the
-various memories  
-**Type: Float**  
-**Default: 1.0**  
-
-```bash
-vela network.tflite --global-memory-clock-scale 1.5
-```
-
 ### Pareto Metric
 
-Controls the calculation of the pareto metric. Use 'BwCycMemBlkH' to consider
-Block Height in addition to Bandwidth, Cycle count and Memory. This can reduce
+Controls the calculation of the pareto metric.  Use 'BwCycMemBlkH' to consider
+Block Height in addition to Bandwidth, Cycle count and Memory.  This can reduce
 SRAM usage in some circumstances.  
 **Type: String**  
 **Default: BwCycMem**  
@@ -218,9 +219,9 @@
 ### Recursion Limit
 
 Some of Vela's algorithms use recursion and the required depth can be network
-dependant. This option allows the limit to be increased if needed. The maximum
-limit is platform dependent. If limit is set too low then compilation will raise
-a RecursionError exception.  
+dependent.  This option allows the limit to be increased if needed.  The maximum
+limit is platform dependent.  If limit is set too low then compilation will
+raise a RecursionError exception.  
 **Type: Integer**  
 **Default: 10000**  
 
@@ -244,7 +245,7 @@
 ### Max Block Dependency
 
 Set the maximum value that can be used for the block dependency delay between
-NPU kernel operations. A lower value may result in longer execution time.  
+NPU kernel operations.  A lower value may result in longer execution time.  
 **Type: Integer**  
 **Default: 3**  
 **Choices: [0, 1, 2, 3]**  
@@ -255,8 +256,9 @@
 
 ### Tensor Format Between Cascaded Passes
 
-Controls if NHCWB16 or NHWC Tensor format should be used in between cascaded passes. NHWCB16 means FeatureMaps are laid
-out in 1x1x16B bricks in row-major order. This enables more efficient FeatureMap reading from external memory.  
+Controls if NHWCB16 or NHWC Tensor format should be used in between cascaded
+passes.  NHWCB16 means FeatureMaps are laid out in 1x1x16B bricks in row-major
+order.  This enables more efficient FeatureMap reading from external memory.  
 **Type: Boolean**  
 **Default: True**  
 **Choices: [True, False]**  
@@ -267,9 +269,10 @@
 
 ### Scaling of weight estimates
 
-Performs an additional scaling of weight compression estimate used by Vela to estimate SRAM usage.
-Increasing this scaling factor will make the estimates more conservative (lower) and this can result
-in optimisations that use less SRAM, albeit at the cost of performance (inference speed).  
+Performs an additional scaling of weight compression estimate used by Vela to
+estimate SRAM usage.  Increasing this scaling factor will make the estimates
+more conservative (lower) and this can result in optimisations that use less
+SRAM, albeit at the cost of performance (inference speed).  
 **Type: Float**  
 **Default: 1.0**  
 
@@ -279,8 +282,9 @@
 
 ### Allocation alignment
 
-Controls the allocation byte alignment. Only affects CPU tensors, NPU tensors will remain 16-byte
-aligned independent of this option. Alignment has to be a power of two and greater or equal to 16.  
+Controls the allocation byte alignment.  Only affects CPU tensors, NPU tensors
+will remain 16-byte aligned independent of this option.  Alignment has to be a
+power of two and greater than or equal to 16.  
 **Type: Integer**  
 **Default: 16**  
 
@@ -317,6 +321,16 @@
 vela network.tflite --show-cpu-operations
 ```
 
+### Verbose Config
+
+Verbose system configuration and memory mode.  If no `--system-config` or
+`--memory-mode` CLI options are specified then the `internal-default` values
+will be displayed.  
+
+```bash
+vela network.tflite --verbose-config
+```
+
 ### Verbose Graph
 
 Verbose graph rewriter.  
@@ -405,62 +419,79 @@
 vela network.tflite --verbose-operators
 ```
 
-## System Configuration File
+## Configuration File
 
-This is used to describe various properties of the embedded system that the
-network will run in. The configuration file is selected with the `--config` CLI
-option. The system config is selected by Name (defined in the
-`[SysConfig.Name]` field) with the CLI option `--system-config`. The `cpu=X`
-attribute in the `[SysConfig.Name]` is used to cross-reference and select CPU
-operator attributes in the `[CpuPerformance.OpName]` section.  
-Example usage based on the file below:  
+This is used to describe various properties of the Ethos-U embedded system.  The
+configuration file is selected using the `--config` CLI option along with a file
+that describes the properties.  The file uses the Python ConfigParser `.ini`
+format, which consists of sections used to identify a configuration, and
+key/value pair options used to specify the properties.  All sections and
+key/value pairs are case-sensitive.
+
+There are two types of section, system configuration `[System_Config.*]`
+sections and memory mode `[Memory_Mode.*]` sections.  A complete Ethos-U
+embedded system should define at least one entry in each section, where an entry
+is identified using the format `[Part.Name]` (Part = {System_Config or
+Memory_Mode}, Name = {a string with no spaces}).  A configuration file may
+contain multiple entries per section, with each entry's `.Name` being used to
+select it using the `--system-config` and `--memory-mode` CLI options.  If the
+CLI options are not specified then the sections named `internal-default` are
+used.  These are special sections which are defined internally and contain
+default values.
+
+Each section contains a number of options which are described in more detail
+below.  All options are optional.  If they are not specified, then they will be
+assigned a value of 1 (or the equivalent).  They will not be assigned the value
+of `internal-default`.
+
+One special option is the `inherit` option.  This can be used in any section and
+its value is the name of another section to inherit options from.  The only
+restriction on this option is that recursion is not allowed and so it cannot
+reference its own section.
+
+To see the configuration values being used by Vela use the `--verbose-config`
+CLI option.  This can also be used to display the internal-default values and to
+see a full list of all the available options.
+
+An example Vela configuration file, called `vela_cfg.ini`, is included in the
+directory containing this file.  Example usage based on this file is:  
 
 ```bash
-vela network.tflite --config sys_cfg_vela.ini --system-config MySysConfig
+vela network.tflite --accelerator-config ethos-u55-256 --config vela_cfg.ini --system-config Ethos_U55_High_End_Embedded --memory-mode Shared_Sram
 ```
 
-Example of a Vela system configuration file.  
+The following is an in-line explanation of the Vela configuration file format:
 
 ```ini
-; File: sys_cfg_vela.ini
-; The file contains two parts; a system config part and a CPU operator
-; performance part.
+; file: my_vela_cfg.ini
+; -----------------------------------------------------------------------------
+; Vela configuration file
 
-; System config
-; Specifies properties such as the core clock speed, the size and speed of the
-; four potential memory areas, and for various types of data which memory area
-; is used to store them. The cpu property is used to link with the CPU operator
-; performance.
-; The four potential memory areas are: Sram, Dram, OnChipFlash, OffChipFlash.
+; -----------------------------------------------------------------------------
+; System Configuration
 
-[SysConfig.MySysConfig]
-npu_freq=500e6
-cpu=MyCpu
-Sram_clock_scale=1
-Sram_port_width=64
-Dram_clock_scale=1
-Dram_port_width=64
-OnChipFlash_clock_scale=1
-OnChipFlash_port_width=64
-OffChipFlash_clock_scale=0.25
-OffChipFlash_port_width=32
-permanent_storage_mem_area=OffChipFlash
-feature_map_storage_mem_area=Sram
-fast_storage_mem_area=Sram
+; My_Sys_Cfg
+[System_Config.My_Sys_Cfg]
+core_clock=???               ---> Clock frequency of the Ethos-U.  ??? = {float in Hz} 
+axi0_port=???                ---> Memory type connected to AXI0.  ??? = {Sram, Dram, OnChipFlash or OffChipFlash}
+axi1_port=???                ---> Memory type connected to AXI1.  ??? = {Sram, Dram, OnChipFlash or OffChipFlash}
+Sram_clock_scale=???         ---> Scaling of core_clock to specify the Sram bandwidth.  Only required if selected by an AXI port.  ??? = {float 0.0 to 1.0}
+Dram_clock_scale=???         ---> Scaling of core_clock to specify the Dram bandwidth.  Only required if selected by an AXI port.  ??? = {float 0.0 to 1.0}
+OnChipFlash_clock_scale=???  ---> Scaling of core_clock to specify the OnChipFlash bandwidth.  Only required if selected by an AXI port.  ??? = {float 0.0 to 1.0}
+OffChipFlash_clock_scale=??? ---> Scaling of core_clock to specify the OffChipFlash bandwidth.  Only required if selected by an AXI port.  ??? = {float 0.0 to 1.0}
 
-; CPU operator performance
-; Specifies properties that are used by a linear model to estimate the
-; performance for any operations that will be run on the CPU (such as those not
-; supported by the NPU). Setting the intercept and slope to 0 will result in
-; the operator being excluded from the performance estimation. This is the same
-; as not specifying the operator. If an explicit cpu is specified rather than
-; using the default then the cpu name must match the cpu specified in the
-; SysConfig.<system config name> section.
+; -----------------------------------------------------------------------------
+; Memory Mode
 
-[CpuPerformance.MyCpuOperator]
-default.intercept=0.0
-default.slope=1.0
+; My_Mem_Mode_Parent
+[Memory_Mode.My_Mem_Mode_Parent]
+const_mem_area=???          ---> AXI port used by the read-only data (e.g. weight tensors, scale & bias tensors).  ??? = {Axi0, Axi1}
+arena_mem_area=???          ---> AXI port used by the read-write data (e.g. feature map tensors, internal buffers).  ??? = {Axi0, Axi1}
+cache_mem_area=???          ---> AXI port used by the dedicated SRAM read-write (e.g. feature map part-tensors, internal buffers).  ??? = {Axi0, Axi1}
+cache_sram_size=???         ---> Size of the dedicated cache SRAM.  Only required when cache_mem_area != arena_mem_area.  ??? = {int in Bytes}
 
-MyCpu.intercept=0.0
-MyCpu.slope=1.0
+; My_Mem_Mode_Child
+[Memory_Mode.My_Mem_Mode_Child]
+inherit=???                 ---> Parent section to inherit from.  An option in the child overwrites an identical option in the parent.  ??? = {[Part.Name]}
+cache_sram_size=???         ---> Size of the dedicated cache SRAM.  Only required when cache_mem_area != arena_mem_area.  ??? = {int in Bytes}
 ```

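The configuration file rules described in the OPTIONS.md change above (two section types, several `--config` files merged together, and the `inherit` option) can be sketched with the standard library `configparser` module. This is a minimal illustration, not Vela's implementation: the helper `read_option` is hypothetical, the option values are made up for the example, and the sketch assumes that the `inherit` value is written as `Part.Name` without square brackets. Two `read_string` calls stand in for two files passed via `--config`.

```python
from configparser import ConfigParser

# Stand-in for a first config file: one system configuration.
# Values are illustrative only.
CFG_FILE_1 = """
[System_Config.My_Sys_Cfg]
core_clock=500e6
axi0_port=Sram
axi1_port=OffChipFlash
Sram_clock_scale=1.0
OffChipFlash_clock_scale=0.125
"""

# Stand-in for a second config file: a parent memory mode and a child
# that inherits from it and overrides a single option.
CFG_FILE_2 = """
[Memory_Mode.My_Mem_Mode_Parent]
const_mem_area=Axi1
arena_mem_area=Axi1
cache_mem_area=Axi0
cache_sram_size=393216

[Memory_Mode.My_Mem_Mode_Child]
inherit=Memory_Mode.My_Mem_Mode_Parent
cache_sram_size=524288
"""


def read_option(cfg, section, key, default=None, _seen=None):
    """Hypothetical helper: look up `key` in `section`, following `inherit`.

    An option in the child wins over the same option in the parent.
    A section that ends up inheriting from itself is rejected.
    """
    seen = set() if _seen is None else _seen
    if section in seen:
        raise ValueError(f"inherit loop detected at [{section}]")
    seen.add(section)

    if not cfg.has_section(section):
        raise KeyError(f"section [{section}] not found")
    if cfg.has_option(section, key):
        return cfg.get(section, key)
    if cfg.has_option(section, "inherit"):
        parent = cfg.get(section, "inherit")
        return read_option(cfg, parent, key, default, seen)
    return default


cfg = ConfigParser()
cfg.read_string(CFG_FILE_1)  # with real files: cfg.read(["cfg1.ini", "cfg2.ini"])
cfg.read_string(CFG_FILE_2)

core_clock = float(read_option(cfg, "System_Config.My_Sys_Cfg", "core_clock"))
arena = read_option(cfg, "Memory_Mode.My_Mem_Mode_Child", "arena_mem_area")
cache_size = int(read_option(cfg, "Memory_Mode.My_Mem_Mode_Child", "cache_sram_size"))
print(core_clock, arena, cache_size)
```

Here the child section supplies its own `cache_sram_size` and takes `arena_mem_area` from the parent, which is the override behaviour the format description states for `inherit`.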
diff --git a/README.md b/README.md
index 84624bf..cdc065a 100644
--- a/README.md
+++ b/README.md

@@ -117,7 +117,7 @@
 ## Running
 
 Vela is run with an input `.tflite` file passed on the command line.  This file
-contains the neural network to be compiled. The tool then outputs an optimised
+contains the neural network to be compiled.  The tool then outputs an optimised
 version with a `_vela.tflite` file suffix, along with the performance estimate
 (EXPERIMENTAL) CSV files, all to the output directory.
 
@@ -133,30 +133,36 @@
 
 Example usage:
 
-1) Compile the network `my_model.tflite`. The optimised version will be output
+1) Compile the network `my_model.tflite`.  The optimised version will be output
 to `./output/my_model_vela.tflite`.
 
 ```bash
 vela my_model.tflite
 ```
 
-1) Compile the network `/path/to/my_model.tflite` and specify the output to go
+2) Compile the network `/path/to/my_model.tflite` and specify the output to go
 in the directory `./results_dir/`.
 
 ```bash
 vela --output-dir ./results_dir /path/to/my_model.tflite
 ```
 
-1) To specify information about the embedded system's configuration use Vela's
-system configuration file. The following command selects the `MySysConfig`
-settings that are described in the `sys_cfg_vela.ini` system configuration file.
-More details can be found in the next section.
+3) Compile a network using a particular Ethos-U NPU.  The following command
+selects an Ethos-U65 NPU accelerator configured with 512 MAC units.
 
 ```bash
-vela --config sys_cfg_vela.ini --system-config MySysConfig my_model.tflite
+vela --accelerator-config ethos-u65-512 my_model.tflite
 ```
 
-1) To get a list of all available options:
+4) Compile a network using a particular embedded system configuration defined in
+Vela's configuration file.  The following command selects the `My_Sys_Config`
+system configuration along with the `My_Mem_Mode` memory mode from the `vela_cfg.ini` configuration file.
+
+```bash
+vela --config vela_cfg.ini --system-config My_Sys_Config --memory-mode My_Mem_Mode my_model.tflite
+```
+
+5) To get a list of all available options:
 
 ```bash
 vela --help

diff --git a/RELEASES.md b/RELEASES.md
index bf6e679..c88c033 100644
--- a/RELEASES.md
+++ b/RELEASES.md

@@ -5,6 +5,21 @@
 fixed.  The version numbering adheres to the
 [semantic versioning](https://semver.org/) scheme.
 
+## Release 2.0.0
+
+**Main feature changes:**
+
+* New Vela configuration file format
+
+**Interface changes:**
+
+* Non-backwards compatible changes to the Vela configuration file
+* Addition of CLI options: `--memory-mode`
+
+**Reported defect fixes:**
+
+* Vela config file examples (MLCE-277)
+
 ## Release 1.2.0 - 31/08/2020
 
 **Main feature changes:**

diff --git a/SECURITY.md b/SECURITY.md
index d1aa6b4..c795d52 100644
--- a/SECURITY.md
+++ b/SECURITY.md

@@ -7,24 +7,25 @@
 
 Arm takes security issues seriously and welcomes feedback from researchers
 and the security community in order to improve the security of its products
-and services. We operate a coordinated disclosure policy for disclosing
+and services.  We operate a coordinated disclosure policy for disclosing
 vulnerabilities and other security issues.
 
 Security issues can be complex and one single timescale doesn't fit all
-circumstances. We will make best endeavours to inform you when we expect
+circumstances.  We will make best endeavours to inform you when we expect
 security notifications and fixes to be available and facilitate coordinated
 disclosure when notifications and patches/mitigations are available.
 
 ### Report
 
-For all security issues, contact Arm by email at [arm-security@arm.com](mailto:arm-security@arm.com).
-In the body of the email include as much information as possible about the issue
-or vulnerability and any additional contact details.
+For all security issues, contact Arm by email at
+[arm-security@arm.com](mailto:arm-security@arm.com).  In the body of the email
+include as much information as possible about the issue or vulnerability and any
+additional contact details.
 
 ### Secure submission using PGP
 
 We support and encourage secure submission of vulnerability reports using PGP,
-using the key below. If you would like replies to be encrypted, please provide
+using the key below.  If you would like replies to be encrypted, please provide
 your own public key through a secure mechanism.
 
 ~~~none
@@ -80,4 +81,5 @@
 -----END PGP PUBLIC KEY BLOCK-----
 ~~~
 
-For more information visit <https://developer.arm.com/support/arm-security-updates/report-security-vulnerabilities>
\ No newline at end of file
+For more information visit
+<https://developer.arm.com/support/arm-security-updates/report-security-vulnerabilities>
\ No newline at end of file

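The `architecture_features.py` change in this commit replaces the per-memory port widths with a single AXI port width (128 bits on Ethos-U65, 64 bits otherwise) and derives bandwidth from it, the memory's clock scale, and `core_clock`. The sketch below reproduces that arithmetic in isolation so the GB/s figures quoted in the default-configuration comments can be checked by hand. The function name `memory_bandwidth` is hypothetical; only the formula and the input values come from the diff.

```python
def memory_bandwidth(is_ethos_u65, clock_scale, core_clock_hz):
    """Return (bytes per cycle, bytes per second) for one memory.

    Mirrors the expressions in the diff:
      axi_port_width              = 128 if Ethos-U65 else 64   (bits)
      memory_bandwidths_per_cycle = axi_port_width * clock_scale / 8
      memory_bandwidths_per_second = per_cycle * core_clock
    """
    axi_port_width = 128 if is_ethos_u65 else 64
    bytes_per_cycle = axi_port_width * clock_scale / 8
    bytes_per_second = bytes_per_cycle * core_clock_hz
    return bytes_per_cycle, bytes_per_second


# Internal-default values taken from _set_default_sys_config in the diff.
defaults = {
    "U55 Sram": (False, 1.0, 500e6),
    "U55 OffChipFlash": (False, 0.125, 500e6),
    "U65 Sram": (True, 1.0, 1e9),
    "U65 Dram": (True, 0.75, 1e9),
}

for name, args in defaults.items():
    per_cycle, per_second = memory_bandwidth(*args)
    print(f"{name}: {per_cycle} B/cycle, {per_second / 1e9} GB/s")
```

With these inputs the results are 4 GB/s and 0.5 GB/s for the Ethos-U55 defaults and 16 GB/s and 12 GB/s for the Ethos-U65 defaults, which agrees with the comments in `_set_default_sys_config`.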
diff --git a/ethosu/vela/architecture_features.py b/ethosu/vela/architecture_features.py
index 9ca4304..7b6c3be 100644
--- a/ethosu/vela/architecture_features.py
+++ b/ethosu/vela/architecture_features.py

@@ -21,7 +21,8 @@
 
 import numpy as np
 
-from .errors import OptionError
+from .errors import CliOptionError
+from .errors import ConfigOptionError
 from .ethos_u55_regs.ethos_u55_regs import resampling_mode
 from .numeric_util import full_shape
 from .numeric_util import round_up
@@ -131,6 +132,12 @@
         return [e.value for e in cls]
 
 
+@enum.unique
+class MemPort(enum.Enum):
+    Axi0 = enum.auto()
+    Axi1 = enum.auto()
+
+
 class ArchitectureFeatures:
     """This class is a container for various parameters of the Ethos-U core
     and system configuration that can be tuned, either by command line
@@ -169,26 +176,29 @@
     OFMSplitDepth = 16
     SubKernelMax = Block(8, 8, 65536)
 
+    DEFAULT_CONFIG = "internal-default"
+
     def __init__(
         self,
-        vela_config: ConfigParser,
+        vela_config_files,
         accelerator_config,
         system_config,
+        memory_mode,
         override_block_config,
         block_config_limit,
-        global_memory_clock_scale,
         max_blockdep,
         weight_estimation_scaling,
+        verbose_config,
     ):
         accelerator_config = accelerator_config.lower()
-        self.vela_config = vela_config
         if accelerator_config not in Accelerator.member_list():
-            raise OptionError("--accelerator-config", self.accelerator_config, "Unknown accelerator configuration")
+            raise CliOptionError("--accelerator-config", accelerator_config, "Unknown accelerator configuration")
         self.accelerator_config = Accelerator(accelerator_config)
         accel_config = ArchitectureFeatures.accelerator_configs[self.accelerator_config]
         self.config = accel_config
 
         self.system_config = system_config
+        self.memory_mode = memory_mode
         self.is_ethos_u65_system = self.accelerator_config in (Accelerator.Ethos_U65_256, Accelerator.Ethos_U65_512)
 
         self.max_outstanding_dma = 2 if self.is_ethos_u65_system else 1
@@ -201,14 +211,6 @@
         self.override_block_config = override_block_config
         self.block_config_limit = block_config_limit
 
-        self.global_memory_clock_scale = global_memory_clock_scale
-        if self.global_memory_clock_scale <= 0.0 or self.global_memory_clock_scale > 1.0:
-            raise Exception(
-                "Invalid global_memory_clock_scale = "
-                + str(self.global_memory_clock_scale)
-                + " (must be > 0.0 and <= 1.0)"
-            )
-
         self.max_blockdep = max_blockdep
         self.weight_estimation_scaling = weight_estimation_scaling
 
@@ -220,20 +222,13 @@
         self.num_elem_wise_units = accel_config.elem_units
         self.num_macs_per_cycle = dpu_min_height * dpu_min_width * dpu_dot_product_width * dpu_min_ofm_channels
 
-        self.memory_clock_scales = np.zeros(MemArea.Size)
-        self.memory_port_widths = np.zeros(MemArea.Size)
+        # Get system configuration and memory mode
+        self._get_vela_config(vela_config_files, verbose_config)
 
-        # Get system configuration
-        self.__read_sys_config(self.is_ethos_u65_system)
+        self.axi_port_width = 128 if self.is_ethos_u65_system else 64
+        self.memory_bandwidths_per_cycle = self.axi_port_width * self.memory_clock_scales / 8
 
-        # apply the global memory clock scales to the individual ones from the system config
-        for mem in MemArea.all():
-            self.memory_clock_scales[mem] *= self.global_memory_clock_scale
-
-        self.memory_clocks = self.memory_clock_scales * self.npu_clock
-        self.memory_bandwidths_per_cycle = self.memory_port_widths * self.memory_clock_scales / 8
-
-        self.memory_bandwidths_per_second = self.memory_bandwidths_per_cycle * self.npu_clock
+        self.memory_bandwidths_per_second = self.memory_bandwidths_per_cycle * self.core_clock
 
         # Get output/activation performance numbers
         self._generate_output_perf_tables(self.accelerator_config)
@@ -303,7 +298,7 @@
         self.cycles_weight = 40
         self.max_sram_used_weight = 1000
 
-        if self.is_ethos_u65_system and (self.fast_storage_mem_area != self.feature_map_storage_mem_area):
+        if self.is_spilling_enabled():
             self.max_sram_used_weight = 0
 
         # Shared Buffer Block allocations
@@ -582,100 +577,226 @@
 
         return blockdep
 
-    def cpu_cycle_estimate(self, op):
+    def is_spilling_enabled(self):
         """
-        Gets estimated performance of a CPU operation, based on a linear model of intercept, slope,
-        specified in the vela config file, in ConfigParser file format (.ini file).
-        Example configuration snippet:
-        [CpuPerformance.MyOperationType]
-        Cortex-Mx.intercept=<some float value>
-        Cortex-Mx.slope=<some float value>
+        Spilling is a feature that allows the Ethos-U to use a dedicated SRAM as a cache for various types of data
         """
-        section = "CpuPerformance." + op.type.name
-        if self.vela_config is not None and section in self.vela_config:
-            op_config = self.vela_config[section]
-            try:
-                intercept = float(op_config.get(self.cpu_config + ".intercept", op_config["default.intercept"]))
-                slope = float(op_config.get(self.cpu_config + ".slope", op_config["default.slope"]))
-                n_elements = op.inputs[0].elements()
-                cycles = intercept + n_elements * slope
-                return cycles
-            except Exception:
-                print("Error: Reading CPU cycle estimate in vela configuration file, section {}".format(section))
-                raise
+        return (
+            self._mem_port_mapping(self.cache_mem_area) == MemArea.Sram and self.cache_mem_area != self.arena_mem_area
+        )
 
-        print("Warning: No configured CPU performance estimate for", op.type)
-        return 0
+    def _mem_port_mapping(self, mem_port):
+        mem_port_mapping = {MemPort.Axi0: self.axi0_port, MemPort.Axi1: self.axi1_port}
+        return mem_port_mapping[mem_port]
 
-    def __read_sys_config(self, is_ethos_u65_system):
-        """
-        Gets the system configuration with the given name from the vela configuration file
-        Example configuration snippet:
-        [SysConfig.MyConfigName]
-        npu_freq=<some float value>
-        cpu=Cortex-Mx
-        ...
-        """
-        # Get system configuration from the vela configuration file
-        if self.vela_config is None:
-            print("Warning: Using default values for system configuration")
+    def _set_default_sys_config(self):
+        print(f"Warning: Using {ArchitectureFeatures.DEFAULT_CONFIG} values for system configuration")
+        # ArchitectureFeatures.DEFAULT_CONFIG values
+        if self.is_ethos_u65_system:
+            # Default Ethos-U65 system configuration
+            # Ethos-U65 Client-Server: SRAM (16 GB/s) and DRAM (12 GB/s)
+            self.core_clock = 1e9
+            self.axi0_port = MemArea.Sram
+            self.axi1_port = MemArea.Dram
+            self.memory_clock_scales[MemArea.Sram] = 1.0
+            self.memory_clock_scales[MemArea.Dram] = 0.75  # 3 / 4
         else:
-            section_key = "SysConfig." + self.system_config
-            if section_key not in self.vela_config:
-                raise OptionError("--system-config", self.system_config, "Unknown system configuration")
+            # Default Ethos-U55 system configuration
+            # Ethos-U55 High-End Embedded: SRAM (4 GB/s) and Flash (0.5 GB/s)
+            self.core_clock = 500e6
+            self.axi0_port = MemArea.Sram
+            self.axi1_port = MemArea.OffChipFlash
+            self.memory_clock_scales[MemArea.Sram] = 1.0
+            self.memory_clock_scales[MemArea.OffChipFlash] = 0.125  # 1 / 8
 
-        try:
-            self.npu_clock = float(self.__sys_config("npu_freq", "500e6"))
-            self.cpu_config = self.__sys_config("cpu", "Cortex-M7")
+    def _set_default_mem_mode(self):
+        print(f"Warning: Using {ArchitectureFeatures.DEFAULT_CONFIG} values for memory mode")
+        # ArchitectureFeatures.DEFAULT_CONFIG values
+        if self.is_ethos_u65_system:
+            # Default Ethos-U65 memory mode
+            # Dedicated SRAM: SRAM is only used by the Ethos-U
+            self.const_mem_area = MemPort.Axi1
+            self.arena_mem_area = MemPort.Axi1
+            self.cache_mem_area = MemPort.Axi0
+            self.cache_sram_size = 384 * 1024
+        else:
+            # Default Ethos-U55 memory mode
+            self.const_mem_area = MemPort.Axi1
+            self.arena_mem_area = MemPort.Axi0
+            self.cache_mem_area = MemPort.Axi0
 
-            self.memory_clock_scales[MemArea.Sram] = float(self.__sys_config("Sram_clock_scale", "1"))
-            self.memory_port_widths[MemArea.Sram] = int(self.__sys_config("Sram_port_width", "64"))
+    def _get_vela_config(self, vela_config_files, verbose_config):
+        """
+        Gets the system configuration and memory modes from one or more Vela configuration file(s) or uses some
+        defaults.
+        """
 
-            self.memory_clock_scales[MemArea.OnChipFlash] = float(self.__sys_config("OnChipFlash_clock_scale", "1"))
-            self.memory_port_widths[MemArea.OnChipFlash] = int(self.__sys_config("OnChipFlash_port_width", "64"))
+        # all properties are optional and are initialised to a value of 1 (or the equivalent)
+        self.core_clock = 1
+        self.axi0_port = MemArea(1)
+        self.axi1_port = MemArea(1)
+        self.memory_clock_scales = np.ones(MemArea.Size)
+        self.const_mem_area = MemPort(1)
+        self.arena_mem_area = MemPort(1)
+        self.cache_mem_area = MemPort(1)
+        self.cache_sram_size = 1
 
-            self.memory_clock_scales[MemArea.OffChipFlash] = float(
-                self.__sys_config("OffChipFlash_clock_scale", "0.25")
+        # read configuration file(s)
+        self.vela_config = None
+
+        if vela_config_files is not None:
+            self.vela_config = ConfigParser()
+            self.vela_config.read(vela_config_files)
+
+        # read system configuration
+        sys_cfg_section = "System_Config." + self.system_config
+
+        if self.vela_config is not None and self.vela_config.has_section(sys_cfg_section):
+            self.core_clock = float(self._read_config(sys_cfg_section, "core_clock", self.core_clock))
+            self.axi0_port = MemArea[self._read_config(sys_cfg_section, "axi0_port", self.axi0_port.name)]
+            self.axi1_port = MemArea[self._read_config(sys_cfg_section, "axi1_port", self.axi1_port.name)]
+
+            for mem_area in (self.axi0_port, self.axi1_port):
+                self.memory_clock_scales[mem_area] = float(
+                    self._read_config(
+                        sys_cfg_section, mem_area.name + "_clock_scale", self.memory_clock_scales[mem_area]
+                    )
+                )
+
+        elif self.system_config == ArchitectureFeatures.DEFAULT_CONFIG:
+            self._set_default_sys_config()
+
+        elif vela_config_files is None:
+            raise CliOptionError("--config", vela_config_files, "CLI Option not specified")
+
+        else:
+            raise CliOptionError(
+                "--system-config",
+                self.system_config,
+                "Section {} not found in Vela config file".format(sys_cfg_section),
             )
-            self.memory_port_widths[MemArea.OffChipFlash] = int(self.__sys_config("OffChipFlash_port_width", "32"))
 
-            self.memory_clock_scales[MemArea.Dram] = float(self.__sys_config("Dram_clock_scale", "1"))
-            self.memory_port_widths[MemArea.Dram] = int(self.__sys_config("Dram_port_width", "32"))
+        # read the memory mode
+        mem_mode_section = "Memory_Mode." + self.memory_mode
 
-            self.fast_storage_mem_area = MemArea[self.__sys_config("fast_storage_mem_area", "Sram")]
-            self.feature_map_storage_mem_area = MemArea[self.__sys_config("feature_map_storage_mem_area", "Sram")]
+        if self.vela_config is not None and self.vela_config.has_section(mem_mode_section):
+            self.const_mem_area = MemPort[
+                self._read_config(mem_mode_section, "const_mem_area", self.const_mem_area.name)
+            ]
+            self.arena_mem_area = MemPort[
+                self._read_config(mem_mode_section, "arena_mem_area", self.arena_mem_area.name)
+            ]
+            self.cache_mem_area = MemPort[
+                self._read_config(mem_mode_section, "cache_mem_area", self.cache_mem_area.name)
+            ]
+            self.cache_sram_size = int(self._read_config(mem_mode_section, "cache_sram_size", self.cache_sram_size))
 
-            self.permanent_storage_mem_area = MemArea[self.__sys_config("permanent_storage_mem_area", "OffChipFlash")]
-            if is_ethos_u65_system:
-                if self.permanent_storage_mem_area is not MemArea.Dram:
-                    raise Exception(
-                        "Invalid permanent_storage_mem_area = "
-                        + str(self.permanent_storage_mem_area)
-                        + " (must be 'DRAM' for Ethos-U65)."
-                    )
-            else:
-                if self.permanent_storage_mem_area not in set((MemArea.OnChipFlash, MemArea.OffChipFlash)):
-                    raise Exception(
-                        "Invalid permanent_storage_mem_area = "
-                        + str(self.permanent_storage_mem_area)
-                        + " (must be 'OnChipFlash' or 'OffChipFlash' for Ethos-U55)."
-                        " To store the weights and other constant data in SRAM on Ethos-U55 select 'OnChipFlash'"
-                    )
+        elif self.memory_mode == ArchitectureFeatures.DEFAULT_CONFIG:
+            self._set_default_mem_mode()
 
-            self.sram_size = 1024 * int(self.__sys_config("sram_size_kb", "204800"))
+        elif vela_config_files is None:
+            raise CliOptionError("--config", vela_config_files, "CLI Option not specified")
 
-        except Exception:
-            print("Error: Reading System Configuration in vela configuration file, section {}".format(section_key))
-            raise
+        else:
+            raise CliOptionError(
+                "--memory-mode", self.memory_mode, "Section {} not found in Vela config file".format(mem_mode_section),
+            )
 
-    def __sys_config(self, key, default_value):
+        # override Sram to OnChipFlash when constants, arena and cache all share the same Sram port
+        if self._mem_port_mapping(self.const_mem_area) == MemArea.Sram:
+            if self.const_mem_area == self.arena_mem_area == self.cache_mem_area:
+                print(
+                    "Info: Changing const_mem_area from Sram to OnChipFlash. This will use the same characteristics as"
+                    " Sram."
+                )
+                if self.const_mem_area == MemPort.Axi0:
+                    self.const_mem_area = MemPort.Axi1
+                    self.axi1_port = MemArea.OnChipFlash
+                else:
+                    self.const_mem_area = MemPort.Axi0
+                    self.axi0_port = MemArea.OnChipFlash
+                self.memory_clock_scales[MemArea.OnChipFlash] = self.memory_clock_scales[MemArea.Sram]
+
+        # check configuration
+        if self._mem_port_mapping(self.cache_mem_area) != MemArea.Sram:
+            raise ConfigOptionError("cache_mem_area", self._mem_port_mapping(self.cache_mem_area).name, "Sram")
+
+        if self.is_ethos_u65_system:
+            if self._mem_port_mapping(self.const_mem_area) not in (
+                MemArea.Dram,
+                MemArea.OnChipFlash,
+                MemArea.OffChipFlash,
+            ):
+                raise ConfigOptionError(
+                    "const_mem_area",
+                    self._mem_port_mapping(self.const_mem_area).name,
+                    "Dram or OnChipFlash or OffChipFlash",
+                )
+
+            if self._mem_port_mapping(self.arena_mem_area) not in (MemArea.Sram, MemArea.Dram):
+                raise ConfigOptionError(
+                    "arena_mem_area", self._mem_port_mapping(self.arena_mem_area).name, "Sram or Dram"
+                )
+        else:
+            if self._mem_port_mapping(self.const_mem_area) not in (MemArea.OnChipFlash, MemArea.OffChipFlash):
+                raise ConfigOptionError(
+                    "const_mem_area", self._mem_port_mapping(self.const_mem_area).name, "OnChipFlash or OffChipFlash"
+                )
+
+            if self._mem_port_mapping(self.arena_mem_area) != MemArea.Sram:
+                raise ConfigOptionError("arena_mem_area", self._mem_port_mapping(self.arena_mem_area).name, "Sram")
+
+        # assign existing memory areas
+        self.permanent_storage_mem_area = self._mem_port_mapping(self.const_mem_area)
+        self.feature_map_storage_mem_area = self._mem_port_mapping(self.arena_mem_area)
+        self.fast_storage_mem_area = self._mem_port_mapping(self.cache_mem_area)
+
+        self.sram_size = self.cache_sram_size if self.is_spilling_enabled() else 9999 * 1024 * 1024
+
+        # display the system configuration and memory mode
+        if verbose_config:
+            print(f"System Configuration ({self.system_config}):")
+            print(f"   core_clock = {self.core_clock}")
+            print(f"   axi0_port = {self.axi0_port.name}")
+            print(f"   axi1_port = {self.axi1_port.name}")
+            for mem in (MemArea.Sram, MemArea.Dram, MemArea.OnChipFlash, MemArea.OffChipFlash):
+                print(f"   {mem.name}_clock_scales = {self.memory_clock_scales[mem]}")
+
+            print(f"Memory Mode ({self.memory_mode}):")
+            print(f"   const_mem_area = {self.const_mem_area.name}")
+            print(f"   arena_mem_area = {self.arena_mem_area.name}")
+            print(f"   cache_mem_area = {self.cache_mem_area.name}")
+            print(f"   cache_sram_size = {self.cache_sram_size}")
+
+            print("Architecture Settings:")
+            print(f"   permanent_storage_mem_area = {self.permanent_storage_mem_area.name}")
+            print(f"   feature_map_storage_mem_area = {self.feature_map_storage_mem_area.name}")
+            print(f"   fast_storage_mem_area = {self.fast_storage_mem_area.name}")
+            print(f"   sram_size = {self.sram_size}")
+
+    def _read_config(self, section, key, current_value):
         """
-        Gets the system configuration value with the given key from the vela config file.
+        Reads a given key from a particular section in the Vela config file. If the section contains the 'inherit'
+        option then we recurse into the section specified. If a key is defined both in the section and in a section
+        it inherits from, then the value from the inheriting section is used, regardless of the parsing order.
         """
-        if self.vela_config is None:
-            return default_value
-        section = "SysConfig." + self.system_config
-        result = self.vela_config[section].get(key, None)
-        if result is None:
-            raise Exception("Error: System Configuration Missing key {} in section [{}] ".format(key, section))
+        if not self.vela_config.has_section(section):
+            raise ConfigOptionError(
+                "section", "{}. The section was not found in the Vela config file(s)".format(section)
+            )
+
+        result = str(current_value)
+        if self.vela_config.has_option(section, "inherit"):
+            inheritance_section = self.vela_config.get(section, "inherit")
+            # check for recursion loop
+            if inheritance_section == section:
+                raise ConfigOptionError(
+                    "inherit",
+                    "{}. This references its own section and recursion is not allowed".format(inheritance_section),
+                )
+            result = self._read_config(inheritance_section, key, result)
+
+        if self.vela_config.has_option(section, key):
+            result = self.vela_config.get(section, key)
+
         return result
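
The `inherit` lookup in `_read_config` above can be tried in isolation. The following is a minimal sketch, not the Vela implementation: `read_config` is a standalone rewrite of the same logic, the section names `System_Config.Base` and `System_Config.Derived` are hypothetical, and plain `KeyError`/`ValueError` stand in for `ConfigOptionError`.

```python
from configparser import ConfigParser

# Hypothetical config text; only the key names follow the vela.ini layout from this commit.
SAMPLE = """
[System_Config.Base]
core_clock=500e6
axi0_port=Sram
axi1_port=Dram

[System_Config.Derived]
inherit=System_Config.Base
core_clock=1e9
"""


def read_config(cfg, section, key, current_value):
    """Resolve the inherited value first, then let the section's own key override it."""
    if not cfg.has_section(section):
        raise KeyError(f"section {section} not found")
    result = str(current_value)
    if cfg.has_option(section, "inherit"):
        inherited_section = cfg.get(section, "inherit")
        # Only a direct self-reference is detected, as in the diff above.
        if inherited_section == section:
            raise ValueError(f"{section} inherits from itself")
        result = read_config(cfg, inherited_section, key, result)
    if cfg.has_option(section, key):
        result = cfg.get(section, key)
    return result


cfg = ConfigParser()
cfg.read_string(SAMPLE)

local = read_config(cfg, "System_Config.Derived", "core_clock", 1)  # defined in both sections
inherited = read_config(cfg, "System_Config.Derived", "axi1_port", "Sram")  # only in Base
missing = read_config(cfg, "System_Config.Derived", "Dram_clock_scale", 1.0)  # defined nowhere
```

Here `local` comes from the derived section, `inherited` falls through to the base section, and `missing` returns the stringified default. Every result is a string, which is why the callers in the diff wrap the call in `float(...)`, `int(...)` or an enum lookup.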

diff --git a/ethosu/vela/compiler_driver.py b/ethosu/vela/compiler_driver.py
index 9e1cb3a..0739133 100644
--- a/ethosu/vela/compiler_driver.py
+++ b/ethosu/vela/compiler_driver.py

@@ -225,31 +225,18 @@
     root_sg = nng.get_root_subgraph()
 
     alloc_list = []
-    feature_maps_in_fast_storage = arch.feature_map_storage_mem_area == arch.fast_storage_mem_area
-    if feature_maps_in_fast_storage:
-        mem_alloc_scratch = (arch.feature_map_storage_mem_area, set((MemType.Scratch, MemType.Scratch_fast)))
-        alloc_list.append(mem_alloc_scratch)
-    else:
+    if arch.is_spilling_enabled():
         mem_alloc_scratch_fast = (arch.fast_storage_mem_area, set((MemType.Scratch_fast,)))
         mem_alloc_scratch = (arch.feature_map_storage_mem_area, set((MemType.Scratch,)))
         # Order is important
         alloc_list.append(mem_alloc_scratch_fast)
         alloc_list.append(mem_alloc_scratch)
+    else:
+        mem_alloc_scratch = (arch.feature_map_storage_mem_area, set((MemType.Scratch, MemType.Scratch_fast)))
+        alloc_list.append(mem_alloc_scratch)
 
     for mem_area, mem_type_set in alloc_list:
-        if feature_maps_in_fast_storage or mem_area != arch.fast_storage_mem_area:
-            tensor_allocation.allocate_tensors(
-                nng,
-                root_sg,
-                arch,
-                mem_area,
-                mem_type_set,
-                tensor_allocator=options.tensor_allocator,
-                verbose_allocation=options.verbose_allocation,
-                show_minimum_possible_allocation=options.show_minimum_possible_allocation,
-                allocation_alignment=options.allocation_alignment,
-            )
-        else:
+        if arch.is_spilling_enabled() and mem_area == arch.fast_storage_mem_area:
             # For the case where scratch_fast != scratch: attempt to place feature maps used between
             # cascaded passes in fast storage. Bisection is used to find the max possible usage of SRAM.
             alloc_results = []
@@ -285,6 +272,18 @@
                     "Increasing the value of --weight-estimation-scaling may help to resolve the issue. "
                     "See OPTIONS.md for more information.".format(arch.sram_size)
                 )
+        else:
+            tensor_allocation.allocate_tensors(
+                nng,
+                root_sg,
+                arch,
+                mem_area,
+                mem_type_set,
+                tensor_allocator=options.tensor_allocator,
+                verbose_allocation=options.verbose_allocation,
+                show_minimum_possible_allocation=options.show_minimum_possible_allocation,
+                allocation_alignment=options.allocation_alignment,
+            )
 
     # Generate command streams and serialise Npu-ops into tensors
     for sg in nng.subgraphs:

diff --git a/ethosu/vela/errors.py b/ethosu/vela/errors.py
index 1a30d54..2a635d0 100644
--- a/ethosu/vela/errors.py
+++ b/ethosu/vela/errors.py

@@ -15,8 +15,6 @@
 # limitations under the License.
 # Description:
 # Defines custom exceptions.
-import sys
-
 from .operation import Operation
 from .tensor import Tensor
 
@@ -25,31 +23,52 @@
     """Base class for vela exceptions"""
 
     def __init__(self, data):
-        self.data = data
+        self.data = "Error: " + data
 
     def __str__(self):
         return repr(self.data)
 
 
 class InputFileError(VelaError):
-    """Raised when reading the input file results in errors"""
+    """Raised when reading an input file results in errors"""
 
     def __init__(self, file_name, msg):
-        self.data = "Error reading input file {}: {}".format(file_name, msg)
+        self.data = "Reading input file {}: {}".format(file_name, msg)
 
 
 class UnsupportedFeatureError(VelaError):
-    """Raised when the input file uses non-supported features that cannot be handled"""
+    """Raised when the input network uses non-supported features that cannot be handled"""
 
     def __init__(self, data):
-        self.data = "The input file uses a feature that is currently not supported: {}".format(data)
+        self.data = "Input network uses a feature that is currently not supported: {}".format(data)
 
 
-class OptionError(VelaError):
-    """Raised when an incorrect command line option is used"""
+class CliOptionError(VelaError):
+    """Raised for errors encountered with a command line option
+
+    :param option: str object that contains the name of the command line option
+    :param option_value: the value of the command line option that resulted in the error
+    :param msg: str object that contains a description of the specific error encountered
+    """
 
     def __init__(self, option, option_value, msg):
-        self.data = "Incorrect argument to CLI option: {} {}: {}".format(option, option_value, msg)
+        self.data = "Incorrect argument to CLI option: {} = {}: {}".format(option, option_value, msg)
+
+
+class ConfigOptionError(VelaError):
+    """Raised for errors encountered with a configuration option
+
+    :param option: str object that contains the name of the configuration option
+    :param option_value: the value of the configuration option that resulted in the error
+    :param option_valid_values (optional): str object that contains the valid configuration option values
+    """
+
+    def __init__(self, option, option_value, option_valid_values=None):
+        self.data = "Invalid configuration of {} = {}".format(option, option_value)
+        if option_valid_values is not None:
+            self.data += " (must be {}).".format(option_valid_values)
+        else:
+            self.data += "."
 
 
 class AllocationError(VelaError):
@@ -60,7 +79,12 @@
 
 
 def OperatorError(op, msg):
-    """Called when parsing an operator results in errors"""
+    """
+    Raises a VelaError exception for errors encountered when parsing an Operation
+
+    :param op: Operation object that resulted in the error
+    :param msg: str object that contains a description of the specific error encountered
+    """
 
     assert isinstance(op, Operation)
 
@@ -91,12 +115,16 @@
 
     data = data[:-1]  # remove last newline
 
-    print("Error: {}".format(data))
-    sys.exit(1)
+    raise VelaError(data)
 
 
 def TensorError(tens, msg):
-    """Called when parsing a tensor results in errors"""
+    """
+    Raises a VelaError exception for errors encountered when parsing a Tensor
+
+    :param tens: Tensor object that resulted in the error
+    :param msg: str object that contains a description of the specific error encountered
+    """
 
     assert isinstance(tens, Tensor)
 
@@ -126,5 +154,4 @@
 
     data = data[:-1]  # remove last newline
 
-    print("Error: {}".format(data))
-    sys.exit(1)
+    raise VelaError(data)
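
With `OperatorError` and `TensorError` now raising instead of calling `sys.exit(1)`, the caller decides how a failure is reported. The sketch below illustrates that pattern under stated assumptions: `VelaError` is a stand-in that copies the constructor shown in the diff, while `tensor_error` and `compile_network` are hypothetical helpers that do not exist in Vela.

```python
class VelaError(Exception):
    """Stand-in for the base class in errors.py (same __init__ and __str__ as the diff)."""

    def __init__(self, data):
        self.data = "Error: " + data

    def __str__(self):
        return repr(self.data)


def tensor_error(name, msg):
    """Hypothetical, simplified analogue of TensorError: build a message and raise."""
    raise VelaError(f"Invalid tensor '{name}': {msg}")


def compile_network(name):
    """Hypothetical caller that turns the exception into an exit code and a message."""
    try:
        tensor_error(name, "shape is not fully defined")
    except VelaError as err:
        return 1, err.data
    return 0, ""


exit_code, message = compile_network("ifm0")
```

Because the helper no longer terminates the process, library users and tests can catch `VelaError` and continue, while a command line entry point can still map it to a non-zero exit status.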

diff --git a/ethosu/vela/high_level_command_to_npu_op.py b/ethosu/vela/high_level_command_to_npu_op.py
index f786444..efd8a03 100644
--- a/ethosu/vela/high_level_command_to_npu_op.py
+++ b/ethosu/vela/high_level_command_to_npu_op.py

@@ -171,20 +171,17 @@
 
 
 def get_region(tens: Tensor, arch: ArchitectureFeatures) -> int:
-    if arch.feature_map_storage_mem_area == arch.fast_storage_mem_area:
-        base_ptr_idx_map = {
-            MemType.Permanent_NPU: BasePointerIndex.WeightTensor,
-            MemType.Permanent_CPU: BasePointerIndex.WeightTensor,
-            MemType.Scratch: BasePointerIndex.ScratchTensor,
-            MemType.Scratch_fast: BasePointerIndex.ScratchTensor,
-        }
+    base_ptr_idx_map = {
+        MemType.Permanent_NPU: BasePointerIndex.WeightTensor,
+        MemType.Permanent_CPU: BasePointerIndex.WeightTensor,
+        MemType.Scratch: BasePointerIndex.ScratchTensor,
+    }
+
+    if arch.is_spilling_enabled():
+        base_ptr_idx_map[MemType.Scratch_fast] = BasePointerIndex.ScratchFastTensor
     else:
-        base_ptr_idx_map = {
-            MemType.Permanent_NPU: BasePointerIndex.WeightTensor,
-            MemType.Permanent_CPU: BasePointerIndex.WeightTensor,
-            MemType.Scratch: BasePointerIndex.ScratchTensor,
-            MemType.Scratch_fast: BasePointerIndex.ScratchFastTensor,
-        }
+        base_ptr_idx_map[MemType.Scratch_fast] = BasePointerIndex.ScratchTensor
+
     return int(base_ptr_idx_map[tens.mem_type])
 
 

diff --git a/ethosu/vela/npu_performance.py b/ethosu/vela/npu_performance.py
index 29e0df9..d1be5a5 100644
--- a/ethosu/vela/npu_performance.py
+++ b/ethosu/vela/npu_performance.py

@@ -60,7 +60,6 @@
 
 class PassCycles(IntEnum):
     Npu = 0
-    Cpu = auto()
     SramAccess = auto()
     DramAccess = auto()
     OnChipFlashAccess = auto()
@@ -69,34 +68,19 @@
     Size = auto()
 
     def display_name(self):
-        return (
-            "NPU",
-            "CPU",
-            "SRAM Access",
-            "DRAM Access",
-            "On-chip Flash Access",
-            "Off-chip Flash Access",
-            "Total",
-            "Size",
-        )[self.value]
+        return ("NPU", "SRAM Access", "DRAM Access", "On-chip Flash Access", "Off-chip Flash Access", "Total", "Size",)[
+            self.value
+        ]
 
     def identifier_name(self):
-        return (
-            "npu",
-            "cpu",
-            "sram_access",
-            "dram_access",
-            "on_chip_flash_access",
-            "off_chip_flash_access",
-            "total",
-            "size",
-        )[self.value]
+        return ("npu", "sram_access", "dram_access", "on_chip_flash_access", "off_chip_flash_access", "total", "size",)[
+            self.value
+        ]
 
     @staticmethod
     def all():
         return (
             PassCycles.Npu,
-            PassCycles.Cpu,
             PassCycles.SramAccess,
             PassCycles.DramAccess,
             PassCycles.OnChipFlashAccess,
@@ -460,9 +444,7 @@
     ofm_block = Block(block_config[1], block_config[0], block_config[3])
     ifm_block = Block(block_config[1], block_config[0], block_config[3])
 
-    if ps.placement == PassPlacement.Cpu:
-        cycles[PassCycles.Cpu] = arch.cpu_cycle_estimate(ps.ops[0])
-    elif primary_op:
+    if ps.placement == PassPlacement.Npu and primary_op:
         skirt = primary_op.attrs.get("skirt", skirt)
         explicit_padding = primary_op.attrs.get("explicit_padding", explicit_padding)
         assert primary_op.type.npu_block_type == ps.npu_block_type

diff --git a/ethosu/vela/register_command_stream_generator.py b/ethosu/vela/register_command_stream_generator.py
index dd63d2e..e612c30 100644
--- a/ethosu/vela/register_command_stream_generator.py
+++ b/ethosu/vela/register_command_stream_generator.py

@@ -1281,14 +1281,15 @@
     """
     emit = CommandStreamEmitter()
     arch = ArchitectureFeatures(
-        vela_config=None,
-        system_config=None,
+        vela_config_files=None,
         accelerator_config=accelerator.value,
+        system_config=ArchitectureFeatures.DEFAULT_CONFIG,
+        memory_mode=ArchitectureFeatures.DEFAULT_CONFIG,
         override_block_config=None,
         block_config_limit=None,
-        global_memory_clock_scale=1.0,
         max_blockdep=ArchitectureFeatures.MAX_BLOCKDEP,
         weight_estimation_scaling=1.0,
+        verbose_config=False,
     )
     generate_command_stream(emit, npu_op_list, arch)
     return emit.to_list()

diff --git a/ethosu/vela/scheduler.py b/ethosu/vela/scheduler.py
index 4af83a1..889bd06 100644
--- a/ethosu/vela/scheduler.py
+++ b/ethosu/vela/scheduler.py

@@ -249,10 +249,6 @@
 
         self.n_combinations_searched = 0
 
-        self.feature_maps_not_in_fast_storage = (
-            arch.tensor_storage_mem_area[TensorPurpose.FeatureMap] != arch.fast_storage_mem_area
-        )
-
         self.pareto_max_candidates = 16
 
         self.ifm_stream_npu_blocks = set(
@@ -694,7 +690,7 @@
         all_candidates = []
         for pred_pass in pred_pass_list:
             # recurse into the next pass
-            ifm_strat_data = self.search_ifm_streaming_body(pred_pass, self.feature_maps_not_in_fast_storage)
+            ifm_strat_data = self.search_ifm_streaming_body(pred_pass, self.arch.is_spilling_enabled())
 
             strat_data = self.search_all_but_one_predecessor(ps, pred_pass, ifm_strat_data)
             for strat_opt in strat_data:
@@ -1020,7 +1016,7 @@
                         output.set_format(TensorFormat.NHCWB16, arch)
                         for rewrite_op in rewrites:
                             rewrite_op.outputs[0].set_format(TensorFormat.NHCWB16, arch)
-            if self.feature_maps_not_in_fast_storage:
+            if arch.is_spilling_enabled():
                 # Remember feature maps that can be moved to fast storage for later use
                 # in use_fast_storage_for_feature_maps
                 self.sg.scheduling_info["feature_map_rewrites"] = fast_storage_tensor_rewrites

diff --git a/ethosu/vela/stats_writer.py b/ethosu/vela/stats_writer.py
index 3cd769f..e4b8156 100644
--- a/ethosu/vela/stats_writer.py
+++ b/ethosu/vela/stats_writer.py

@@ -46,7 +46,7 @@
         ]
 
         labels += (
-            ["accelerator_configuration", "system_config", "npu_clock", "sram_size"]
+            ["accelerator_configuration", "system_config", "memory_mode", "core_clock", "sram_size"]
             + [area.identifier_name() + "_bandwidth" for area in mem_areas]
             + ["weights_storage_area", "feature_map_storage_area"]
         )
@@ -83,7 +83,13 @@
 
         if arch:
             data_items += (
-                [arch.accelerator_config, arch.system_config, arch.npu_clock, arch.sram_size / 1024]
+                [
+                    arch.accelerator_config.name,
+                    arch.system_config,
+                    arch.memory_mode,
+                    arch.core_clock,
+                    arch.sram_size / 1024,
+                ]
                 + [arch.memory_bandwidths_per_second[mem_area] / 1000.0 / 1000 / 1000 for mem_area in mem_areas]
                 + [
                     arch.tensor_storage_mem_area[TensorPurpose.Weights].display_name(),
@@ -91,7 +97,7 @@
                 ]
             )
 
-        midpoint_inference_time = nng.cycles[PassCycles.Total] / arch.npu_clock
+        midpoint_inference_time = nng.cycles[PassCycles.Total] / arch.core_clock
         if midpoint_inference_time > 0:
             midpoint_fps = 1 / midpoint_inference_time
         else:
@@ -162,7 +168,6 @@
         all_cycles = (
             PassCycles.Total,
             PassCycles.Npu,
-            PassCycles.Cpu,
             PassCycles.SramAccess,
             PassCycles.DramAccess,
             PassCycles.OnChipFlashAccess,
@@ -239,7 +244,7 @@
 
     orig_mem_areas_labels = [(v, v.display_name()) for v in mem_areas_to_report()]
 
-    midpoint_inference_time = cycles[PassCycles.Total] / arch.npu_clock
+    midpoint_inference_time = cycles[PassCycles.Total] / arch.core_clock
     if midpoint_inference_time > 0:
         midpoint_fps = 1 / midpoint_inference_time
     else:
@@ -252,9 +257,10 @@
     if name:
         print("", file=f)
         print("Network summary for", name, file=f)
-    print("Accelerator configuration        {:20}".format(arch.accelerator_config), file=f)
-    print("System configuration             {:20}".format(arch.system_config), file=f)
-    print("Accelerator clock                        {:12d} MHz".format(int(arch.npu_clock / 1e6)), file=f)
+    print("Accelerator configuration        {:>20}".format(arch.accelerator_config.name), file=f)
+    print("System configuration             {:>20}".format(arch.system_config), file=f)
+    print("Memory mode                      {:>20}".format(arch.memory_mode), file=f)
+    print("Accelerator clock                        {:12d} MHz".format(int(arch.core_clock / 1e6)), file=f)
     for mem_area, label in mem_area_labels:
         print(
             "Design peak {:25}    {:12.2f} GB/s".format(

diff --git a/ethosu/vela/test/testutil.py b/ethosu/vela/test/testutil.py
index 8258827..7cdd4f5 100644
--- a/ethosu/vela/test/testutil.py
+++ b/ethosu/vela/test/testutil.py

@@ -28,14 +28,15 @@
 
 def create_arch():
     return architecture_features.ArchitectureFeatures(
-        vela_config=None,
-        system_config=None,
+        vela_config_files=None,
         accelerator_config=architecture_features.Accelerator.Ethos_U55_128.value,
+        system_config=architecture_features.ArchitectureFeatures.DEFAULT_CONFIG,
+        memory_mode=architecture_features.ArchitectureFeatures.DEFAULT_CONFIG,
         override_block_config=None,
         block_config_limit=None,
-        global_memory_clock_scale=1.0,
         max_blockdep=0,
         weight_estimation_scaling=1.0,
+        verbose_config=False,
     )
 
 

diff --git a/ethosu/vela/vela.py b/ethosu/vela/vela.py
index 6835607..4f632d5 100644
--- a/ethosu/vela/vela.py
+++ b/ethosu/vela/vela.py

@@ -19,8 +19,7 @@
 # Provides command line interface, options parsing, and network loading. Before calling the compiler driver.
 import argparse
 import ast
-import configparser
-import os.path
+import os
 import sys
 import time
 
@@ -196,13 +195,13 @@
     parser.add_argument(
         "--supported-ops-report",
         action="store_true",
-        help="Generate the SUPPORTED_OPS.md file in the current working directory and exits.",
+        help="Generate the SUPPORTED_OPS.md file in the current working directory and exit",
     )
 
+    # set network nargs to be optional to allow the supported-ops-report CLI option to be used standalone
     parser.add_argument(
         "network", metavar="NETWORK", type=str, default=None, nargs="?", help="Filename of network to process"
     )
-
     parser.add_argument(
         "--output-dir", type=str, default="output", help="Output directory to write files to (default: %(default)s)"
     )
@@ -212,9 +211,10 @@
         default=None,
         help="Enables the calculation and writing of a network debug database to output directory",
     )
-
-    parser.add_argument("--config", type=str, help="Location of vela configuration file")
-
+    parser.add_argument(
+        "--config", type=str, action="append", help="Vela configuration file(s) in Python ConfigParser .ini file format"
+    )
+    parser.add_argument("--verbose-config", action="store_true", help="Verbose system configuration and memory mode")
     parser.add_argument("--verbose-graph", action="store_true", help="Verbose graph rewriter")
     parser.add_argument("--verbose-quantization", action="store_true", help="Verbose quantization")
     parser.add_argument("--verbose-packing", action="store_true", help="Verbose pass packing")
@@ -263,8 +263,14 @@
     parser.add_argument(
         "--system-config",
         type=str,
-        default="internal-default",
-        help="System configuration to use (default: %(default)s)",
+        default=architecture_features.ArchitectureFeatures.DEFAULT_CONFIG,
+        help="System configuration to select from the Vela configuration file (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--memory-mode",
+        type=str,
+        default=architecture_features.ArchitectureFeatures.DEFAULT_CONFIG,
+        help="Memory mode to select from the Vela configuration file (default: %(default)s)",
     )
     parser.add_argument(
         "--tensor-allocator",
@@ -292,15 +298,6 @@
         help="Limit block config search space, use zero for unlimited (default: %(default)s)",
     )
     parser.add_argument(
-        "--global-memory-clock-scale",
-        type=float,
-        default=1.0,
-        help=(
-            "Performs an additional scaling of the individual memory clock scales specified by the system config "
-            "(default: %(default)s)"
-        ),
-    )
-    parser.add_argument(
         "--pareto-metric",
         default=ParetoMetric.BwCycMem,
         type=lambda s: ParetoMetric[s],
@@ -344,14 +341,6 @@
     )
     args = parser.parse_args(args=args)
 
-    # Read configuration file
-    config_file = args.config
-    config = None
-    if config_file is not None:
-        with open(config_file) as f:
-            config = configparser.ConfigParser()
-            config.read_file(f)
-
     # Generate the supported ops report and exit
     if args.supported_ops_report:
         generate_supported_ops()
@@ -360,6 +349,12 @@
     if args.network is None:
         parser.error("the following argument is required: NETWORK")
 
+    # check all config files exist because they will be read as a group
+    if args.config is not None:
+        for filename in args.config:
+            if not os.access(filename, os.R_OK):
+                raise InputFileError(filename, "File not found or is not readable.")
+
     sys.setrecursionlimit(args.recursion_limit)
 
     if args.force_block_config:
@@ -374,14 +369,15 @@
         parser.error("the following argument needs to be a power of 2: ALLOCATION_ALIGNMENT")
 
     arch = architecture_features.ArchitectureFeatures(
-        vela_config=config,
+        vela_config_files=args.config,
         system_config=args.system_config,
+        memory_mode=args.memory_mode,
         accelerator_config=args.accelerator_config,
         override_block_config=force_block_config,
         block_config_limit=args.block_config_limit,
-        global_memory_clock_scale=args.global_memory_clock_scale,
         max_blockdep=args.max_block_dependency,
         weight_estimation_scaling=args.weight_estimation_scaling,
+        verbose_config=args.verbose_config,
     )
 
     compiler_options = compiler_driver.CompilerOptions(
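
The `--config` handling above relies on two standard library behaviours: `action="append"` collects every occurrence of the option into a list, and `ConfigParser.read` accepts a list of filenames but silently skips any it cannot open, which is presumably why the commit checks readability up front. A minimal sketch of that flow follows; the file names and the `My_Sys` and `My_Mode` section names are hypothetical.

```python
import argparse
import os
import tempfile
from configparser import ConfigParser

parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, action="append")

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical config files: one system configuration, one memory mode.
    sys_path = os.path.join(tmp, "system.ini")
    mem_path = os.path.join(tmp, "memory.ini")
    with open(sys_path, "w") as f:
        f.write("[System_Config.My_Sys]\ncore_clock=500e6\n")
    with open(mem_path, "w") as f:
        f.write("[Memory_Mode.My_Mode]\nconst_mem_area=Axi1\n")

    # Each --config occurrence is appended to args.config.
    args = parser.parse_args(["--config", sys_path, "--config", mem_path])

    # Check readability first, because read() ignores files it cannot open.
    unreadable = [name for name in args.config if not os.access(name, os.R_OK)]

    cfg = ConfigParser()
    files_read = cfg.read(args.config)
    sections = cfg.sections()
```

After the read, sections from both files live in one parser, so a system configuration and a memory mode can be kept in separate files and selected independently with `--system-config` and `--memory-mode`.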

diff --git a/vela.ini b/vela.ini
new file mode 100644
index 0000000..94ab4fa
--- /dev/null
+++ b/vela.ini

@@ -0,0 +1,99 @@
+; Copyright (C) 2020 Arm Limited or its affiliates. All rights reserved.
+;
+; SPDX-License-Identifier: Apache-2.0
+;
+; Licensed under the Apache License, Version 2.0 (the License); you may
+; not use this file except in compliance with the License.
+; You may obtain a copy of the License at
+;
+; www.apache.org/licenses/LICENSE-2.0
+;
+; Unless required by applicable law or agreed to in writing, software
+; distributed under the License is distributed on an AS IS BASIS, WITHOUT
+; WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+; See the License for the specific language governing permissions and
+; limitations under the License.
+
+; -----------------------------------------------------------------------------
+; Vela configuration file
+
+; -----------------------------------------------------------------------------
+; System Configuration
+
+; Ethos-U55 Deep Embedded: SRAM (1.6 GB/s) and Flash (0.1 GB/s)
+[System_Config.Ethos_U55_Deep_Embedded]
+core_clock=200e6
+axi0_port=Sram
+axi1_port=OffChipFlash
+Sram_clock_scale=1.0
+OffChipFlash_clock_scale=0.0625
+
+; Ethos-U55 High-End Embedded: SRAM (4 GB/s) and Flash (0.5 GB/s)
+[System_Config.Ethos_U55_High_End_Embedded]
+core_clock=500e6
+axi0_port=Sram
+axi1_port=OffChipFlash
+Sram_clock_scale=1.0
+OffChipFlash_clock_scale=0.125
+
+; Ethos-U65 Embedded: SRAM (8 GB/s) and Flash (0.5 GB/s)
+[System_Config.Ethos_U65_Embedded]
+core_clock=500e6
+axi0_port=Sram
+axi1_port=OffChipFlash
+Sram_clock_scale=1.0
+OffChipFlash_clock_scale=0.0625
+
+; Ethos-U65 Mid-End: SRAM (8 GB/s) and DRAM (3.75 GB/s)
+[System_Config.Ethos_U65_Mid_End]
+core_clock=500e6
+axi0_port=Sram
+axi1_port=Dram
+Sram_clock_scale=1.0
+Dram_clock_scale=0.46875
+
+; Ethos-U65 High-End: SRAM (16 GB/s) and DRAM (3.75 GB/s)
+[System_Config.Ethos_U65_High_End]
+core_clock=1e9
+axi0_port=Sram
+axi1_port=Dram
+Sram_clock_scale=1.0
+Dram_clock_scale=0.234375
+
+; Ethos-U65 Client-Server: SRAM (16 GB/s) and DRAM (12 GB/s)
+[System_Config.Ethos_U65_Client_Server]
+core_clock=1e9
+axi0_port=Sram
+axi1_port=Dram
+Sram_clock_scale=1.0
+Dram_clock_scale=0.75
+
+; -----------------------------------------------------------------------------
+; Memory Mode
+
+; SRAM Only: only one AXI port is used and the SRAM is used for all storage
+[Memory_Mode.Sram_Only]
+const_mem_area=Axi0
+arena_mem_area=Axi0
+cache_mem_area=Axi0
+
+; Shared SRAM: the SRAM is shared between the Ethos-U and the Cortex-M software.
+; The non-SRAM memory is assumed to be read-only
+[Memory_Mode.Shared_Sram]
+const_mem_area=Axi1
+arena_mem_area=Axi0
+cache_mem_area=Axi0
+
+; Dedicated SRAM: the SRAM (384KB) is only for use by the Ethos-U
+; The non-SRAM memory is assumed to be read-writeable
+[Memory_Mode.Dedicated_Sram]
+const_mem_area=Axi1
+arena_mem_area=Axi1
+cache_mem_area=Axi0
+cache_sram_size=393216
+
+; Dedicated SRAM 512KB: the SRAM (512KB) is only for use by the Ethos-U
+; The non-SRAM memory is assumed to be read-writeable
+[Memory_Mode.Dedicated_Sram_512KB]
+inherit=Memory_Mode.Dedicated_Sram
+cache_sram_size=524288
\ No newline at end of file
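
As a reading aid for the sample file above, here is a minimal sketch of how such sections could be loaded with Python's standard `configparser`, including one possible way to resolve the `inherit` key. The helper `resolve_section` and the merge order (parent values first, then overrides from the inheriting section) are assumptions for illustration, not Vela's actual implementation.

```python
import configparser

# Two sections copied from the sample vela.ini in this commit
SAMPLE = """
[Memory_Mode.Dedicated_Sram]
const_mem_area=Axi1
arena_mem_area=Axi1
cache_mem_area=Axi0
cache_sram_size=393216

[Memory_Mode.Dedicated_Sram_512KB]
inherit=Memory_Mode.Dedicated_Sram
cache_sram_size=524288
"""


def resolve_section(config, name, _seen=None):
    """Return a section as a dict, following any 'inherit' key (hypothetical helper)."""
    seen = _seen if _seen is not None else set()
    if name in seen:
        raise ValueError(f"circular inherit involving section {name!r}")
    seen.add(name)
    section = dict(config[name])
    parent = section.pop("inherit", None)
    if parent is None:
        return section
    # Start from the parent's values, then let the child override them
    merged = resolve_section(config, parent, seen)
    merged.update(section)
    return merged


config = configparser.ConfigParser()
# Keep key case as written; by default configparser lower-cases option names,
# which would alter keys such as Sram_clock_scale.
config.optionxform = str
config.read_string(SAMPLE)

mode = resolve_section(config, "Memory_Mode.Dedicated_Sram_512KB")
print(mode["arena_mem_area"], int(mode["cache_sram_size"]) // 1024)
```

Under these assumptions, `Dedicated_Sram_512KB` picks up the three `*_mem_area` settings from `Dedicated_Sram` and replaces only `cache_sram_size`.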