commit | ce911a2f1d9cd678fb9fe82a40c86ad0c6772f5a | [log] [tgz] |
---|---|---|
author | Tai Ly <tai.ly@arm.com> | Thu Mar 21 17:01:14 2024 +0000 |
committer | Tai Ly <tai.ly@arm.com> | Mon Apr 08 22:18:34 2024 +0000 |
tree | 68faf6d7b1c676705a022b32d8aa7950db03ab5e | |
parent | 8f9e2842ce7d25645233ad4f6fa406be982346ae [diff] |
Add conversions of U8 to/from BF16 and FP8 Adds type to PadAttribute and ClampAttribute so their pad_const and max_val/min_val can be deserialized according to type Adds conversion functions of U8 arrays to/from BF16/FP8 values Also, refactor and expose TosaSerializer.convertDataToUint8Vec for converting dtype/data to uint8 list for serialization And modify convertDataToUint8Vec to serialize bf16 values into 2 bytes each, and serialize fp8 values into single bytes each. Signed-off-by: Tai Ly <tai.ly@arm.com> Change-Id: I05659e8187c76d359f1cc9f71c8c23cafd0e877f
The TOSA Serialization library provides methods to read and write serialized TOSA graphs (https://developer.mlplatform.org/w/tosa/). The library includes a FlatBuffers schema and a C++ API for reading and writing TOSA graphs.
The section below describes serialization_lib API usage. For more details, please refer to include/tosa_serialization_handler.h
.
This is the top-level class that contains the entire TOSA graph. In particular, it contains a vector of TosaSerializationBasicBlock
objects, and provides API for file IO, block access, and version checking.
a. `LoadFileJson(filename)`: Load json-formatted file "filename" from disk, and initialize the internal graph structure. Requires the schema file to be loaded via `LoadFileSchema()`. b. `SaveFileJson(filename)`: Snapshots the internal graph structure and saves out JSON-formatted file `filename` to disk. Requires the schema file to be loaded via `LoadFileSchema()`. c. `LoadFileTosaFlatbuffer(filename)`: Load serialized flatbuffer file "filename" from disk, and initialize the internal graph structure. d. `SaveFileTosaFlatbuffer(filename)`: Snapshot the internal graph structure and saves out serialized flatbuffer file `filename` to disk. e. `GetTosaVersion()`: Return TOSA version implemented by the serialization library. f. `GetBlockByName()`: Return vector of `TosaSerializationBasicBlock`. Returns `nullptr` if nothing matches. A valid graph must have one `main` block as the first block being traversed. g. `GetMainBlock()`: Shortcut for accessing the `main` block. Equivalent to `GetBlockByName("main")`. h. `GetInputs()` / `GetOutputs()`: Shortcut for `main` block's input/output tensor name. Input tensors of the main block are usually treated as `tosa.PLACEHOLDER`. Output tensors are the output of the entire graph and should be evaluated when graph traversal has finished.
This is the basic-block class. It contains vectors of TosaSerializationOperator
and TosaSerializationTensor
. Once entering a basic block, all of the operators within the block will be evaluated in order.
Upon reaching a TOSA control flow operator (tosa.WHILE
and tosa.COND_IF
), the status of current unfinished block will be saved, and the blocks specified in control flow operator will be evaluated first. Once the control flow blocks finishes its evaluation, the original unfinished block status will be restored and evaluation continues. This is more analogous to a function call than a compiler basic block.
a. `GetName()`: Return string of the basic block. b. `GetOperators()`: Return vector of `TosaSerializationOperator` c. `GetTensors()`: Return vector of `TosaSerializationTensor` d. `GetTensorByName(name)`: Return the `TosaSerializationTensor` with name `name`. Returns `nullptr` if nothing matches. e. `GetInputs()` / `GetOutputs()`: Return input/output tensor name of the basic block.
The operator class contains (1) what TOSA Op, (2) attribute (compile-time- known input), (3) quantization information and (4) input/output tensor names. The combination of (Op, attribute, quantization information) is defined in include/operator.def.
a. `GetOp()`: Return TOSA Op. Defined in schema `tosa.fbs`. b. `GetAttribute()` / `GetAttributeType()`: `GetAttribute()` returns the base object of attribute. `GetAttributeType()` returns which type of attribute the base object needs to be casted to. Type of attribute is defined in `tosa.fbs` and `include/attribute.def`. c. `GetQuantInfo()` + `GetQuantInfoType()`: `GetQuantInfo()` returns the base object's quantization information. `GetQuantInfoType()` returns which type of quantization information the base object needs to be casted to. Type of quantization information is defined in `tosa.fbs` and `include/quant_info.def`. d. `GetInputTensorNames()` / `GetOutputTensorNames()`: Returns the input/output tensor name of the basic block. e. `GetInputTensors()` / `GetOutputTensors()`: Returns the input/output tensor of the basic block.
The tensor class contains (1) data type, (2) shape, (3) symbolic link to numpy file, (4) format and (5) usage.
a. `GetName()` / `SetName(name)`: `GetName()` returns the name of the tensor. `SetName()` sets the name of the tensor. b. `GetShape()`: Returns the shape of the tensor as `vector<int32_t>`. c. `GetDtype()` / `SetDtype(dtype)`: `GetDtype()` returns the data type of the tensor. `SetDtype()` sets the data type of the tensor. DType is defined in `tosa.fbs`. d. `GetNpyFilePtr()`: Return the numpy file pointer of this tensor if this is a constant tensor. Return `nullptr` if the tensor is not constant.
The TOSA Serialization Library is licensed under Apache-2.0.
Other third party projects are referenced as sub-modules and as such, are licensed under the licenses stated in their projects.