model_conditioning_examples/post_training_quantization.py - ml/ethos-u/ml-embedded-evaluation-kit

# SPDX-FileCopyrightText: Copyright 2021, 2023 Arm Limited and/or its affiliates <open-source-office@arm.com>
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This script provides an example of how to perform
post-training quantization in TensorFlow.

The output from this example will be a TensorFlow Lite model file
where weights and activations are quantized to 8-bit integer values.

Quantization helps reduce the size of your models and is necessary
for running models on certain hardware such as Arm Ethos NPUs.

In addition to quantizing weights, post-training quantization uses
a calibration dataset to capture the minimum and maximum values of
all variable tensors in your model. By capturing these ranges it
is possible to fully quantize not just the weights of the model
but also the activations.

Depending on the model you are quantizing there may be some accuracy loss,
but for many models the loss should be minimal.

If you are targeting an Arm Ethos-U55 NPU then the output
TensorFlow Lite file will also need to be passed through the Vela
compiler for further optimizations before it can be used.

For more information on using Vela see:
    https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/about/
For more information on post-training quantization see:
    https://www.tensorflow.org/lite/performance/post_training_integer_quant
"""

import pathlib

import numpy as np
import tensorflow as tf

from training_utils import get_data, create_model


def post_training_quantize(keras_model, sample_data):
    """
    Quantize Keras model using post-training quantization with some sample data.

    TensorFlow Lite will have fp32 inputs/outputs and the model will handle quantizing/dequantizing.

    Args:
        keras_model: Keras model to quantize.
        sample_data: A numpy array of data to use as a representative dataset.

    Returns:
        Quantized TensorFlow Lite model.
    """

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

    # We set the following converter options to ensure our model is fully quantized.
    # An error should be raised if there are any ops that can't be quantized.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # To use post-training quantization we must provide some sample data that will be used to
    # calculate activation ranges for quantization. This data should be representative of the data
    # we expect to feed the model and must be provided by a generator function.
    def generate_repr_dataset():
        for i in range(100):  # 100 samples is all we should need in this example.
            yield [np.expand_dims(sample_data[i], axis=0)]

    converter.representative_dataset = generate_repr_dataset
    tflite_model = converter.convert()

    return tflite_model


# pylint: disable=duplicate-code
def evaluate_tflite_model(
        tflite_save_path: pathlib.Path,
        x_test: np.ndarray,
        y_test: np.ndarray
):
    """
    Calculate the accuracy of a TensorFlow Lite model using TensorFlow Lite interpreter.

    Args:
        tflite_save_path: Path to TensorFlow Lite model to test.
        x_test: numpy array of testing data.
        y_test: numpy array of testing labels (sparse categorical).
    """

    interpreter = tf.lite.Interpreter(model_path=str(tflite_save_path))

    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    accuracy_count = 0
    num_test_images = len(y_test)

    for i in range(num_test_images):
        interpreter.set_tensor(input_details[0]['index'], x_test[i][np.newaxis, ...])
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])

        if np.argmax(output_data) == y_test[i]:
            accuracy_count += 1

    print(f"Test accuracy quantized: {accuracy_count / num_test_images:.3f}")


def main():
    """
    Run post-training quantization
    """
    x_train, y_train, x_test, y_test = get_data()
    model = create_model()

    # Compile and train the model in fp32 as normal.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss=tf.keras.losses.sparse_categorical_crossentropy,
                  metrics=['accuracy'])

    model.fit(x=x_train, y=y_train, batch_size=128, epochs=5, verbose=1, shuffle=True)

    # Test the fp32 model accuracy.
    test_loss, test_acc = model.evaluate(x_test, y_test)  # pylint: disable=unused-variable
    print(f"Test accuracy float: {test_acc:.3f}")

    # Quantize and export the resulting TensorFlow Lite model to file.
    tflite_model = post_training_quantize(model, x_train)

    tflite_models_dir = pathlib.Path('./conditioned_models/')
    tflite_models_dir.mkdir(exist_ok=True, parents=True)

    quant_model_save_path = tflite_models_dir / 'post_training_quant_model.tflite'
    with open(quant_model_save_path, 'wb') as f:
        f.write(tflite_model)

    # Test the quantized model accuracy. Save time by only testing a subset of the whole data.
    num_test_samples = 1000
    evaluate_tflite_model(
        quant_model_save_path,
        x_test[0:num_test_samples],
        y_test[0:num_test_samples]
    )
# pylint: enable=duplicate-code


if __name__ == "__main__":
    main()
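
The module docstring says that calibration captures the minimum and maximum value of each tensor so that activations can be quantized. As a rough illustration of how such a range could be turned into 8-bit integers, here is a minimal NumPy sketch of asymmetric (affine) quantization. It is not the TensorFlow Lite converter's implementation, and the converter may treat weights and activations differently. The helper names `affine_quant_params`, `quantize` and `dequantize` are hypothetical and do not appear in the script above.

```python
import numpy as np


def affine_quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Derive an illustrative (scale, zero_point) pair from a calibrated float range."""
    # Widen the range to include 0.0 so that zero maps exactly onto an integer.
    rmin = min(float(rmin), 0.0)
    rmax = max(float(rmax), 0.0)
    if rmax == rmin:
        # Degenerate range (all zeros): any scale works, pick 1.0.
        return 1.0, 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    # Keep the zero point inside the representable integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map float values to int8 using q = round(x / scale) + zero_point."""
    q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)


def dequantize(q, scale, zero_point):
    """Map int8 values back to floats using x = (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale


if __name__ == "__main__":
    # Pretend these values were observed for one tensor during calibration.
    activations = np.array([-1.0, -0.25, 0.0, 0.5, 3.0], dtype=np.float32)
    scale, zero_point = affine_quant_params(activations.min(), activations.max())
    restored = dequantize(quantize(activations, scale, zero_point), scale, zero_point)
    print("scale:", scale, "zero_point:", zero_point)
    print("max abs error:", np.max(np.abs(restored - activations)))
```

The round trip shows where the accuracy loss mentioned in the docstring comes from: every value is snapped to one of 256 levels, so a wider calibrated range gives a coarser step size.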