
Accelerate BiT Model Even More with Quantization Using OpenVINO and NNCF



1. Introduction

In the first part of this blog series, we discussed how to use Intel®’s OpenVINO™ toolkit to accelerate inference of the Big Transfer (BiT) model for computer vision tasks. We covered the process of importing the BiT model into the OpenVINO environment, leveraging hardware optimizations, and benchmarking performance. Our results showcased significant performance gains and reduced inference latency for BiT when using OpenVINO compared to the original TensorFlow implementation. With this strong base result in place, there’s still room for further optimization. In this second part, we will further enhance BiT model inference with the help of OpenVINO, the Neural Network Compression Framework (NNCF), and low-precision (INT8) inference. NNCF provides sophisticated tools for neural network compression through quantization, pruning, and sparsity techniques tailored for deep learning inference. This allows BiT models to become viable for power- and memory-constrained environments where the original model size may be prohibitive. The techniques presented will be applicable to many deep learning models beyond BiT.

2. Model Quantization

Model quantization is an optimization technique that reduces the precision of weights and activations in a neural network. It converts 32-bit floating-point representations (FP32) to lower bit-widths such as 16-bit floats (FP16), 8-bit integers (INT8), or 4-bit integers (INT4). The key benefit is enhanced efficiency: smaller model size and faster inference time. These improvements not only enhance efficiency on server platforms but, more importantly, also enable deployment onto resource-constrained edge devices. So, while server platform performance is improved, the bigger impact is opening all-new deployment opportunities. Quantization transforms models from being restricted to data centers to being deployable even on low-power devices with limited compute or memory. This massively expands the reach of AI to the true edge.
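
To make the FP32-to-INT8 conversion concrete, here is a minimal sketch of affine (asymmetric) 8-bit quantization in plain Python. The function names and the sample values are illustrative assumptions and are not part of OpenVINO or NNCF; production toolkits typically add refinements such as per-channel scales.

```python
def quant_params(x_min, x_max, num_bits=8):
    """Return (scale, zero_point) mapping [x_min, x_max] onto unsigned integers."""
    qmax = 2 ** num_bits - 1
    # Make sure 0.0 lies inside the range so that it maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / qmax or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = round(-x_min / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clipping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in q_values]


# Hypothetical weights, for illustration only.
weights = [-1.2, -0.3, 0.0, 0.45, 2.1]
scale, zp = quant_params(min(weights), max(weights))
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most about half a quantization step for values inside the calibrated range.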

Below are a few of the key model quantization concepts:

  • Precision reduction: Decreases the number of bits used to represent weights and activations. Common bit-widths are INT8 and FP16. Enables smaller models.
  • Efficiency: Compressed models are smaller and faster, leading to efficient system resource utilization.
  • Trade-offs: Balancing model compression, speed, and accuracy for the target hardware. The goal is to optimize across all fronts.
  • Techniques: Post-training quantization and quantization-aware training. The latter bakes in resilience to lower precision.
  • Schemes: Quantization strategies such as weight, activation, or combined methods strike a balance between compressing models and preserving accuracy.
  • Preserving accuracy: Fine-tuning, calibration, and retraining maintain quality on real-world data.
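
As a rough illustration of the precision-reduction point, the sketch below estimates weight storage at different bit-widths. The parameter count is a made-up round number rather than a measured figure for BiT, and the estimate ignores quantization metadata (scales, zero points) and any layers left in higher precision.

```python
BITS_PER_VALUE = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}


def weight_size_mib(num_params, precision):
    """Approximate weight storage in MiB for a given numeric precision."""
    return num_params * BITS_PER_VALUE[precision] / 8 / (1024 ** 2)


NUM_PARAMS = 25_000_000  # hypothetical parameter count, for illustration only

for precision in BITS_PER_VALUE:
    print(f"{precision}: {weight_size_mib(NUM_PARAMS, precision):.1f} MiB")
```

Going from FP32 to INT8 cuts the weight footprint by roughly a factor of four under these assumptions.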

3. Neural Network Compression Framework (NNCF)

NNCF is a powerful tool for optimizing deep learning models, such as the Big Transfer (BiT) model, to achieve improved performance on various hardware, ranging from edge to data center. It provides a comprehensive set of features and capabilities for model optimization, making it easy for developers to optimize models for low-precision inference. Some of the key capabilities include:

  • Support for a variety of post-training and training-time algorithms with minimal accuracy drop.
  • Seamless combination of pruning, sparsity, and quantization algorithms.
  • Support for a variety of models: NNCF can be used to optimize models from a variety of frameworks, including TensorFlow, PyTorch, ONNX, and OpenVINO.

NNCF provides samples that demonstrate the usage of compression algorithms for different use cases and models. See compression results achievable with the NNCF-powered samples on the Model Zoo page. For more details, refer to this GitHub repo.

4. BiT Classification Model Optimization with OpenVINO™

Note: Before proceeding with the following steps, ensure you have a conda environment set up. Refer to this blog post for detailed instructions on setting up the conda environment.

4.1. Download the BiT_M_R50x1_1 TF classification model:

wget "https://tfhub.dev/google/bit/m-r50x1/1?tf-hub-format=compressed" \
  -O bit_m_r50x1_1.tar.gz

mkdir -p bit_m_r50x1_1 && tar -xvf bit_m_r50x1_1.tar.gz -C bit_m_r50x1_1

4.2. OpenVINO™ Model Optimization:

Execute the command below inside the conda environment to generate OpenVINO IR model files (.xml and .bin) for the bit_m_r50x1_1 model. These model files will be used for further optimization and for inference accuracy validation in subsequent sections.

ovc ./bit_m_r50x1_1 --output_model ./bit_m_r50x1_1/ov/fp32/bit_m_r50x1_1 \
  --compress_to_fp16 False
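
The same conversion can also be scripted. The sketch below assumes OpenVINO 2023.1 or newer, where openvino.convert_model and openvino.save_model are available; the helper name convert_saved_model and the example paths are illustrative and not taken from the original script.

```python
from pathlib import Path


def convert_saved_model(saved_model_dir, output_xml):
    """Convert a TensorFlow SavedModel directory to an FP32 OpenVINO IR (.xml/.bin)."""
    import openvino as ov  # imported lazily; requires the openvino package

    ov_model = ov.convert_model(saved_model_dir)
    Path(output_xml).parent.mkdir(parents=True, exist_ok=True)
    # compress_to_fp16=False keeps the weights in FP32, like the ovc command above.
    ov.save_model(ov_model, output_xml, compress_to_fp16=False)
    return output_xml


# Example call (paths follow the directory layout used in this section):
# convert_saved_model("./bit_m_r50x1_1",
#                     "./bit_m_r50x1_1/ov/fp32/bit_m_r50x1_1.xml")
```

Keeping the weights in FP32 at this stage gives a clean baseline for the accuracy comparison against the INT8 model later on.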

5. Data Preparation

To evaluate the accuracy impact of quantization on our BiT model, we need a suitable dataset. For this, we leverage the ImageNet 2012 validation set, which contains 50,000 images across 1,000 classes. The ILSVRC2012 validation ground truth is used for cross-referencing model predictions during accuracy measurement.

By testing our compressed models on established data like the ImageNet validation set, we can better understand the real-world utility of our optimizations. Maintaining maximal accuracy while minimizing resource utilization is crucial for edge deployment. This dataset provides a rigorous and unbiased means to validate these trade-offs.

Note: Accessing and downloading the ImageNet dataset requires registration.
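
The cross-referencing step can be sketched as follows. This is a simplified, standalone version of the lookup performed by the script in Appendix A; it assumes validation files are named like ILSVRC2012_val_00000001.JPEG and that line i of the ground-truth file holds the label of image i. The helper names and the three-line ground truth are made up for illustration.

```python
from pathlib import Path


def image_index(image_path):
    """Extract the 1-based index from a name such as ILSVRC2012_val_00000042.JPEG."""
    return int(Path(image_path).stem.rsplit("_", 1)[-1])


def labels_in_image_order(image_paths, ground_truth_lines):
    """Return one integer label per image, in the order the images are given."""
    labels = [int(line.strip()) for line in ground_truth_lines if line.strip()]
    return [labels[image_index(p) - 1] for p in image_paths]


# Tiny made-up ground truth (three lines) instead of the real 50,000-line file.
gt_lines = ["490\n", "361\n", "171\n"]
paths = ["val/ILSVRC2012_val_00000003.JPEG", "val/ILSVRC2012_val_00000001.JPEG"]
print(labels_in_image_order(paths, gt_lines))  # → [171, 490]
```

Looking labels up by file index rather than by list position matters because the image files may be enumerated in arbitrary order.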

6. Quantization Using NNCF

In this section, we will delve into the specific steps involved in quantizing the BiT model using NNCF. The quantization process involves preparing a calibration dataset and applying 8-bit quantization to the model, followed by accuracy evaluation.

6.1. Preparing Calibration Dataset:

At this step, create an instance of the nncf.Dataset class that represents the calibration dataset. The nncf.Dataset class can be a wrapper over the framework dataset object used for model training or validation. Below is a sample code snippet of the nncf.Dataset() call with transformed data samples.

# TF Dataset split for NNCF calibration
img2012_val_split = get_val_data_split(tf_dataset_,
                                       train_split=0.7,
                                       val_split=0.3,
                                       shuffle=True,
                                       shuffle_size=50000)

img2012_val_split = img2012_val_split.map(nncf_transform).batch(BATCH_SIZE)

calibration_dataset = nncf.Dataset(img2012_val_split)

The transformation function takes a sample from the dataset and returns data that can be passed to the model for inference. Below is the code snippet of the data transform.

# Data transform function for NNCF calibration
def nncf_transform(image, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    return image
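
To show what the split does without TensorFlow, here is a small sketch that mirrors the shuffle, skip, and take logic of get_val_data_split from Appendix A using plain Python lists. The function name calibration_split is an assumption for this example, and it uses the length of the input where the script uses the shuffle_size argument.

```python
import random


def calibration_split(items, train_split=0.7, val_split=0.3, seed=12):
    """Shuffle deterministically, skip the first train_split share, keep the next val_split share."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    skip = int(train_split * len(shuffled))
    take = int(val_split * len(shuffled))
    return shuffled[skip:skip + take]


# 50,000 stand-in sample ids, matching the size of the ImageNet validation set.
calib = calibration_split(range(50000))
print(len(calib))  # → 15000
```

A fixed seed keeps the calibration subset reproducible between runs, which makes accuracy comparisons easier to repeat.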

6.2. NNCF Quantization (FP32 to INT8):

Once the calibration dataset is prepared and the model object is instantiated, the next step involves applying 8-bit quantization to the model. This is achieved by using the nncf.quantize() API, which takes the OpenVINO FP32 model generated in the previous steps along with the calibration dataset to initiate the quantization process. While nncf.quantize() provides numerous advanced configuration knobs, in many cases like this one, it just works out of the box or with minor adjustments. Below is a sample code snippet of the nncf.quantize() API call.

ov_quantized_model = nncf.quantize(ov_model,
                                   calibration_dataset,
                                   fast_bias_correction=False)

For further details, the official documentation provides a comprehensive guide on the basic quantization flow, including setting up the environment, preparing the calibration dataset, and calling the quantization API to apply 8-bit quantization to the model.
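
As an illustration of the configuration knobs mentioned above, the sketch below wraps the quantization call together with reading the FP32 IR and writing the INT8 IR. The helper name quantize_to_int8, the MIXED preset, and the subset_size value are assumptions chosen for this example; they are not the settings used for the results in this blog, which only sets fast_bias_correction=False.

```python
def quantize_to_int8(fp32_xml, calibration_dataset, int8_xml, subset_size=300):
    """Quantize an FP32 OpenVINO IR with NNCF and write the INT8 IR to disk."""
    import nncf  # imported lazily; requires the nncf and openvino packages
    import openvino.runtime as ov

    ov_model = ov.Core().read_model(fp32_xml)
    quantized = nncf.quantize(
        ov_model,
        calibration_dataset,
        # Illustrative choice: symmetric weights, asymmetric activations.
        preset=nncf.QuantizationPreset.MIXED,
        # Number of calibration samples used to collect activation statistics.
        subset_size=subset_size,
        # Use the slower, more thorough bias correction, as in the snippet above.
        fast_bias_correction=False,
    )
    ov.serialize(quantized, int8_xml)
    return quantized
```

A larger subset_size generally costs more calibration time; whether it improves accuracy for a given model is something to verify empirically.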

6.3. Accuracy Evaluation

As a result of the NNCF model quantization process, the OpenVINO INT8 quantized model is generated. To evaluate the impact of quantization on model accuracy, we perform a comprehensive benchmarking comparison between the original FP32 model and the quantized INT8 model. This comparison involves measuring the accuracy of the BiT model (m-r50x1/1) on the ImageNet 2012 validation dataset. The accuracy evaluation results are shown in Table 1.

With TensorFlow (FP32) to OpenVINO™ (FP32) model optimization, the classification accuracy remained consistent at 0.70154, confirming that conversion to the OpenVINO™ model representation does not affect accuracy. Furthermore, with NNCF quantization to an 8-bit integer model, the accuracy was only marginally impacted, by less than 0.03%, demonstrating that the quantization process did not compromise the model’s classification abilities.
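
The comparison boils down to computing top-1 accuracy for each model and taking the difference. The following is a minimal sketch in plain Python; the helper names and the toy predictions are made up for illustration and do not reproduce the reported numbers.

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class equals the ground-truth class."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_drop_points(fp32_acc, int8_acc):
    """Accuracy lost by quantization, in percentage points."""
    return (fp32_acc - int8_acc) * 100.0


# Toy data, for illustration only.
labels = [3, 1, 4, 1, 5, 9, 2, 6]
fp32_preds = [3, 1, 4, 1, 5, 9, 2, 0]
int8_preds = [3, 1, 4, 1, 5, 0, 2, 0]

fp32_acc = top1_accuracy(fp32_preds, labels)
int8_acc = top1_accuracy(int8_preds, labels)
print(fp32_acc, int8_acc, accuracy_drop_points(fp32_acc, int8_acc))
```

On the full validation set the same computation is applied to 50,000 predictions per model.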

Refer to Appendix A for the Python script bit_ov_model_quantization.py, which includes data preparation, model optimization, NNCF quantization tasks, and accuracy evaluation.

The usage of the bit_ov_model_quantization.py script is as follows:

$ python bit_ov_model_quantization.py --help
usage: bit_ov_model_quantization.py [-h] [--inp_shape INP_SHAPE] --dataset_dir DATASET_DIR --gt_labels GT_LABELS --bit_m_tf BIT_M_TF --bit_ov_fp32 BIT_OV_FP32
                                    [--bit_ov_int8 BIT_OV_INT8]

BiT Classification model quantization and accuracy measurement

required arguments:
  --dataset_dir DATASET_DIR
                        Directory path to ImageNet2012 validation dataset
  --gt_labels GT_LABELS
                        Path to ImageNet2012 validation ds gt labels file
  --bit_m_tf BIT_M_TF   Path to BiT TF fp32 model file
  --bit_ov_fp32 BIT_OV_FP32
                        Path to BiT OpenVINO fp32 model file

optional arguments:
  -h, --help            show this help message and exit
  --inp_shape INP_SHAPE
                        N,W,H,C
  --bit_ov_int8 BIT_OV_INT8
                        Path to save BiT OpenVINO INT8 model file

7. Conclusion

The results emphasize the efficacy of OpenVINO™ and NNCF in optimizing model efficiency while minimizing computational requirements. The ability to achieve remarkable performance and accuracy retention, particularly when compressing models to INT8 precision, demonstrates the practicality of leveraging OpenVINO™ for deployment in various environments, including resource-constrained ones. NNCF proves to be a valuable tool for practitioners seeking to balance model size and computational efficiency without substantial compromise on classification accuracy, opening avenues for enhanced model deployment across diverse hardware configurations.

Notices & Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details.

No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Additional Resources

Appendix A

  • ILSVRC2012 ground truth: ground_truth_ilsvrc2012_val.txt
  • See bit_ov_model_quantization.py below for the BiT model quantization pipeline with NNCF described in this blog.
"""
 Copyright (c) 2022 Intel Corporation

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
"""

"""
This script is tested with TensorFlow v2.12.1 and OpenVINO v2023.1.0

Usage example below (with required parameters):

python bit_ov_model_quantization.py
    --gt_labels ./<path_to>/ground_truth_ilsvrc2012_val.txt
    --dataset_dir ./<path-to-dataset>/ilsvrc2012_val_ds/
    --bit_m_tf ./<path-to-tf>/model
    --bit_ov_fp32 ./<path-to-ov>/fp32_ir_model

"""

import argparse
import logging
import os
import re
import sys

import nncf
import numpy as np
import openvino.runtime as ov
import pandas as pd
from openvino.runtime import Core

logging.basicConfig(level=logging.ERROR)

import tensorflow as tf
import tensorflow_hub as hub

from PIL import Image
from sklearn.metrics import accuracy_score

ie = Core()
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# For top-1 labels.
MAX_PREDS = 1
BATCH_SIZE = 1
IMG_SIZE = (224, 224)  # Default ImageNet image size
NUM_CLASSES = 1000  # Number of ImageNet 2012 classes

# Data transform function for NNCF calibration
def nncf_transform(image, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    return image

# Data transform function for ImageNet dataset validation
def val_transform(image_path, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    img_reshaped = tf.reshape(image, [IMG_SIZE[0], IMG_SIZE[1], 3])
    image = tf.image.convert_image_dtype(img_reshaped, tf.float32)
    return image, label

# Validation dataset split
def get_val_data_split(tf_dataset_, train_split=0.7, val_split=0.3,
                       shuffle=True, shuffle_size=50000):
    # Start from the unshuffled dataset so that `ds` is defined when shuffle=False
    ds = tf_dataset_
    if shuffle:
        ds = ds.shuffle(shuffle_size, seed=12)

    # Skip the first train_split share and keep the following val_split share
    train_size = int(train_split * shuffle_size)
    val_size = int(val_split * shuffle_size)
    val_ds = ds.skip(train_size).take(val_size)

    return val_ds

# OpenVINO IR model inference validation
def ov_infer_validate(model: ov.Model,
                      val_loader: tf.data.Dataset) -> list:

    # Fix the input shape in case the IR was converted with dynamic batching
    model.reshape([1, IMG_SIZE[0], IMG_SIZE[1], 3])
    compiled_model = ov.compile_model(model)
    output = compiled_model.outputs[0]

    ov_predictions = []
    for img, label in val_loader:
        pred = compiled_model(img)[output]
        ov_result = tf.reshape(pred, [-1])
        # Indices of the MAX_PREDS highest scores, best first
        top_label_idx = np.argsort(ov_result)[-MAX_PREDS::][::-1]
        ov_predictions.append(top_label_idx)

    return ov_predictions

# OpenVINO IR model NNCF quantization
def quantize(ov_model, calibration_dataset):
    print("Started NNCF quantization process")
    ov_quantized_model = nncf.quantize(ov_model, calibration_dataset, fast_bias_correction=False)
    return ov_quantized_model

# OpenVINO FP32 IR model inference
def ov_fp32_predictions(ov_fp32_model, validation_dataset):
    # Load the OV model
    ov_model = ie.read_model(ov_fp32_model)
    print("Starting OV FP32 Model Inference...!!!")
    ov_fp32_pred = ov_infer_validate(ov_model, validation_dataset)
    return ov_fp32_pred

def nncf_quantize_int8_pred_results(ov_fp32_model, calibration_dataset,
                                    validation_dataset, ov_int8_model):

    # Load the OV model
    ov_model = ie.read_model(ov_fp32_model)

    # NNCF quantization of the OpenVINO IR model
    int8_ov_model = quantize(ov_model, calibration_dataset)
    ov.serialize(int8_ov_model, ov_int8_model)
    print("NNCF Quantization Process completed..!!!")

    # Reload the serialized INT8 model and run inference on the validation set
    ov_int8_model = ie.read_model(ov_int8_model)
    print("Starting OV INT8 Model Inference...!!!")
    ov_int8_pred = ov_infer_validate(ov_int8_model, validation_dataset)

    return ov_int8_pred

def tf_inference(tf_saved_model_path, val_loader: tf.data.Dataset):

    tf_model = tf.keras.models.load_model(tf_saved_model_path)
    print("Starting TF FP32 Model Inference...!!!")
    tf_predictions = []
    for img, label in val_loader:
        tf_result = tf_model.predict(img, verbose=0)
        tf_result = tf.reshape(tf_result, [-1])
        # Indices of the MAX_PREDS highest scores, best first
        top_label_idx = np.argsort(tf_result)[-MAX_PREDS::][::-1]
        tf_predictions.append(top_label_idx)

    return tf_predictions

"""
Module: bit_classification
Description: API to run INT8 quantization of the BiT classification OpenVINO IR model using NNCF and
    compute accuracy metrics for TF FP32, OV FP32 and OV INT8 on the ImageNet2012 validation dataset
"""
def bit_classification(args):

    ip_shape = args.inp_shape
    if isinstance(ip_shape, str):
        ip_shape = [int(i) for i in ip_shape.split(",")]
        if len(ip_shape) != 4:
            sys.exit("Input shape error. Set shape 'N,W,H,C'. For example: '1,224,224,3'")

    # ImageNet2012 validation dataset used for TF and OV FP32 accuracy testing.
    # Example: dataset_dir = ../dataset/ilsvrc2012_val/1.0/
    dataset_dir = os.path.join(args.dataset_dir, "*.JPEG")
    tf_dataset = tf.data.Dataset.list_files(dataset_dir)

    # One ground-truth label per line; line i belongs to validation image i
    with open(args.gt_labels) as gt_labels:
        val_labels = [line.strip() for line in gt_labels]

    # Generating the ImageNet 2012 validation dataset pairs (img, label)
    val_images = []
    val_labels_in_img_order = []
    for v in tf_dataset:
        img_path = v.numpy().decode("utf-8")
        # File names look like ILSVRC2012_val_00000001.JPEG
        img_id = int(img_path.split('/')[-1].split('_')[-1].split('.')[0])
        val_images.append(img_path)
        val_labels_in_img_order.append(int(val_labels[img_id - 1]))

    val_df = pd.DataFrame(data={'images': val_images, 'label': val_labels_in_img_order})

    # Converting the ImageNet2012 validation table into a tf.data.Dataset
    tf_dataset_ = tf.data.Dataset.from_tensor_slices((list(val_df['images'].values), val_df['label'].values))
    imgnet2012_val_dataset = tf_dataset_.map(val_transform).batch(BATCH_SIZE)

    # TF Dataset split for NNCF calibration
    img2012_val_split_for_calib = get_val_data_split(tf_dataset_, train_split=0.7,
                                                     val_split=0.3, shuffle=True,
                                                     shuffle_size=50000)

    img2012_val_split_for_calib = img2012_val_split_for_calib.map(nncf_transform).batch(BATCH_SIZE)

    # TF model inference
    tf_model_path = args.bit_m_tf
    print(f"Tensorflow FP32 Model {args.bit_m_tf}")
    tf_p = tf_inference(tf_model_path, imgnet2012_val_dataset)

    acc_score = accuracy_score(tf_p, val_labels_in_img_order)
    print(f"Accuracy of FP32 TF model = {acc_score}\n")

    # OpenVINO model inference
    print(f"OpenVINO FP32 IR Model {args.bit_ov_fp32}")
    ov_fp32_p = ov_fp32_predictions(args.bit_ov_fp32, imgnet2012_val_dataset)

    acc_score = accuracy_score(ov_fp32_p, val_labels_in_img_order)
    print(f"Accuracy of FP32 IR model = {acc_score}\n")

    print("Starting NNCF dataset Calibration....!!!")
    calibration_dataset = nncf.Dataset(img2012_val_split_for_calib)

    # OpenVINO IR FP32 to INT8 model quantization with NNCF and
    # INT8 prediction results on the validation dataset
    ov_int8_p = nncf_quantize_int8_pred_results(args.bit_ov_fp32, calibration_dataset,
                                                imgnet2012_val_dataset, args.bit_ov_int8)

    print(f"OpenVINO NNCF Quantized INT8 IR Model {args.bit_ov_int8}")
    acc_score = accuracy_score(ov_int8_p, val_labels_in_img_order)
    print(f"Accuracy of INT8 IR model = {acc_score}\n")

    #acc_score = accuracy_score(tf_p, ov_fp32_p)
    #print(f"TF Vs OV FP32 Accuracy Score = {acc_score}")

    #acc_score = accuracy_score(ov_fp32_p, ov_int8_p)
    #print(f"OV FP32 Vs OV INT8 Accuracy Score = {acc_score}")

if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="BiT Classification model quantization and accuracy measurement")
    # Move the default "optional arguments" group after the custom "required arguments" group
    optional = parser._action_groups.pop()
    required = parser.add_argument_group("required arguments")
    optional.add_argument("--inp_shape", type=str, help="N,W,H,C", default="1,224,224,3", required=False)
    required.add_argument("--dataset_dir", type=str, help="Directory path to ImageNet2012 validation dataset", required=True)
    required.add_argument("--gt_labels", type=str, help="Path to ImageNet2012 validation ds gt labels file", required=True)
    required.add_argument("--bit_m_tf", type=str, help="Path to BiT TF fp32 model file", required=True)
    required.add_argument("--bit_ov_fp32", type=str, help="Path to BiT OpenVINO fp32 model file", required=True)
    optional.add_argument("--bit_ov_int8", type=str, help="Path to save BiT OpenVINO INT8 model file",
                          default="./bit_m_r50x1_1/ov/int8/saved_model.xml", required=False)
    parser._action_groups.append(optional)

    args = parser.parse_args()
    bit_classification(args)
