November 21, 2024
Open Neural Network Exchange, or ONNX, is a free and open-source ecosystem for deep learning model representation. Facebook and Microsoft created this tool in 2017 to make it simpler for academics and engineers to migrate models between various deep-learning frameworks and hardware platforms.
One of ONNX’s key benefits is that it makes it simple to export models from one framework, like PyTorch, and import them into another framework, like TensorFlow. Engineers who need to deploy models on several hardware platforms or academics who wish to test out various frameworks for training and deploying their models may find this extremely helpful.
Now that we have briefly introduced ONNX, let’s look at how it works and how these benefits apply in a code example.
In the example below, I will demonstrate how to create a simple neural network using PyTorch, convert it to ONNX format, and use ONNX Runtime for evaluation.
pip install torch torchvision onnx
The above command uses the Python pip package manager to install the PyTorch framework, TorchVision (a library that provides datasets and models), and the ONNX library.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 100)
        self.fc2 = nn.Linear(100, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.softmax(self.fc2(x), dim=1)
        return x

model = SimpleModel()
Here, we create a simple, fully connected feed-forward neural network with a single hidden layer. The __init__() method initializes two fully connected layers: the first takes an input of size 28*28 and has 100 output units, and the second takes 100 inputs and has 10 output units.
The forward() method contains the forward pass of the neural network. It accepts an input tensor x, passes it through the first layer, and applies the ReLU activation function. It then passes the result through the second layer, applies the Softmax activation function, and returns the resulting tensor as the output.
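To make the forward pass concrete, here is a minimal sketch of the same sequence of operations (linear layer, ReLU, linear layer, Softmax) written in plain Python. The toy layer sizes, the random weights, and the helper names linear, relu, softmax, and forward are assumptions made for illustration; they are not part of the PyTorch model above.

```python
import math
import random

def linear(x, weights, bias):
    """Compute y = W x + b for one input vector (weights is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    """Replace negative values with zero."""
    return [max(0.0, a) for a in v]

def softmax(v):
    """Turn scores into probabilities; subtracting the max avoids overflow."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, w1, b1, w2, b2):
    """Same order of operations as SimpleModel.forward()."""
    hidden = relu(linear(x, w1, b1))
    return softmax(linear(hidden, w2, b2))

# Toy sizes (assumed for readability): 4 inputs, 3 hidden units, 2 outputs.
random.seed(0)
n_in, n_hidden, n_out = 4, 3, 2
w1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

probs = forward([0.5, -1.0, 2.0, 0.0], w1, b1, w2, b2)
print(probs)       # one probability per output unit
print(sum(probs))  # approximately 1.0
```

Because the last step is a Softmax, the outputs are non-negative and sum to one, which is why the real model's 10 outputs can be read as class probabilities.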
import torch.onnx
dummy_input = torch.randn(1, 28*28)
onnx_filename = "simple_test_model.onnx"
torch.onnx.export(model, dummy_input, onnx_filename, verbose=True, input_names=['input'], output_names=['output'])
In this step, we import the torch.onnx package for ONNX conversion. We create dummy_input, a random tensor of size (1, 28*28), where 1 is the batch size and 28*28 is the input dimension. The exporter needs an example input of the right shape and type to run through the model during conversion.
We then define the name of the ONNX model file as simple_test_model.onnx. The torch.onnx.export() function converts the PyTorch model to ONNX format and saves it to that file.
The input_names and output_names arguments are optional but help identify the input and output tensors when using the ONNX model later.
pip install onnxruntime
The above command installs ONNX Runtime, a high-performance, cross-platform library for running ONNX models on a variety of devices and platforms and from several programming languages.
import onnxruntime as ort
import numpy as np
ORT_session = ort.InferenceSession(onnx_filename)

def run_model(input_data):
    input_data = input_data.astype(np.float32)
    input_name = ORT_session.get_inputs()[0].name
    output_name = ORT_session.get_outputs()[0].name
    result = ORT_session.run([output_name], {input_name: input_data})
    return result[0]

# Define your input data, for example, a random tensor
input_data = np.random.randn(1, 28*28)
# Get output
output = run_model(input_data)
print(output)
We create an inference session from the saved ONNX model file. The run_model() function first casts the input data to float32, because the exported model expects float32 input while np.random.randn() returns float64. It then runs the model with ORT_session.run() and returns the result.
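Since the model ends with a Softmax, each row of the output holds one probability per class, and the predicted class is simply the index of the largest value. The sketch below shows that step in plain Python. The helper name predicted_classes and the sample values are hypothetical; the function only iterates and indexes, so it should also accept the array returned by run_model().

```python
def predicted_classes(batch_probs):
    """Return the index of the largest probability in each row."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in batch_probs]

# Hypothetical output for a batch of two inputs and three classes.
sample_output = [
    [0.10, 0.70, 0.20],
    [0.55, 0.25, 0.20],
]
print(predicted_classes(sample_output))  # [1, 0]
```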
By examining the above example, we can better understand some of the benefits ONNX provides.
It provides interoperability by supporting various deep-learning frameworks, allowing models to be converted from frameworks such as PyTorch and TensorFlow. In the above code, we converted the PyTorch model to ONNX format.
ONNX makes it easy to deploy models across various platforms and programming languages. In the above code, we ran the ONNX model using the ONNX Runtime library, which is available for different platforms. In my personal experience, I have converted a TensorFlow model to an ONNX model and deployed it on a Node.js server using the ONNX Runtime library.
ONNX Runtime is designed to optimize model execution, applying both static and dynamic optimizations. Running inference with ONNX Runtime, as in the code above, can take advantage of these optimizations and often gives faster, lower-latency inference than running the model directly in the training framework. This can lead to better resource utilization and more efficient model deployments.
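How much faster ONNX Runtime is depends on the model and the hardware, so it is worth measuring on your own setup. Below is a minimal timing sketch that uses only the Python standard library. The helper name measure_latency and its warmup and runs parameters are assumptions for illustration, and the workload is a stand-in so the snippet runs on its own.

```python
import statistics
import time

def measure_latency(fn, warmup=5, runs=50):
    """Call fn() repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

# Stand-in workload so the sketch runs on its own; replace it with
# lambda: run_model(input_data) or with a call to the PyTorch model.
stats = measure_latency(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

Timing the ONNX Runtime call and the PyTorch call with the same helper and the same input gives a like-for-like comparison on your own machine.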
ONNX is a valuable asset for AI developers: it offers flexibility in choosing tools for a given requirement while preserving compatibility, portability, and performance. This article showed how to build a simple neural network in PyTorch, convert it to ONNX format, and run inference on it with ONNX Runtime.