Organizations developing AI for edge devices often need flexibility across hardware platforms. This page walks through how NVIDIA-trained AI models can be ported to Qualcomm chipsets using ONNX and the Qualcomm AI Hub. You’ll learn the goals, implementation strategy, tools, and real-world business impact of this porting process.
Many edge devices – especially in industrial, automotive, and medical applications – remain on one chipset for stability and compliance. Still, teams increasingly want a portable path to hedge supply and cost risk.
When working with a partner like Qualcomm, our role is to make those choices viable by enabling cross-ecosystem deployment. In this specific instance, we collaborated to investigate and execute the porting of models originally trained on NVIDIA hardware to the Qualcomm chipset ecosystem.
Before starting the porting process, we defined clear objectives to measure technical feasibility, required resources, and expected performance benchmarks.
Our approach focused on proving a repeatable, chipset-agnostic workflow that could move models from NVIDIA to Qualcomm platforms efficiently.
We selected a representative path: take a model family used alongside NVIDIA Isaac GR00T (general-purpose robot foundation models) and retarget it from a CUDA/TensorRT workflow to Qualcomm using an ONNX-based toolchain. While the overall approach is chipset-agnostic, we used the Qualcomm IQ8/IQ9 families, with QCS6490 as the concrete validation target. The plan was to demonstrate a repeatable, documentation-backed workflow that generalizes to Orin/TX-originated models and other Qualcomm targets.
Successful porting required leveraging both NVIDIA’s development environment and Qualcomm’s AI Hub. The following stack formed the foundation of the workflow.
This stack let us keep training/experimentation on the original platform and streamline deployment on the Qualcomm edge SoC.

To make the process transparent, we documented each stage of the porting pipeline, from setup to output visualization. The YOLO object-detection model was ported from NVIDIA's platform to Qualcomm's QCS6490 chipset using the Qualcomm AI Hub, an ONNX-compatible environment. Below is a step-by-step breakdown of the process, including installation guidance and the key stages from model compilation to inference and output processing.
The Qualcomm AI Hub can be set up by following Qualcomm's official installation guide.
Script: 1_onnx_qnn_compile.py
This script loads, validates, quantizes, and compiles an ONNX model using the qai_hub library.
First, it checks the integrity of the loaded ONNX model with onnx.checker.check_model (full_check=True) to ensure the graph is valid.
Next, a device object is created via qai_hub to specify the target device for quantization and compilation. In this case, the device is "QCS6490 (Proxy)".
The script then prepares calibration data for quantization, consisting of a randomly generated image tensor with the shape (1, 3, 640, 640).
The quantized ONNX model is retrieved from the quantization job and asserted to be an instance of hub.Model.
Finally, the script submits a compilation job to compile the quantized model to a QNN format. The compilation options specify the target runtime and quantization of input/output. The job's result is asserted to be an instance of hub.CompileJob.
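The steps above can be sketched with the public qai_hub API. This is a minimal illustration, not the original script: the input name "images", the single-sample calibration set, and the exact options string are assumptions.

```python
import numpy as np


def make_calibration_data(n: int = 1, shape=(1, 3, 640, 640)) -> dict:
    # AI Hub expects calibration data as a dict: input name -> list of samples.
    # The input name "images" is an assumption typical for YOLO exports.
    return {"images": [np.random.rand(*shape).astype(np.float32) for _ in range(n)]}


def compile_for_qcs6490(onnx_path: str):
    # Imported lazily so the helper above works without qai_hub installed.
    import onnx
    import qai_hub as hub

    # Validate the ONNX graph before uploading it.
    model = onnx.load(onnx_path)
    onnx.checker.check_model(model, full_check=True)

    device = hub.Device("QCS6490 (Proxy)")

    # Quantize weights and activations to INT8 using random calibration data.
    quantize_job = hub.submit_quantize_job(
        model=onnx_path,
        calibration_data=make_calibration_data(),
        weights_dtype=hub.QuantizeDtype.INT8,
        activations_dtype=hub.QuantizeDtype.INT8,
    )
    quantized = quantize_job.get_target_model()
    assert isinstance(quantized, hub.Model)

    # Compile the quantized model to a QNN target; the options string
    # here is illustrative (target runtime + quantized I/O).
    compile_job = hub.submit_compile_job(
        model=quantized,
        device=device,
        options="--target_runtime qnn_context_binary --quantize_io",
    )
    assert isinstance(compile_job, hub.CompileJob)
    return compile_job
```

In practice the calibration set should be a few hundred representative images rather than random noise; random data only exercises the pipeline end to end.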
Script: 2_profile_model.py
This script uses the qai_hub library to profile the compiled model on the target device.
The script retrieves a previously created profiling job using the job ID "jp0494l2g". This job ID is used to fetch the job details, including the model associated with the profiling job. The model is then stored in the variable hub_model.
Next, the script submits a new profiling job using the retrieved model. It specifies the target device for profiling as "QCS6490 (Proxy)". The hub.submit_profile_job function is used to submit this job, and the result is stored in the variable new_profile_job.
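The profiling step reduces to a few qai_hub calls. A minimal sketch, assuming the model can be recovered from the earlier job via its model attribute (the exact accessor may differ by job type):

```python
def profile_on_qcs6490(job_id: str):
    import qai_hub as hub

    # Fetch a previously created job by ID and reuse its associated model.
    previous_job = hub.get_job(job_id)
    hub_model = previous_job.model  # assumption: the job exposes .model

    # Submit a fresh profiling run against the proxy device.
    new_profile_job = hub.submit_profile_job(
        model=hub_model,
        device=hub.Device("QCS6490 (Proxy)"),
    )
    return new_profile_job
```

The profiling report (latency, memory, per-layer breakdown) is then available in the AI Hub web console for the submitted job.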
Script: 3_infer.py
The provided code is a Python script that performs inference on an image using a model from the qai_hub library. The script starts by importing necessary libraries, including numpy, PIL.Image, and qai_hub.
The script then loads an image named cap.jpg, resizes it to 640x640 pixels, and converts it to a NumPy array with a data type of float32.
To prepare the image for inference, the script rescales the pixel values to the range [0, 1], transposes the array from HWC to CHW layout, and expands its dimensions to add a batch axis, yielding the NCHW tensor the model expects.
An inference job is submitted using the retrieved model and the prepared input image. The hub.submit_inference_job function is used to submit this job, and the result is asserted to be an instance of hub.InferenceJob.
The script downloads the output data from the inference job and stores it in a dictionary. This dictionary contains the on-device output.
Finally, the script saves the result dictionary to a .npy file named inference_cap_yolo11s.npy.
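The preprocessing and submission described above can be sketched as follows. The preprocessing helper is self-contained; the input name "images" in the submission call is an assumption, not necessarily the name used in the original script.

```python
import numpy as np
from PIL import Image


def preprocess(image_path: str, size: int = 640) -> np.ndarray:
    # Load the image, resize to the model's input resolution, and build
    # the NCHW float32 tensor the model expects.
    img = Image.open(image_path).convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # rescale to [0, 1]
    arr = arr.transpose(2, 0, 1)                     # HWC -> CHW
    return np.expand_dims(arr, axis=0)               # add batch dim -> (1, 3, 640, 640)


def run_inference(hub_model, image_path: str, out_path: str):
    import qai_hub as hub

    job = hub.submit_inference_job(
        model=hub_model,
        device=hub.Device("QCS6490 (Proxy)"),
        inputs={"images": [preprocess(image_path)]},  # input name is an assumption
    )
    assert isinstance(job, hub.InferenceJob)

    # Download the on-device output as a dict (output name -> arrays)
    # and persist it for the post-processing step.
    output = job.download_output_data()
    np.save(out_path, output)
    return output
```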
Script: 4_process_output.py
The script processes the output of a YOLO model to extract and filter bounding boxes and class scores, applies NMS to remove redundant detections, and visualizes the final detections on the original image.
The compute_iou function calculates the Intersection over Union (IoU) between a given box and a set of boxes. This is used to measure the overlap between bounding boxes.
The nms_process function applies Non-Maximum Suppression (NMS) to filter out overlapping bounding boxes based on their IoU and confidence scores. It keeps the boxes with the highest scores and removes others that overlap significantly.
The script extracts bounding box predictions and class scores from the model output. It converts the bounding boxes from xywh to xyxy format and computes the maximum class scores and corresponding class IDs for each detection.
NMS is applied to the remaining detections to remove redundant bounding boxes.
The original image is loaded, and the bounding boxes are rescaled to match the original image dimensions. The script then draws the bounding boxes and class labels on the image.
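The post-processing helpers can be sketched in plain NumPy. The box layout (xywh with center coordinates) and the IoU threshold are assumptions consistent with common YOLO post-processing, not the exact values from the original script.

```python
import numpy as np


def compute_iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    # IoU between one xyxy box and an array of xyxy boxes.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def xywh_to_xyxy(b: np.ndarray) -> np.ndarray:
    # Convert center-x, center-y, width, height to corner coordinates.
    xy = b[:, :2] - b[:, 2:] / 2
    return np.concatenate([xy, xy + b[:, 2:]], axis=1)


def nms_process(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.45):
    # Greedy NMS: keep the highest-scoring box, drop boxes that overlap
    # it beyond the threshold, repeat with the remainder.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        ious = compute_iou(boxes[i], boxes[order[1:]])
        order = order[1:][ious <= iou_threshold]
    return keep
```

For example, of two heavily overlapping detections only the higher-scoring one survives, while a distant detection is kept regardless of its score.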
Evaluation

Total: 21 hours of ML engineer time to stand up the pipeline, validate on device, and produce artifacts.
Porting AI models from NVIDIA to Qualcomm chipsets enables businesses to reduce vendor lock-in, optimize costs, and accelerate deployment. If you’re evaluating Qualcomm AI Hub or want to explore cross-chipset model deployment, our team can help guide the process.