โŒ

Reading view

There are new articles available, click to refresh the page.

Ultralytics Officially Announces Support for RKNN

Recently, Ultralytics officially announced support for the RKNN platform. From now on, users of RK3588/356X series products can complete YOLO11 model conversion and deployment simply by using the ultralytics library, effectively pressing an "accelerate button" for putting computer vision technology into practice.

In this development, Radxa's flagship products, the Radxa ROCK 5B and Radxa ZERO 3W, served as the core test platforms, providing a solid foundation for deploying and testing the Ultralytics YOLO11 model. The ROCK 5B is powered by the high-performance Rockchip RK3588 processor, and the ZERO 3W by the Rockchip RK3566. Their excellent performance, stability, and strong compatibility make them the hardware cornerstone of this breakthrough.

YOLOv11 Inference on Board

RKNN Label on Board

For a long time, complex model conversion and deployment workflows, together with hardware adaptation problems, have seriously restricted the adoption of computer vision technology. Ultralytics' official support for the RKNN platform, combined with successful tests on Radxa products, removes this obstacle and makes putting the technology into practice far more efficient.

RKNN Toolkit

The RKNN Toolkit, developed by Rockchip, is central to exporting the Ultralytics YOLO11 model to RKNN. This toolkit, a set of professional tools for deep-learning model deployment on Rockchip hardware, produces the RKNN format. Optimized for Rockchip's NPU, RKNN unlocks full hardware acceleration on devices such as the RK3588 and RK3566, ensuring high-performance AI task execution.
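The export step described above can be sketched in a few lines of Python. Note that the `RKNN_TARGETS` mapping and the `export_to_rknn` helper are illustrative names, not part of the ultralytics API, and the export call assumes a recent ultralytics release with RKNN export support installed:

```python
# Illustrative sketch: exporting YOLO11 weights to RKNN with the ultralytics
# library. RKNN_TARGETS and export_to_rknn are hypothetical helper names.
RKNN_TARGETS = {
    "rock-5b": "rk3588",   # Radxa ROCK 5B (Rockchip RK3588)
    "zero-3w": "rk3566",   # Radxa ZERO 3W (Rockchip RK3566)
}

def export_to_rknn(weights: str, board: str) -> str:
    """Export a YOLO11 checkpoint to RKNN for the given Radxa board."""
    from ultralytics import YOLO  # imported lazily; requires ultralytics installed
    model = YOLO(weights)  # e.g. "yolo11n.pt"
    # Export to the RKNN format, targeting the board's Rockchip SoC.
    return model.export(format="rknn", name=RKNN_TARGETS[board])
```

On a ROCK 5B one would then call something like `export_to_rknn("yolo11n.pt", "rock-5b")`, producing an RKNN model directory ready for on-device inference.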

Rockchip RKNN

The RKNN model format offers several distinct benefits. Its NPU-optimized design maximizes performance on Rockchip's NPU, its low latency suits real-time edge applications, and it can be tailored to different Rockchip platforms, improving hardware utilization and overall efficiency.

For more details

For more details, see the Rockchip RKNN Export for Ultralytics YOLO11 Models and Radxa Docs.

How to Achieve Efficient Deployment of YOLO11 Model with Radxa Single Board Computers

The Ultralytics library now officially supports the RKNN platform. Users of RK3588/356X products can use the ultralytics library directly for YOLO11 model conversion and deployment, opening new opportunities for computer vision applications on embedded devices.

Rockchip RKNN

Radxa's ROCK 5B runs on Rockchip's RK3588 processor; ZERO 3W, on RK3566. Their excellent performance and stability provide a solid hardware base for model export and verification.

Deploying Exported Ultralytics YOLO11 RKNN Models
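Running an exported RKNN model on the board can be sketched as below. The `run_rknn_inference` helper and the example model directory name are illustrative, and the call assumes the ultralytics runtime is installed on the RK3588/RK356X device:

```python
# Illustrative sketch: on-device inference with an exported RKNN model.
# run_rknn_inference is a hypothetical helper, not an ultralytics API.
def run_rknn_inference(model_dir: str, source: str):
    """Run detection on the board with an exported RKNN model directory.

    model_dir would typically be a directory such as "./yolo11n_rknn_model"
    produced by the export step.
    """
    from ultralytics import YOLO  # imported lazily; requires ultralytics installed
    model = YOLO(model_dir)
    return model(source)  # e.g. source="bus.jpg" or a camera/video stream
```

The returned results can then be iterated to read boxes, classes, and confidences, just as with any other ultralytics backend.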

For more details

For more details, see the Rockchip RKNN Export for Ultralytics YOLO11 Models.

Enhancing ROCK 5B+ with DEEPX DX-M1 AI Module

All tests were conducted on the rock-5b-plus_bookworm_kde_b2.output.img.xz image.

The ROCK 5B+ is a powerful single board computer (SBC) based on the RK3588 SoC with a 6 TOPS NPU for a variety of AI applications. While 6 TOPS can handle a large number of AI vision tasks, scenarios requiring higher computing power may call for an upgrade. In this case, pairing the ROCK 5B+ with the DEEPX DX-M1 M.2 AI Accelerator Module adds a whopping 25 TOPS of computing power, allowing the ROCK 5B+ to handle even more demanding AI workloads.

Fig.1 DX-M1 Product Overview

The DX-M1 module, developed by DEEPX, connects to the ROCK 5B+ via the M.2 interface, with data communication handled over the ROCK 5B+'s PCIe bus. The module is optimized to accelerate inference for models converted to the dxnn format using DXNN®, the DEEPX NPU software SDK.

Fig.2 DXNN SDK Architecture

DXNN®, the DEEPX NPU software SDK, includes a variety of tools: DX-COM (model conversion), DX-SIM (model simulation), DX-RT (runtime), the DX-NPU driver (device driver), and DX-APP (sample code). With DXNN, deploying deep learning models on DEEPX AI hardware becomes efficient and easy while fully leveraging the hardware's performance.

Hardware Installation

Insert the DX-M1 module into the M.2 slot of the ROCK 5B+ and power on. The ROCK 5B+ has two M.2 slots on the bottom, so even with the DX-M1 installed, another SSD can be installed if desired.

Fig.3 DX-M1 Installation Diagram

After booting the system, confirm PCIe device recognition.

Fig.4 ROCK 5B+ PCIe Detection Result

After installing the DX-NPU driver, the DX-M1 module should be correctly recognized on the ROCK 5B+.

Fig.5 DX-M1 Status Check

YOLOv5s DXNN Performance Evaluation

The DX-RT component facilitates inference for dxnn models. To evaluate YOLOv5s model performance on the DX-M1, we use the run_model benchmark tool.

Inference latency on ROCK 5B+ via the DX-M1 includes three stages: Latency = PCIe I/F (Write Time) + NPU (Inference Time) + PCIe I/F (Read Time).
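The three-stage latency model above can be expressed as a small helper. All values in the example are hypothetical placeholders in microseconds, not measured figures:

```python
# Illustrative sketch of the DX-M1 latency model: total per-frame latency is
# the sum of the PCIe write, NPU inference, and PCIe read stages.
def total_latency_us(pcie_write_us: float, npu_infer_us: float,
                     pcie_read_us: float) -> float:
    """Return end-to-end latency in microseconds for one frame."""
    return pcie_write_us + npu_infer_us + pcie_read_us

# Hypothetical stage timings of 500, 3600, and 500 us would give 4600 us total.
print(total_latency_us(500, 3600, 500))  # 4600
```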

Fig.6 DX-M1 Latency Analysis

# run benchmark
run_model -m YOLOV5S_3.dxnn -b -l 1000

Fig.7 YOLOv5s DXNN Benchmark Results

The average inference time is 4628.91 μs, i.e., about 216 FPS, over 1000 inference runs on a single NPU core of the DX-M1. With three NPU cores, the theoretical maximum speed of the DX-M1 is 648 FPS, which is very close to the measured benchmark result of 645.476 FPS.
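The throughput figures above follow directly from the measured latency; a quick back-of-the-envelope check:

```python
# Convert the measured single-core latency to frames per second, then scale
# by the DX-M1's three NPU cores to get the theoretical ceiling.
AVG_LATENCY_US = 4628.91                      # measured single-core latency (us)
single_core_fps = 1_000_000 / AVG_LATENCY_US  # ~216 FPS
assert round(single_core_fps) == 216

three_core_fps = 3 * round(single_core_fps)   # theoretical 3-core maximum
print(three_core_fps)  # 648
```

The small gap between the 648 FPS ceiling and the measured 645.476 FPS reflects scheduling and PCIe transfer overhead when all three cores run in parallel.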

YOLOv5s 30-Channel Video Stream Detection

The DX-APP software package includes several computer vision demos that can be quickly deployed on the DX-M1 for tasks such as object detection and segmentation. In this example, Radxa performs object detection on 30 video streams simultaneously using the ROCK 5B+ and the DX-M1. The ROCK 5B+ decodes the video streams, sends the data to the DX-M1 for inference, and then processes the output. Note that DX-APP recommends OpenCV 4.5.5, but because the FFmpeg version shipped with the ROCK 5B+ system is not compatible with OpenCV 4.5.5, we compile the latest 4.10.0 release here instead.

# run multi-stream object detection
./bin/run_detector -c example/yolov5s3_example.json

With a single NPU core, total inference throughput across the 30 video channels reaches 240 FPS.

Fig.8 ROCK 5B+ Single-Core NPU 30-Channel Detection Output

Conclusion

Pairing the ROCK 5B+ with the DEEPX DX-M1 AI module is a significant enhancement for users requiring high-performance AI capabilities on a single-board computer. The addition of 25 TOPS of computing power opens new possibilities, allowing the ROCK 5B+ to efficiently handle demanding tasks, such as multi-stream object detection and high-speed inference. This combination showcases the potential of the ROCK 5B+ as a robust platform for AI workloads in edge computing, offering both flexibility and power. With tools like DXNN SDK and hardware support for intensive applications, the ROCK 5B+ and DX-M1 provide a valuable solution for developers and industries focused on AI and computer vision.