Enhancing ROCK 5B+ with DEEPX DX-M1 AI Module

Radxa

6 November 2024 at 17:00

All tests were conducted on the rock-5b-plus_bookworm_kde_b2.output.img.xz image.

The ROCK 5B+ is a single board computer (SBC) based on the RK3588 SoC, whose integrated NPU provides 6 TOPS of computing power for a variety of AI applications. While 6 TOPS is enough for many AI vision tasks, scenarios that need more computing power may require an upgrade. In that case, pairing the ROCK 5B+ with the DEEPX DX-M1 M.2 AI Accelerator Module adds 25 TOPS of computing power, allowing the ROCK 5B+ to handle more demanding AI workloads.

Fig.1 DX-M1 Product Overview

The DX-M1 module, developed by DEEPX, connects to the ROCK 5B+ through the M.2 interface, and data is exchanged over the ROCK 5B+ PCIe link. The module is optimized to accelerate inference for models converted to the dxnn format with DXNN®, the DEEPX NPU software (SDK).

Fig.2 DXNN SDK Architecture

DXNN®, the DEEPX NPU software (SDK), includes a variety of tools: DX-COM (Model Conversion), DX-SIM (Model Simulation), DX-RT (Runtime), DX-NPU Driver (Device Driver) and DX-APP (Sample Code). With DXNN, deep learning models can be deployed on DEEPX AI hardware efficiently while taking advantage of its high performance.

Hardware Installation

Insert the DX-M1 module into an M.2 slot of the ROCK 5B+ and power on the board. The ROCK 5B+ has two M.2 slots on the bottom, so an SSD can still be installed alongside the DX-M1 if desired.

Fig.3 DX-M1 Installation Diagram

After the system boots, confirm that the module is detected as a PCIe device.

Fig.4 ROCK 5B+ PCIe Detection Result
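
One way to run this check from a terminal is with lspci from the pciutils package. The sketch below is an assumption about how the check could be done, not necessarily the command used for the figure. The DX-M1's vendor and device IDs are not given in this article, so the helper (a hypothetical name) simply lists every PCIe device with its numeric IDs, which makes a newly added entry easy to spot.

```shell
# Print all PCIe devices with numeric [vendor:device] IDs.
# Prints a hint instead if lspci (from pciutils) is unavailable or fails.
show_pcie_devices() {
    if command -v lspci >/dev/null 2>&1; then
        lspci -nn || echo "lspci could not read the PCI bus" >&2
    else
        echo "lspci not found; install the pciutils package" >&2
    fi
}

show_pcie_devices
```

Comparing the output with and without the module installed shows which entry belongs to the DX-M1.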

After installing the DX-NPU driver, the DX-M1 module should be correctly recognized on the ROCK 5B+.

Fig.5 DX-M1 Status Check

YOLOv5s DXNN Performance Evaluation

The DX-RT component facilitates inference for dxnn models. To evaluate YOLOv5s model performance on the DX-M1, we use the run_model benchmark tool.

Inference latency on the ROCK 5B+ with the DX-M1 consists of three stages: Latency = PCIe I/F (Write Time) + NPU (Inference Time) + PCIe I/F (Read Time).

Fig.6 DX-M1 Latency Analysis

# run benchmark
run_model -m YOLOV5S_3.dxnn -b -l 1000

Fig.7 YOLOv5s DXNN Benchmark Results

The average inference time over 1000 inferences on a single NPU core of the DX-M1 is 4628.91 μs, which corresponds to about 216 FPS. With three NPU cores, the theoretical maximum speed of the DX-M1 is therefore 648 FPS, which is very close to the benchmark result of 645.476 FPS.
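
The arithmetic behind these figures can be reproduced in a few lines of shell. This is a minimal sketch that uses awk for the floating-point math; the function names are illustrative and not part of the DXNN SDK, and the three-core figure assumes perfectly linear scaling across cores.

```shell
# Convert an average per-inference latency in microseconds to frames per second.
latency_us_to_fps() {
    awk -v us="$1" 'BEGIN { printf "%.2f\n", 1000000 / us }'
}

# Ideal throughput for a given number of cores, assuming linear scaling.
ideal_multicore_fps() {
    awk -v us="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", cores * 1000000 / us }'
}

# Measured throughput as a percentage of the ideal figure.
scaling_efficiency() {
    awk -v measured="$1" -v ideal="$2" 'BEGIN { printf "%.1f\n", 100 * measured / ideal }'
}

single=$(latency_us_to_fps 4628.91)      # single-core FPS from the measured latency
ideal=$(ideal_multicore_fps 4628.91 3)   # ideal FPS with three NPU cores
echo "single-core: ${single} FPS"
echo "three-core ideal: ${ideal} FPS"
echo "efficiency: $(scaling_efficiency 645.476 "$ideal")%"
```

With the numbers above, the measured 645.476 FPS works out to roughly 99.6% of the ideal three-core throughput.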

YOLOv5s 30 Channels Video Stream Detection

The DX-APP software package includes several computer vision demos that can be quickly deployed on the DX-M1 for tasks such as object detection and segmentation. In this example, Radxa performs object detection on 30 video streams simultaneously using the ROCK 5B+ and the DX-M1. The ROCK 5B+ decodes the video streams, sends the data to the DX-M1 for inference, and then processes the output. Note that DX-APP recommends OpenCV 4.5.5, but because the FFmpeg version on the ROCK 5B+ system is not compatible with OpenCV 4.5.5, we compiled the latest version, 4.10.0, instead.

# run multi-stream object detection
./bin/run_detector -c example/yolov5s3_example.json

30-channel video inference on a single NPU core: 240 FPS.

Fig.8 ROCK 5B+ Single-Core NPU 30-Channel Detection Output
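
To put the 240 FPS figure in per-stream terms, the sketch below divides it by the number of channels. It assumes that the reported value is the aggregate across all 30 streams, which the article does not state explicitly; the function name is illustrative only.

```shell
# Average per-stream frame rate, assuming the reported FPS is the total
# across all streams (an assumption, not confirmed by the demo output).
per_stream_fps() {
    awk -v total="$1" -v streams="$2" 'BEGIN { printf "%.2f\n", total / streams }'
}

per_stream_fps 240 30
```

Under that assumption, each of the 30 streams is processed at about 8 FPS on average.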

Conclusion

Pairing the ROCK 5B+ with the DEEPX DX-M1 AI module is a significant enhancement for users requiring high-performance AI capabilities on a single-board computer. The addition of 25 TOPS of computing power opens new possibilities, allowing the ROCK 5B+ to efficiently handle demanding tasks, such as multi-stream object detection and high-speed inference. This combination showcases the potential of the ROCK 5B+ as a robust platform for AI workloads in edge computing, offering both flexibility and power. With tools like DXNN SDK and hardware support for intensive applications, the ROCK 5B+ and DX-M1 provide a valuable solution for developers and industries focused on AI and computer vision.
