AirBox Successfully Ports DeepSeek-R1 Models

18 February 2025 at 17:00

The Radxa Fogwise® AirBox has successfully ported the DeepSeek-R1-Distill-Qwen-7B/1.5B models.

Performance Details:

Deepseek-R1-Distill-Qwen-7B reaches 11 tokens/s

Deepseek-R1-Distill-Qwen-1.5B reaches 30 tokens/s

The Radxa development team has ported the DeepSeek-R1-Distill-Qwen-7B / 1.5B distilled models onto the Fogwise® AirBox. By using the TPU-MLIR toolchain for INT4 quantization and model compilation, We have successfully enabled the DeepSeek-R1 distilled model to run on the AirBox, which has 32 TOPS computational power.

Performance Results

DeepSeek-R1-Distill-Qwen-7B reaches 11 tokens/s, it is really an Edge Computing Monster, click to watch the video

Model	Quantization	Sequence Length	First Token Latency (s)	Tokens Per Second (tokens/s)
deepseek-r1-distill-qwen-1.5b	INT4	8192	5.159	30.448
deepseek-r1-distill-qwen-7b	INT4	2048	2.843	11.008

Model Deployment and Usage

The DeepSeek-R1-Distill-Qwen-7B/1.5B model porting method and detailed documentation have been released on Radxa official website. The models and code are fully open-source, and welcome everyone to try and deploy them.

Documentation Link

Fogwise® AirBox Overview

The Radxa Fogwise® AirBox is an embedded AI microserver with a computational power of up to 32TOPS. It supports various precisions (INT8, FP16/BF16, FP32) and local deployment of mainstream large models such as LLM, text-to-image generation, and various CV models. It features high performance, low power consumption, and strong environmental adaptability. With a variety of deep learning algorithms, it can achieve applications such as facial recognition, video structuring, behavior analysis, and status monitoring, empowering digital transformation in smart cities, smart transportation, smart energy, smart finance, smart telecom, and smart industries.

Additionally, the Radxa Fogwise® AirBox is fully compatible with edge large models such as ChatGLM3, Llama3.1, Qwen2.5, Stable Diffusion3, FLUX.1, MiniCPM-V2.6, CLIP, Whisper, and more. For more details, please refer to the Radxa official documentation, and feel free to experience it.

click to see details

Ultralytics Officially Announces Support for RKNN

Radxa

23 January 2025 at 17:00

Recently, Ultralytics officially announced its support for the RKNN platform. From now on, users of RK3588/356X series products can easily complete the model conversion and deployment of yolov11 by simply using the ultralytics library, pressing the "accelerate button" for the practical application of computer vision technology.

In this technological innovation, Radxa's star products, Radxa Rock 5B and Radxa Zero 3W, have stood out. As the core test platforms, they have provided a solid guarantee for the deployment and testing of the Ultralytics yolov11 model. Rock 5B is equipped with the high - performance Rockchip RK3588 processor, and Zero 3W is equipped with the powerful Rockchip RK3566 processor. With their excellent performance, stable performance, and strong compatibility, they have become the hardware cornerstone of technological breakthroughs.

YOLOv11 Inference on Board

RKNN Label on Board

For a long time, the complex processes of model conversion and deployment and hardware adaptation problems in the computer vision field have seriously restricted the promotion of technology. This official support of Ultralytics for the RKNN platform, combined with the successful tests based on Radxa products, has completely overcome this difficulty, making the implementation of technology more efficient.

RKNN Toolkit

The RKNN Toolkit, developed by Rockchip, was crucial in exporting the Ultralytics YOLO11 model to RKNN. This toolkit, a set of professional tools for deep - learning model deployment on Rockchip hardware, features the RKNN format. Optimized for Rockchip's NPU, RKNN unlocks full hardware acceleration on devices like RK3588 and RK3566, ensuring high - performance AI task execution.

Rockchip RKNN

The RKNN model offers many unique benefits. Its NPU - optimized design maximizes performance on Rockchip's NPU. Its low - latency trait suits real - time edge - device apps. Also, it can be customized for different Rockchip platforms, enhancing hardware resource use and overall efficiency.

For more details

For more details, see the Rockchip RKNN Export for Ultralytics YOLO11 Models and Radxa Docs.

How to Achieve Efficient Deployment of YOLO11 Model with Radxa Single Board Computers

Radxa

22 January 2025 at 17:00

Currently, the Ultralytics library officially supports the RKNN platform. Users of RK3588/356X products can directly use the ultralytics library for YOLOv11 model conversion and deployment., opening new opportunities for computer vision model applications on embedded devices.

Rockchip RKNN

Radxa's ROCK 5B runs on Rockchip's RK3588 processor; ZERO 3W, on RK3566. Their excellent performance and stability provide a solid hardware base for model export and verification.

The RKNN Toolkit, developed by Rockchip, was crucial in exporting the Ultralytics YOLO11 model to RKNN. This toolkit, a set of professional tools for deep - learning model deployment on Rockchip hardware, features the RKNN format. Optimized for Rockchip's NPU, RKNN unlocks full hardware acceleration on devices like RK3588 and RK3566, ensuring high - performance AI task execution.

The RKNN model offers many unique benefits. Its NPU - optimized design maximizes performance on Rockchip's NPU. Its low - latency trait suits real - time edge - device apps. Also, it can be customized for different Rockchip platforms, enhancing hardware resource use and overall efficiency.

Deploying Exported Ultralytics YOLO11 RKNN Models

For more details

For more details, see the Rockchip RKNN Export for Ultralytics YOLO11 Models

Enhancing ROCK 5B+ with DEEPX DX-M1 AI Module

Radxa

6 November 2024 at 17:00

All tests were conducted on the rock-5b-plus_bookworm_kde_b2.output.img.xz image.

The ROCK 5B+ is a precision single board computer (SBC) based on the RK3588 SoC with a 6 TOPS computing power NPU for a variety of AI applications. While 6 TOPS can handle a large number of AI vision tasks, application scenarios requiring higher computing power may require an upgrade. In this case, pairing the ROCK 5B+ with the DEEPX DX-M1 M.2 AI Accelerator Module adds a whopping 25 TOPS computing power, allowing the ROCK 5B+ to handle even more demanding AI workloads.

Fig.1 DX-M1 Product Overview

The DX-M1 module developed by DEEPX is connected to the ROCK 5B+ via the M.2 interface and the data communication is handled via the ROCK 5B+ PCIe. The module is optimized to accelerate inference tasks for models converted to dxnn format using the DXNN® - DEEPX NPU software (SDK).

Fig.2 DXNN SDK Architecture

DXNN® - DEEPX NPU software (SDK) includes a variety of tools: DX-COM (Model Conversion), DX-SIM (Model Simulation), DX-RT (Runtime), DX-NPU Driver (Device Driver) and DX-APP (Sample Code). With DXNN, deploying deep learning models on DEEPX AI hardware becomes efficient and easy, and leverages its high performance.

Hardware Installation

Insert the DX-M1 module into the M.2 slot of the ROCK 5B+ and power on. The ROCK 5B+ has two M.2 slots on the bottom, so even with the DX-M1 installed, another SSD can be installed if desired.

Fig.3 DX-M1 Installation Diagram

After booting the system, confirm PCIe device recognition.

Fig.4 ROCK 5B+ PCIe Detection Result

After installing the DX-NPU driver, the DX-M1 module should be correctly recognized on the ROCK 5B+.

Fig.5 DX-M1 Status Check

YOLOv5s DXNN Performance Evaluation

The DX-RT component facilitates inference for dxnn models. To evaluate YOLOv5s model performance on the DX-M1, we use the run_model benchmark tool.

Inference latency on ROCK 5B+ via the DX-M1 includes three stages: Latency = PCIe I/F (Write Time) + NPU (Inference Time) + PCIe I/F (Read Time).

Fig.6 DX-M1 Latency Analysis

# run benchmark
run_model -m YOLOV5S_3.dxnn -b -l 1000

Fig.7 YOLOv5s DXNN Benchmark Results

The average inference time is 4628.91 μs, i.e., 216 FPS, after 1000 inductions on the single-core NPU of the DX-M1. With three NPU cores, the theoretical maximum speed of the DX-M1 is 648 FPS, which is very close to the benchmark result of 645.476 FPS.

YOLOv5s 30 Channels Video Stream Detection

The DX-APP software package includes several computer vision demos that can be quickly deployed on the DX-M1 for tasks such as object detection and segmentation. In this example, Radxa performs object detection on 30 video streams simultaneously using the ROCK 5B+ and the DX-M1. The ROCK 5B+ decodes multiple video streams, sends the data to the DX-M1 for inference, and finally processes the output. It is worth to note that DX-APP recommends to use opencv 4.5.5, but since the FFmpeg version of the ROCK5B+ system is not compatible with Opencv 4.5.5, we compile the newest 4.10.0 version here.

# run multi-stream object detection
./bin/run_detector -c example/yolov5s3_example.json

Single-core NPU 30-channel video inference FPS: 240 FPS.

Fig.8 ROCK 5B+ Single-Core NPU 30-Channel Detection Output

Conclusion

Pairing the ROCK 5B+ with the DEEPX DX-M1 AI module is a significant enhancement for users requiring high-performance AI capabilities on a single-board computer. The addition of 25 TOPS of computing power opens new possibilities, allowing the ROCK 5B+ to efficiently handle demanding tasks, such as multi-stream object detection and high-speed inference. This combination showcases the potential of the ROCK 5B+ as a robust platform for AI workloads in edge computing, offering both flexibility and power. With tools like DXNN SDK and hardware support for intensive applications, the ROCK 5B+ and DX-M1 provide a valuable solution for developers and industries focused on AI and computer vision.

Reading view

Performance Results​

Model Deployment and Usage​

Fogwise® AirBox Overview​

RKNN Toolkit​

For more details​

Deploying Exported Ultralytics YOLO11 RKNN Models​

For more details​

Hardware Installation​

YOLOv5s DXNN Performance Evaluation​

YOLOv5s 30 Channels Video Stream Detection​

Conclusion​

Performance Results

Model Deployment and Usage

Fogwise® AirBox Overview

RKNN Toolkit

For more details

Deploying Exported Ultralytics YOLO11 RKNN Models

For more details

Hardware Installation

YOLOv5s DXNN Performance Evaluation

YOLOv5s 30 Channels Video Stream Detection

Conclusion