
Vaaman reconfigurable edge computer features Rockchip RK3399 SoC and Efinix Trion T120 FPGA (Crowdfunding)

18 February 2025 at 20:30
Vaaman reconfigurable edge computer

Vaaman is a reconfigurable single-board edge computer that pairs a Rockchip RK3399 hexa-core Arm processor with an Efinix Trion T120 FPGA, targeting edge computing applications. The board combines the flexibility of an FPGA with the raw power of a hard processor to create a system that can adapt to varying computational demands in real time. The compact SBC features the Rockchip RK3399 hexa-core processor with two Cortex-A72 cores and four Cortex-A53 cores, as well as an Efinix Trion T120 FPGA with 112,128 logic elements, interlinked with the RK3399 via a high-speed 300Mbps bridge (but it’s unclear how this is implemented). It is billed as a “Raspberry Pi-style board for the FPGA world” that can be used for cryptographic acceleration, software-defined radio (SDR), digital signal processing, real-time robotics, real-time video processing, edge AI deployments, industrial automation, and hardware prototyping. It features a 40-pin Raspberry Pi-compatible GPIO header and [...]

The post Vaaman reconfigurable edge computer features Rockchip RK3399 SoC and Efinix Trion T120 FPGA (Crowdfunding) appeared first on CNX Software - Embedded Systems News.


Waveshare ESP32 robotic arm kit with 5+1 DoF supports ROS2, LeRobot, and Jetson Orin NX integration

17 February 2025 at 00:01
RoArm M3 Pro and RoArm M3 S High Torque Serial Bus Servo Robotic Arm Kit

Waveshare has recently released the RoArm-M3-Pro and RoArm-M3-S, 5+1 DoF high-torque ESP32 robotic arm kits. The main difference between the two is that the RoArm-M3-Pro features all-metal ST3235 bus servos for durability and longevity, whereas the RoArm-M3-S uses standard servo motors that are less durable over long-term use. These robotic arms feature a lightweight structure, a 360° omnidirectional base, and five flexible joints, which together create a 1 m workspace and a 200 g payload at 0.5 m reach. A 2-DOF wrist joint enables multi-dimensional clamping and precise force control. The arm integrates an ESP32 MCU supporting multiple wireless control modes via a web app, along with inverse kinematics for accurate positioning, curve velocity control for smooth motion, and adaptive force control. The design is open source and ROS2-compatible, allowing secondary development via JSON commands and ESP-NOW for multi-device communication. Compatible with the LeRobot AI framework, [...]

The post Waveshare ESP32 robotic arm kit with 5+1 DoF supports ROS2, LeRobot, and Jetson Orin NX integration appeared first on CNX Software - Embedded Systems News.

Unleashing the Power of Gemini 2.0: Why Enterprises and Developers Should Take Note

By: sandeep
17 February 2025 at 15:58

The rapid evolution of Artificial Intelligence (AI) is reshaping industries and changing the way businesses operate. The demand for robust, versatile models to handle complex tasks has skyrocketed as AI advances. Enterprises and developers alike are searching for cutting-edge solutions, and Gemini 2.0 is here to meet those demands.

Gemini 2.0, a next-generation large language model (LLM), sets a new benchmark in AI capabilities. With advanced understanding, precision, and flexibility, it empowers organizations to scale AI applications across industries like healthcare, finance, and beyond.

In this blog post, we will explore why Gemini 2.0 is the go-to choice for enterprises looking to harness AI’s true potential. We’ll dive into its key benefits, use cases that add value for businesses, and why developers should integrate Gemini 2.0 into their AI-driven projects.

What is Gemini 2.0?

Gemini 2.0 is a next-generation LLM developed to push the boundaries of AI in natural language understanding, generation, and multimodal processing. As the successor to previous models, it provides enhanced performance, greater efficiency, and more versatile capabilities to meet the growing needs of enterprises and developers.

Key Features and Improvements:

  • Superior Performance: Gemini 2.0 delivers unmatched accuracy and faster processing speeds, making it ideal for large-scale applications. Whether handling complex queries, generating content, or making decisions based on real-time data, it outperforms previous models.
  • Multimodal Capabilities: One of Gemini 2.0’s standout features is its ability to process and integrate both text and image inputs. This capability enables tasks like image captioning, text-to-image generation, and cross-modal search.
  • Improved Efficiency: Focused on optimization, Gemini 2.0 reduces the computational cost of running large models, enabling enterprises to scale AI applications without prohibitive costs.
  • Family of Models: Gemini 2.0 is part of a family of models, each designed for specific use cases and industries, from customer service automation to data analysis and creative content generation.

What Makes Gemini 2.0 Stand Out?

What sets Gemini 2.0 apart from other LLMs is its combination of superior performance and multimodal capabilities. Unlike many LLMs that excel in natural language processing, Gemini 2.0 handles multiple data types, offering enterprises a more versatile AI solution.

Additionally, Gemini 2.0 is cost-effective, making it an accessible choice for businesses looking to integrate powerful AI models without breaking the bank. Its balance between performance and efficiency gives enterprises a competitive edge in AI-driven innovations, all while avoiding the steep costs of other solutions.

In essence, Gemini 2.0 is more than just a language model; it’s a game-changer in AI, offering a holistic, efficient, and adaptable solution for enterprises and developers aiming to stay ahead in an ever-evolving technological landscape.

Benefits for Enterprises

Gemini 2.0 offers a broad spectrum of practical benefits for enterprises seeking to leverage AI for business growth and operational optimization. Here are some of the key advantages for businesses:

1. Boosting Efficiency and Productivity

Gemini 2.0 can enhance operational efficiency by automating routine tasks and optimizing workflows. Here’s how:

  • Automating Customer Service Interactions (Chatbots): With Gemini 2.0, enterprises can deploy intelligent chatbots that handle a wide variety of customer service queries, improving response time and customer satisfaction. These chatbots can engage with customers 24/7, resolving everything from basic FAQs to more complex issues.
  • Generating Reports and Summaries from Large Datasets: Gemini 2.0 excels in processing large datasets and summarizing key insights quickly. Whether it’s research reports, sales data, or financial documents, businesses can automate the extraction of key information, allowing employees to focus on strategy instead of spending hours on data compilation.
  • Assisting with Content Creation and Marketing: Content-driven businesses can use Gemini 2.0 to streamline content generation. From blog posts to social media updates, Gemini 2.0’s natural language generation capabilities help businesses maintain consistent, high-quality output with minimal human intervention.

2. Enhanced Decision-Making

Making data-driven decisions is crucial in today’s business environment. Gemini 2.0 empowers businesses to make more informed decisions:

  • Analyzing Market Trends and Customer Behavior: Gemini 2.0 processes complex datasets to offer valuable insights into consumer behavior, market conditions, and trends. This helps businesses stay ahead of demand shifts and adjust strategies proactively.
  • Predicting Potential Risks and Opportunities: By analyzing past data and patterns, Gemini 2.0 predicts future risks and identifies opportunities, allowing businesses to mitigate potential losses and seize new market prospects early on.

3. Driving Innovation and Product Development

Beyond optimizing existing operations, Gemini 2.0 fosters innovation:

  • Generating New Ideas and Supporting Research: Enterprises can use Gemini 2.0 to generate ideas based on existing research, sparking new product features, marketing campaigns, or solutions for complex challenges.
  • Accelerating New Product Development: Gemini 2.0 accelerates the product development process by quickly analyzing market needs, refining product fit, and helping businesses design products that truly resonate with their audience.

4. Cost Savings

Automation and enhanced efficiency lead to significant cost savings:

  • Reduced Operational Costs: Automating repetitive tasks such as customer support, data entry, and content generation allows businesses to reduce labor costs, freeing up resources for more strategic, high-value initiatives.
  • Optimized Resource Allocation: By leveraging data analysis, Gemini 2.0 helps businesses optimize resource allocation, ensuring efforts are focused on the most impactful tasks for the business.

5. Gaining a Competitive Edge

Adopting Gemini 2.0 can provide businesses with a competitive advantage:

  • Staying Ahead of the Curve: Gemini 2.0’s ability to process vast datasets quickly and provide valuable insights ensures businesses can innovate faster, keeping them ahead of competitors relying on slower, traditional methods.
  • Agility in a Changing Market: Gemini 2.0’s rapid adaptability allows businesses to adjust quickly to market changes, ensuring they stay competitive and maintain their leadership position in the industry.

6. OCR vs. Gemini VLM 

Enterprises often decide between traditional Optical Character Recognition (OCR) tools and advanced Vision Language Models (VLM) like Gemini 2.0 for text extraction and analysis. Here’s how Gemini 2.0 shines:

  • OCR: While OCR effectively converts scanned documents into editable text, it struggles with complex layouts, handwriting, or documents containing mixed media. Furthermore, OCR can become costly when scaling to process large volumes of data from diverse document types.
  • Gemini 2.0’s VLM: Gemini 2.0 Flash offers more versatile and cost-effective capabilities, processing around 6,000 pages of PDF content for roughly $1 with near-perfect results. It can process not only text but also images, integrating multiple data types into one seamless framework. This eliminates the need for several specialized tools, improving accuracy and reducing costs. It streamlines workflows, automates data entry, and provides insights for better decision-making.

In summary, Gemini 2.0 Flash is a promising alternative to traditional OCR with a multimodal, powerful AI solution, providing enterprises with a more efficient and cost-effective way to process data and automate tasks, all while enhancing the accuracy of document processing.
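As a rough illustration of the document-processing workflow described above, the sketch below uploads a PDF through the Gemini File API using the google-generativeai Python SDK. The file name, prompt, and exact model string are assumptions for illustration rather than values from this post.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key from Google AI Studio

# Upload a (hypothetical) PDF via the File API, then ask the model to extract fields
pdf = genai.upload_file("contract.pdf")
model = genai.GenerativeModel("gemini-2.0-flash")  # model name assumed
response = model.generate_content(
    [pdf, "Extract the parties, key dates, and payment terms as JSON."]
)
print(response.text)
```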

7. Deep Research

Google’s Gemini AI assistant now features Deep Research, an AI-driven tool that conducts comprehensive research on your behalf, delivering detailed reports with key findings and source links. This enhancement aims to streamline information gathering, making it more efficient and user-friendly. 

Benefits for Developers

Gemini 2.0 provides developers with powerful tools designed to simplify the creation of AI-powered applications. Whether building prototypes, integrating systems, or leveraging cutting-edge capabilities, Gemini 2.0 makes it easier to innovate and create more efficiently.

1. Simplified Development

Gemini 2.0 offers easy integration and a streamlined development process. With its well-documented API and a comprehensive set of developer tools, developers can quickly harness the power of AI. Whether you’re a beginner or an experienced developer, Gemini 2.0’s intuitive interface and pre-built modules enable rapid development.

  • Easy-to-use API: Interact with AI models using clean, simple code; no deep expertise required (see the sketch after this list).
  • Pre-built Modules: Leverage ready-made functionalities for text processing, image analysis, and more, reducing development time significantly.
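A minimal sketch of that workflow with the google-generativeai Python SDK; the model string and the prompt are assumptions used only for illustration.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash")  # assumed model name

# A short multi-turn chat session
chat = model.start_chat()
reply = chat.send_message("Summarize last week's support tickets in three bullet points.")
print(reply.text)
```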

2. Faster Prototyping and Iteration

Speed is critical in AI development, and Gemini 2.0 allows for quick prototyping and testing. With access to powerful models and real-time feedback, developers can experiment and iterate rapidly.

  • Quick Prototyping: Test and fine-tune new AI models within hours, not weeks.
  • Real-time Feedback: Assess model performance with real-world data and adjust almost immediately.

3. Access to Advanced AI Capabilities

Gemini 2.0 gives developers access to advanced AI features that are challenging to implement independently, such as:

  • Natural Language Understanding (NLU): Use Gemini 2.0 to process and generate human-like text for chatbots, customer support tools, or content creation.
  • Natural Language Generation (NLG): Automate content generation for blogs, reports, and social media with context-aware models.
  • Multimodal Capabilities: Process both text and image inputs, enabling complex applications like image captioning and visual question answering.

These features allow developers to build innovative AI applications across industries.
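To make the multimodal point concrete, here is a hedged image-captioning sketch with the google-generativeai SDK; the image path and model string are placeholders.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash")  # assumed model name

image = Image.open("product_photo.jpg")  # hypothetical local image
response = model.generate_content([image, "Write a one-sentence caption for this image."])
print(response.text)
```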

4. Seamless Integration with Existing Systems

Gemini 2.0 integrates easily with existing software ecosystems, whether for cloud platforms, databases, or third-party solutions.

  • SDKs and APIs: Gemini 2.0 provides robust SDKs and APIs for easy integration without disrupting existing infrastructure. Enterprises that have already integrated OpenAI into their workflows can switch to Gemini with minimal changes, since Google exposes OpenAI-compatible API endpoints that work with the OpenAI SDK (see the sketch after this list).
  • Cross-platform Compatibility: It integrates smoothly with platforms like AWS, Azure, and Google Cloud, allowing developers to leverage both cloud computing and AI capabilities.
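As a hedged sketch of that OpenAI-SDK compatibility, the snippet below points the OpenAI Python client at Google's documented OpenAI-compatible endpoint; the model string and prompt are assumptions.

```python
from openai import OpenAI

# Reuse the OpenAI SDK, but point it at Gemini's OpenAI-compatible endpoint
client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",  # a Gemini key, not an OpenAI key
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

resp = client.chat.completions.create(
    model="gemini-2.0-flash",  # assumed model name
    messages=[{"role": "user", "content": "Classify this ticket: 'I was charged twice this month.'"}],
)
print(resp.choices[0].message.content)
```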

5. Customization and Fine-tuning

Gemini 2.0 offers unmatched flexibility, allowing developers to fine-tune models for specific use cases and domains, ensuring AI solutions meet business needs.

  • Domain-Specific Tuning: Customize Gemini 2.0 for industries like healthcare, finance, and e-commerce to better suit specialized data and workflows.
  • Custom Model Development: Adjust parameters and build tailored solutions, whether improving NLP tasks or integrating new data sources.

These customization features enable developers to create scalable, specialized AI solutions.

Use Cases for Gemini 2.0

Gemini 2.0 is a powerful, versatile AI solution with wide-ranging applications across industries. Its multimodal capabilities and advanced features allow businesses to enhance efficiency, drive innovation, and make smarter decisions. Below are key industry-specific and cross-industry use cases.

Industry-Specific Use Cases:

  • Healthcare: Assists in AI-powered diagnostics (analyzing medical images and patient records) and personalized treatment plans based on genetic data.
  • Finance: Detects fraud in real time and assesses credit risk by analyzing financial data, market trends, and unstructured text.
  • Education: Enables personalized learning with tailored content and automates grading with contextual feedback.
  • Retail: Provides personalized product recommendations and optimizes inventory by forecasting demand.

Cross-Industry Use Cases:

  • Content & Marketing: Automates content generation and SEO optimization to improve visibility.
  • Customer Support: Powers intelligent chatbots and sentiment analysis for real-time feedback.
  • Business Intelligence: Delivers predictive analytics and data visualization for informed decision-making.
  • Software Development: Automates code generation, suggests optimizations, and detects bugs for efficient development.

Google AI Studio

Google AI Studio is a powerful platform that equips developers and individuals with cutting-edge AI tools to boost productivity and creativity. Whether you’re building AI-driven applications, analyzing videos, or testing machine learning models, Google AI Studio offers a wide range of features to streamline your workflow. One of its standout features is the unlimited free chat for coding and logical thinking, enabling developers to prototype and test ideas quickly without premium service costs.

Available Models in Google AI Studio as of Feb 2025

  • Gemini 2.0 Flash-Lite Preview  
  • Gemini 2.0 Flash
  • Gemini 2.0 Pro Experimental
  • Gemini 2.0 Flash Exp Thinking
  • Gemma 2B, 9B, 27B
  • LearnLM Pro 1.5
  • Gemini 1.5 Family

Gemini 2.0 Flash is a high-performance multimodal model capable of processing and generating text, images, audio, and video. Gemini 2.0 Pro offers enhanced capabilities for complex tasks, and Flash Thinking focuses on reasoning before generating responses.

Google AI Studio provides a variety of pre-trained models and built-in tools for various tasks, which developers can use directly or fine-tune for specific needs. Some notable features include:

  • Video Analyzer: This model automates video content analysis, extracting key insights for tasks like content moderation, facial recognition, and object detection. It reduces the time spent on manual video analysis by automating complex processes.
  • Screen sharing: The screen sharing feature in Gemini allows real-time screen sharing during live interactions, enhancing collaboration with seamless integration of text, audio, and video inputs while prioritizing user privacy and data management.
  • Grounding: The Grounding model enables developers to associate text with specific objects in images, facilitating the creation of AI applications like image captioning, object localization, and more.
  • Code Execution: Google AI Studio’s built-in code execution feature allows developers to write, test, and execute code directly on the platform, eliminating the need for external environments and simplifying debugging.

Free Model Tuning – A Hidden Gem

A unique feature of Google AI Studio is its free model-tuning capability. Unlike other platforms where model fine-tuning comes with a cost, Google AI Studio allows users to adjust and steer model responses to suit specific needs at no extra cost. This allows developers and businesses to tailor AI models to their use cases without significant investment.

  • Tailor Responses: Fine-tune models for tasks like chatbots, content generation, or natural language processing to meet your requirements.
  • Cost-Effective Customization: This feature lets developers customize models at no extra cost, providing flexibility and refinement without paying for premium tiers.

Limitations of the Free Tier

While Google AI Studio’s free tier offers robust functionality, there are a few limitations to keep in mind:

  • Limited API Calls: The free tier caps the number of API calls, so heavy usage may require upgrading to a paid plan for higher volume.
  • Limited Access to Premium Features: Advanced features, including higher compute resources and more powerful models, are available only to premium users. However, the free tier still provides access to most core functionality, making it an excellent starting point for developers.

How to get started with AI Studio?

  • Visit aistudio.google.com and sign up (or sign in).
  • Access all the latest Gemini models in a chat interface.
  • To access Gemini via the API, generate a free API key under a Generative Language Client project (a short sketch follows).
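Once you have a key, a quick sanity check is to list the models your key can reach through the SDK; filtering on generateContent is just one reasonable choice.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # key generated in Google AI Studio

# Print every model the key can use for text generation
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)
```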

As AI continues to evolve and impact industries across the globe, leveraging advanced tools like Gemini 2.0 and Google AI Studio is essential for enterprises and developers who want to stay ahead of the curve. These platforms provide cutting-edge capabilities, enabling businesses to drive innovation, enhance productivity, and gain valuable insights. By integrating multimodal AI models, automating workflows, and fine-tuning models to suit specific needs, companies and developers can create more efficient, cost-effective solutions.

For those ready to take the next step in AI, OpenCV University offers a comprehensive range of free AI courses where you can learn key concepts in deep learning, machine learning, and computer vision. These courses will help you build a strong foundation in AI and give you the skills to apply cutting-edge technologies like Gemini 2.0 and Google AI Studio to real-world challenges.

Start your AI journey today with OpenCV’s free courses!

The post Unleashing the Power of Gemini 2.0: Why Enterprises and Developers Should Take Note appeared first on OpenCV.

This Week in Beagle #15

17 February 2025 at 13:00

Hello everyone. Another typical week. Let’s go over everything.

BeagleBoard Image Builder

After much deliberation last week, I decided to go ahead with trying to use debos for image generation. I was able to create debos recipes for BeagleY-AI and PocketBeagle 2. However, the image generated by debos does not seem to boot on either board. More specifically, it does not seem like the SoC is able to find the bootloader (U-Boot) in the BOOT partition.

I did cross-check everything with the current images, and the partitions seem to be in order. So not sure what is going on here. I also went through the debos source code to see how it was creating the partitions, but recreating it in normal shell scripts does seem to work as expected. I also tried generating empty images using debos and copying the boot partition to the appropriate partition, but still could not get it to work.

I have created an issue regarding this, with some scripts that generate the image properly. Let’s see if it can be resolved soon.

PocketBeagle 2 Examples

Some work also went into cleaning up the current examples for PocketBeagle 2. I will be getting my PocketBeagle 2 and TechLab Cape delivered by the end of the week, so hopefully, the examples will be greatly expanded in the upcoming weeks.

Refactor Dependencies

The examples were pulling some unnecessary dependencies, so I removed most of them. Now, there are only 6 dependencies, which is always nice.

I have switched to gpiod for GPIO handling. This is because I went through the source code of both gpiod and gpio-cdev, and felt that gpiod was superior. Additionally, gpiod supports both v1 and v2 of GPIO Character Device Userspace API, while gpio-cdev only supports v1.
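The post's examples are in Rust, but as a rough, language-agnostic illustration of the request-a-line-and-drive-it workflow that the gpiod libraries wrap, here is a minimal blink sketch using libgpiod's v1 Python bindings; the chip name and line offset are placeholders.

```python
import time
import gpiod  # libgpiod v1 Python bindings

chip = gpiod.Chip("gpiochip0")   # placeholder GPIO chip
line = chip.get_line(17)         # placeholder line offset wired to an LED
line.request(consumer="blinky", type=gpiod.LINE_REQ_DIR_OUT)

for _ in range(10):
    line.set_value(1)
    time.sleep(0.5)
    line.set_value(0)
    time.sleep(0.5)

line.release()
chip.close()
```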

I can always write any shim that might be required for interop with embedded_hal myself.

Add blinky ioctl example

I have also added a blinky example that directly uses the GPIO character device userspace API instead of the gpiod library. This should help anyone curious peek behind the curtain and get an idea of how the GPIO libraries work behind the scenes.

I am also thinking of creating a blinky example that does not have any dependencies, although I am a bit unsure whether it should be a separate example or just be folded into this one.

Ending Thoughts

This was it for this week. Hopefully, this helps bring transparency regarding where the development efforts are concentrated and how the community can help. Look forward to the next update.


The post This Week in Beagle #15 appeared first on BeagleBoard.

Boardcon SBC3576 – A feature-rich Rockchip RK3576 SBC with HDMI, mini DP, dual GbE, WiFi 6, optional 5G/4G LTE module, and more

13 February 2025 at 15:04
Boardcon SBC3576 SBC

Boardcon SBC3576 is a feature-rich single board computer (SBC) based on the MINI3576 system-on-module powered by a Rockchip RK3576 AI SoC and equipped with two 100-pin and one 44-pin board-to-board connectors for interfacing with the carrier board. The carrier board is equipped with up to 8GB RAM, 128GB eMMC flash, two gigabit Ethernet ports, a WiFi 6 and Bluetooth 5.3 module, 4K-capable HDMI 2.1 and mini DP video outputs, a mini HDMI input port, a USB 3.0 Type-A port, an RS485 and CAN Bus terminal block, and more. The Rockchip RK3576 SoC comes with the same 6 TOPS NPU found in the Rockchip RK3588/RK3588S and can be used as a lower-cost, lower-performance alternative. Boardcon SBC3576 specifications: SoC – Rockchip RK3576 CPU – 4x Cortex-A72 cores at 2.2GHz, 4x Cortex-A53 cores at 1.8GHz, Arm Cortex-M0 MCU at 400MHz GPU – Arm Mali-G52 MC3 GPU clocked at 1GHz with support for OpenGL [...]

The post Boardcon SBC3576 – A feature-rich Rockchip RK3576 SBC with HDMI, mini DP, dual GbE, WiFi 6, optional 5G/4G LTE module, and more appeared first on CNX Software - Embedded Systems News.

Edge video processing platform features NXP i.MX 8M Plus, i.MX 93, or i.MX 95 SoC, supports up to 23 camera types

13 February 2025 at 10:42
DAB Embedded AquaEdge video processing platform

DAB Embedded AquaEdge is a compact computer based on an NXP i.MX 8M Plus, i.MX 93, or i.MX 95 SoC, working as an edge video processing platform and supporting 23 types of vision cameras with resolutions from VGA up to 12MP and global or rolling shutter. The small edge computer features a gigabit Ethernet RJ45 jack with PoE to power the device. It is also equipped with a single GMSL2 connector to connect a camera whose input can be processed by the built-in AI accelerator found in the selected NXP i.MX processors. Other external ports include a microSD card slot, a USB 3.0 Type-A port, and a mini HDMI port (for the NXP i.MX 8M Plus model only). DAB Embedded AquaEdge specifications: SoC / Memory / Storage options NXP i.MX 8M Plus CPU – Quad-core Cortex-A53 processor @ 1.8GHz, Arm Cortex-M7 real-time core AI accelerator – 2.3 TOPS NPU VPU Encoder up to [...]

The post Edge video processing platform features NXP i.MX 8M Plus, i.MX 93, or i.MX 95 SoC, supports up to 23 camera types appeared first on CNX Software - Embedded Systems News.

DeepSeek shown to run on Rockchip RK3588 with AI acceleration at about 15 tokens/s

9 February 2025 at 11:51
Rockchip RK3588 DeepSeek R1 NPU acceleration

The DeepSeek R1 model was released a few weeks ago, and Brian Roemmele claimed to run it locally on a Raspberry Pi at 200 tokens per second, promising to release a Raspberry Pi image “as soon as all tests are complete”. He further explained that the Raspberry Pi 5 had a few HATs, including a Hailo AI accelerator, but that’s about all the information we have so far, and I assume he used the distilled model with 1.5 billion parameters. Jeff Geerling did his own tests with DeepSeek-R1 (Qwen 14B), but that was only on the CPU at 1.4 tokens/s, and he later installed an AMD W7700 graphics card on it for better performance. Other people made TinyZero models based on DeepSeek R1 optimized for Raspberry Pi, but those are specific to countdown and multiplication tasks and still run on the CPU only. So I was happy to finally see Radxa release instructions to [...]

The post DeepSeek shown to run on Rockchip RK3588 with AI acceleration at about 15 tokens/s appeared first on CNX Software - Embedded Systems News.

YOLO-Jevois leverages YOLO-World to enable open-vocabulary object detection at runtime, no dataset or training needed

6 February 2025 at 10:29
YOLO-Jevois general object detection by typing words

YOLO is one of the most popular edge AI computer vision models; it detects multiple objects and works out of the box for the objects it has been trained on. But adding another object would typically involve a lot of work, as you’d need to collect a dataset, manually annotate the objects you want to detect, train the network, and then possibly quantize it for edge deployment on an AI accelerator. This is basically true for all computer vision models, and we’ve already seen Edge Impulse facilitate the annotation process using GPT-4o and NVIDIA TAO to train TinyML models for microcontrollers. However, researchers at jevois.org have managed to do something even more impressive with YOLO-Jevois “open-vocabulary object detection”, based on Tencent AI Lab’s YOLO-World, to add new objects to YOLO at runtime by simply typing words or selecting part of the image. It also updates class definitions on [...]
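YOLO-Jevois' own code is not shown in this excerpt, but as a rough illustration of the open-vocabulary idea it builds on, here is a sketch using the Ultralytics YOLO-World wrapper; the weights file, class names, and test image are placeholders.

```python
from ultralytics import YOLOWorld  # pip install ultralytics

model = YOLOWorld("yolov8s-world.pt")          # pretrained open-vocabulary weights
model.set_classes(["coffee mug", "stapler"])   # new classes defined just by typing words
results = model.predict("desk.jpg")            # hypothetical test image
results[0].show()                              # display detections
```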

The post YOLO-Jevois leverages YOLO-World to enable open-vocabulary object detection at runtime, no dataset or training needed appeared first on CNX Software - Embedded Systems News.

Introducing the Beta Launch of Docker’s AI Agent, Transforming Development Experiences

6 February 2025 at 04:36

For years, Docker has been an essential partner for developers, empowering everyone from small startups to the world’s largest enterprises. Today, AI is transforming organizations across industries, creating opportunities for those who embrace it to gain a competitive edge. Yet, for many teams, the question of where to start and how to effectively integrate AI into daily workflows remains a challenge. True to its developer-first philosophy, Docker is here to bridge that gap.

We’re thrilled to introduce the beta launch of Docker AI Agent (also known as Project: Gordon)—an embedded, context-aware assistant seamlessly integrated into the Docker suite. Available within Docker Desktop and CLI, this innovative agent delivers tailored guidance for tasks like building and running containers, authoring Dockerfiles and Docker-specific troubleshooting—eliminating disruptive context-switching. By addressing challenges precisely when and where developers encounter them, Docker AI Agent ensures a smoother, more productive workflow.

As the AI Agent evolves, enterprise teams will unlock even greater capabilities, including customizable features that streamline collaboration, enhance security, and help developers work smarter. With the Docker AI Agent, we’re making Docker even easier and more effective to use than it has ever been — AI accessible, actionable, and indispensable for developers everywhere.

How Docker’s AI Agent Simplifies Development Challenges  

Developing in today’s fast-paced tech landscape is increasingly complex, with developers having to learn an ever-growing number of tools, libraries, and technologies.

By integrating a GenAI Agent into Docker’s ecosystem, we aim to provide developers with a powerful assistant that can help them navigate these complexities. 

The Docker AI Agent helps developers accelerate their work, providing real-time assistance, actionable suggestions, and automations that remove many of the manual tasks associated with containerized application development. Delivering the most helpful, expert-level guidance on Docker-related questions and technologies, Gordon serves as a powerful support system for developers, meeting them exactly where they are in their workflow. 

If you’re a developer who favors graphical interfaces, the Docker Desktop AI UI will help you navigate container runtime issues, image size management, and more general Dockerfile-oriented questions. If you’re a command line interface user, you can call the agent and share context with it directly in your favorite terminal.

So what can Docker’s AI Agent do today? 

We’re delivering an expert assistant for every Docker-related concept and technology, whether it’s getting started, optimizing an existing Dockerfile or Compose file, or understanding Docker technologies in general. With Docker AI Agent, you also have the ability to delegate actions while maintaining full control and review over the process.

As a first example, if you want to run a container from an image, our agent can suggest the most appropriate docker run command tailored to your needs. This eliminates guesswork and the need to search Docker Hub, saving you time and effort. The result combines a custom prompt, live data from Docker Hub, Docker container expertise, and private usage insights unique to Docker Inc.

[Screenshot: Docker AI Agent (Gordon) chat in Docker Desktop]

We’ve intentionally designed the output to be concise and actionable, avoiding the overwhelming verbosity often associated with AI-generated commands. We also provide sources for most of the AI agent recommendations, pointing directly to our documentation website. Our goal is to continuously refine this experience, ensuring that Docker’s AI Agent always provides the best possible command based on your specific local context.

Besides helping you run containers, the Docker AI Agent can today:

  • Explain, rate, and optimize a Dockerfile, leveraging the latest version of Docker.
  • Help you run containers in an effective, concise way, using local context (such as ports already in use or existing volumes).
  • Answer any Docker-related question using the latest version of the documentation for the whole tool suite, so it can cover any kind of question on Docker tools and technologies.
  • Containerize a software project, helping you run your software in containers.
  • Help with Docker-related GitHub Actions.
  • Suggest fixes when a container fails to start in Docker Desktop.
  • Provide contextual help for containers, images, and volumes.
  • Augment its answers with per-directory MCP servers (see the documentation).
[Screenshot: Docker AI Agent in the Docker terminal]

For the Node.js experts: in the above screenshot, the AI recommends Node 20.12, which is not the latest version but the one it found in the package.json.

With every future version of Docker Desktop, and thanks to the feedback you provide, the agent will be able to do much more.

How can you try Docker AI Agent? 

This first beta release of the Docker AI Agent is now progressively rolling out to all signed-in users*. The agent is disabled by default; here’s how to enable it and get started:

  1. Install or update to the latest release of Docker Desktop 4.38.
  2. Enable Docker AI under Docker Desktop Settings → Features in Development.
  3. For the best experience, ensure the Docker terminal is enabled by going to Settings → General.
  4. Apply changes.
[Screenshot: Docker Desktop settings for enabling Docker AI]

* If you’re a business subscriber, your administrator needs to enable the Docker AI Agent for the organization first. This can be done through Settings Management. If this applies to you, feel free to contact us through support for further information.

Docker Agent’s Vision for 2025

In 2025, we aim to expand the agent’s capabilities with features like customizing your experience with more context from your registry, enhanced GitHub Copilot integrations, and a deeper presence across the development tools you already use. With regular updates and your feedback, Docker AI Agent is being built to become an indispensable part of your development process.

For now, this beta is the start of an exciting evolution in how we approach developer productivity. Stay tuned for more updates as we continue to shape a smarter, more streamlined way to build, secure, and ship applications. We want to hear from you: if you like it or want more information, you can contact us.


Roboreactor – A Web-based platform to design Raspberry Pi or Jetson-based robots from electronics to code and 3D files

3 February 2025 at 19:36
Roboreactor Web based interface to design robots

Roboreactor is a web-based platform enabling engineers to build robotic and automation systems based on Raspberry Pi, NVIDIA Jetson, or other SBCs from a web browser, including parts selection, code generation through visual programming, and URDF model generation from Onshape software. You can also create your robot with an LLM if you wish. The first step is to create a project with your robot specifications and download and install the Genflow Mini image to your Raspberry Pi or NVIDIA Jetson SBC. Alternatively, you can install the Genflow Mini middleware with a script on other SBCs, but we’re told the process takes up to 10 hours… At this point, you should be able to access data from sensors and other peripherals connected to your board, and you can also start working on the Python code using visual programming through the Roboreactor node generator without having to write code or understand low-level algorithms. Another [...]

The post Roboreactor – A Web-based platform to design Raspberry Pi or Jetson-based robots from electronics to code and 3D files appeared first on CNX Software - Embedded Systems News.

Phison’s aiDAPTIV+ AI solution leverages SSDs to expand GPU memory for LLM training

27 January 2025 at 00:01
Phison's aiDAPTIVCache family support 70B model

While looking for new and interesting products, I found ADLINK’s DLAP Supreme series, a series of edge AI devices built around the NVIDIA Jetson AGX Orin platform. But that was not the interesting part; what got my attention was its support for something called aiDAPTIV+ technology, which made us curious. Upon looking into it, we found that the aiDAPTIV+ AI solution is a hybrid (software and hardware) solution that uses readily available, low-cost NAND flash storage to enhance the capabilities of GPUs and to streamline and scale large language model (LLM) training for small and medium-sized businesses. This design allows organizations to train their data models on standard, off-the-shelf hardware, overcoming limitations with more complex models like Llama-2 7B. The solution supports up to 70B model parameters with low latency and high-endurance storage (100 DWPD) using SLC NAND. It is designed to integrate easily with existing AI applications without requiring hardware changes, [...]

The post Phison’s aiDAPTIV+ AI solution leverages SSDs to expand GPU memory for LLM training appeared first on CNX Software - Embedded Systems News.

ESP32 Agent Dev Kit is an LLM-powered voice assistant built on the ESP32-S3 platform (Crowdfunding)

24 January 2025 at 00:01
ESP32 Agent Dev Kit

The ESP32 Agent Dev Kit is an ESP32-S3-powered voice assistant that offers integrations with popular LLMs such as ChatGPT, Gemini, and Claude. Wireless-Tag says the Dev Kit is suitable for “95% of AIoT applications, from smart home devices to desktop toys, robotics, and instruments”. In some ways, it is similar to the SenseCAP Watcher, but it has a larger, non-touch display and dual mic input. It does not, however, support local language models. It also features a standard mikroBUS interface for expansion. For voice capabilities, the ESP32 Dev Kit integrates two onboard noise-reducing microphones and a high-fidelity speaker. The built-in infrared laser proximity sensor detects human proximity and movement for “smart interactive experiences”. ESP32 Agent Dev Kit specifications: MCU – ESP32-S3 dual-core Tensilica LX7 microcontroller @ 240MHz, 8MB PSRAM Storage – 16MB flash Display – 3.5-inch touchscreen, 480×360 resolution Camera – 5MP OmniVision OV5647 camera module, 120° field of [...]

The post ESP32 Agent Dev Kit is an LLM-powered voice assistant built on the ESP32-S3 platform (Crowdfunding) appeared first on CNX Software - Embedded Systems News.

SECO’s SMARC-QCS5430 SMARC SoM and devkit feature Qualcomm QCS5430 SoC for Edge AI and 5G applications

23 January 2025 at 17:45
SOM SMARC QCS5430 SoM

SECO has announced early engineering samples of its SOM-SMARC-QCS5430 system-on-module (SoM) and a devkit designed to support IoT and edge computing applications. Built around the Qualcomm QCS5430 processor, this SMARC-compliant SoM targets industrial automation, robotics, smart cities, and surveillance.

The module also offers dual MIPI-CSI camera interfaces and connectivity options including USB 3.1, PCIe Gen3, dual GbE, and optional Wi-Fi and Bluetooth. SECO’s DEV-KIT-SMARC industrial devkit includes all the components needed for rapid prototyping and integration.

The post SECO’s SMARC-QCS5430 SMARC SoM and devkit feature Qualcomm QCS5430 SoC for Edge AI and 5G applications appeared first on CNX Software - Embedded Systems News.

M5Stack LLM630 Compute Kit features Axera AX630C Edge AI SoC for on-device LLM and computer vision processing

21 January 2025 at 11:56
M5Stack LLM630 Compute Kit

M5Stack LLM630 Compute Kit is an edge AI development platform powered by the Axera Tech AX630C AI SoC with a 3.2 TOPS NPU, designed to run computer vision (CV) and large language model (LLM) tasks at the edge, in other words, on the device itself without access to the cloud. The LLM630 Compute Kit is also equipped with 4GB LPDDR4 and 32GB eMMC flash and supports both wired and wireless connectivity thanks to a JL2101-N040C Gigabit Ethernet chip and an ESP32-C6 module for 2.4GHz WiFi 6 connectivity. You can also connect a display and a camera through MIPI DSI and CSI connectors. M5Stack LLM630 Compute Kit specifications: SoC – Axera Tech (Aixin in China) AX630C CPU – Dual-core Arm Cortex-A53 @ 1.2 GHz; 32KB I-Cache, 32KB D-Cache, 256KB L2 Cache NPU – 12.8 TOPS @ INT4 (max), 3.2 TOPS @ INT8 ISP – 4K @ 30fps Video – Encoding: 4K; Decoding: 1080p [...]

The post M5Stack LLM630 Compute Kit features Axera AX630C Edge AI SoC for on-device LLM and computer vision processing appeared first on CNX Software - Embedded Systems News.

ARMOR: Egocentric Perception for Humanoid Robot Powered by XIAO ESP32S3

16 January 2025 at 01:31

Daehwa Kim (Carnegie Mellon University), Mario Srouji, Chen Chen, and Jian Zhang (Apple) have developed ARMOR, an innovative egocentric perception hardware and software system for humanoid robots. By combining Seeed Studio XIAO ESP32S3-based wearable depth sensor networks and transformer-based policies, ARMOR tackles the challenges of collision avoidance and motion planning in dense environments. This system enhances spatial awareness and enables nimble and safe motion planning, outperforming traditional perception setups. ARMOR was deployed on the GR1 humanoid robot from Fourier Intelligence, showcasing its real-world applications.

[Source: Daehwa Kim]

Hardware Used

ARMOR uses the following hardware components:

    • XIAO ESP32S3 microcontrollers: Collect depth data from the ToF sensors over I2C and stream it to the robot’s onboard computer via USB.
    • Onboard Computer: NVIDIA Jetson Xavier NX processes sensor inputs. 
    • GPU (NVIDIA GeForce RTX 4090): Handles ARMOR-Policy’s inference-time optimization for motion planning.
    • SparkFun VL53L5CX Time-of-Flight (ToF) lidar sensors: Distributed across the robot’s body for comprehensive point cloud perception.

How ARMOR Works

The hardware side of ARMOR’s egocentric perception system uses a distributed ToF lidar sensor network. Groups of four ToF sensors are connected to Seeed Studio XIAO ESP32S3 microcontrollers, capturing high-precision depth information from the environment. The XIAO ESP32S3 serves as a crucial intermediary controller, efficiently managing real-time sensor data transmission. It streams the collected depth data via USB to the robot’s onboard computer, the NVIDIA Jetson Xavier NX, which then wirelessly transmits the data to a powerful Linux machine equipped with an NVIDIA GeForce RTX 4090 GPU for processing. This data pipeline enables the creation of an occlusion-free point cloud around the humanoid robot, providing essential environmental awareness for the ARMOR neural motion planning algorithm. The distributed, lightweight hardware setup also enhances spatial awareness and overcomes the limitations of head-mounted or external cameras, which often fail in cluttered or occluded environments.

Daehwa Kim, one of the project’s core developers, explains why they selected the Seeed Studio XIAO:

“We might imagine a future where users easily plug and play with wearable sensors for humanoids and augment robots' perceptions in various tasks. XIAO ESP32 series makes the wearable sensor system easily modularizable. We specifically adopted the XIAO ESP32S3 in ARMOR because of its powerful computing and tiny form factor.”
Armor Policy - Transformer-based policy [Source: Daehwa Kim]

The neural motion planning system, ARMOR-Policy, is built on a transformer-based architecture called the Action Chunking Transformer. This policy was trained on 86 hours of human motion data from the AMASS dataset using imitation learning. ARMOR-Policy processes the robot’s current state, goal positions, and sensor inputs to predict safe and efficient trajectories in real-time. The system leverages latent variables to explore multiple trajectory solutions during inference, ensuring flexibility and robustness.

Trained on 86 hours of human motion dataset [Source: Daehwa Kim]

ARMOR was rigorously tested in both simulated and real-world scenarios. It demonstrated remarkable improvements in performance, reducing collisions by 63.7% and increasing success rates by 78.7% compared to exocentric systems with dense head-mounted cameras. Additionally, the transformer-based ARMOR-Policy reduced computational latency by 26× compared to sampling-based motion planners like cuRobo, enabling efficient and nimble collision avoidance.

Real World Hardware Deployment [Source: Daehwa Kim]

Discover more about ARMOR

Want to explore ARMOR’s capabilities? The research team will soon release the source code, hardware details, and 3D CAD files on their GitHub repository. Dive deeper into this cutting-edge project by reading their paper on arXiv. Stay tuned for updates to replicate and innovate on this revolutionary approach to humanoid robot motion planning! To see ARMOR in action, check out their demonstration video on YouTube.

End Note

Hey community, we’re curating a monthly newsletter centering around the beloved Seeed Studio XIAO. If you want to stay up-to-date with:

🤖 Cool Projects from the Community to get inspiration and tutorials
📰 Product Updates: firmware update, new product spoiler
📖 Wiki Updates: new wikis + wiki contribution
📣 News: events, contests, and other community stuff

Please subscribe now!

The post ARMOR: Egocentric Perception for Humanoid Robot Powered by XIAO ESP32S3 appeared first on Latest Open Tech From Seeed.


Python Launchpad 2025: Your Blueprint to Mastery and Beyond

15 January 2025 at 15:04

Introduction

Python is everywhere, from data science to web development. It’s beginner-friendly and versatile, making it one of the most sought-after skills for 2025 and beyond. This article outlines a practical, step-by-step roadmap to master Python and grow your career.

Learning Time Frame

The time it takes to learn Python depends on your goals and prior experience. Here’s a rough timeline:

  • 1-3 Months: Grasp the basics, like syntax, loops, and functions. Start small projects.
  • 4-12 Months: Move to intermediate topics like object-oriented programming and essential libraries. Build practical projects.
  • Beyond 1 Year: Specialize in areas like web development, data science, or machine learning.

Consistency matters more than speed. With regular practice, you can achieve meaningful progress in a few months.

Steps for Learning Python Successfully

  1. Understand Your Motivation
    • Define your goals. Whether for a career change, personal projects, or academic growth, knowing your “why” keeps you focused.
  2. Start with the Basics
    • Learn Python syntax, data types, loops, and conditional statements. This foundation is key for tackling more complex topics (see the short example after this list).
  3. Master Intermediate Concepts
    • Explore topics like object-oriented programming, file handling, I/O operations, and libraries such as pandas and NumPy.
  4. Learn by Doing
    • Apply your skills through coding exercises and small projects. Real practice strengthens understanding.
  5. Build a Portfolio
    • Showcase your skills with projects like web apps or a basic data-analysis dashboard. A portfolio boosts job prospects.
  6. Challenge Yourself Regularly
    • Stay updated with Python advancements and take on progressively harder tasks to improve continuously.
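For a sense of what those first weeks cover, here is a tiny, self-contained example touching syntax, a loop, and a function; the temperature-conversion task is just an illustration.

```python
# A first taste of Python: a function, a loop, and formatted output
def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9

for temp_f in [32, 68, 98.6, 212]:
    print(f"{temp_f}F = {fahrenheit_to_celsius(temp_f):.1f}C")
```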

Python Learning Plan

Month 1-3

  • Focus on basics: syntax, data types, loops, and functions.
  • Start using libraries like pandas and NumPy for data manipulation (a quick sketch follows).
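A minimal first exercise with pandas and NumPy; the sales figures are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales data for a first data-manipulation exercise
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "units": [120, 95, 143],
    "price": [9.99, 9.99, 10.49],
})
df["revenue"] = df["units"] * df["price"]  # element-wise column math
print(df)
print("Mean revenue:", np.round(df["revenue"].mean(), 2))
```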

Month 4-6

  • Dive into intermediate topics: object-oriented programming, file handling, and data visualization with matplotlib.
  • Experiment with building and testing APIs using FastAPI and Postman (a minimal FastAPI sketch follows).
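A minimal FastAPI app you could poke at with Postman; the route and message are placeholders.

```python
from fastapi import FastAPI  # pip install fastapi uvicorn

app = FastAPI()

@app.get("/hello/{name}")
def greet(name: str):
    # Try it in Postman or a browser: http://127.0.0.1:8000/hello/Ada
    return {"message": f"Hello, {name}!"}

# Run with: uvicorn main:app --reload
```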

Month 7 and Beyond

  • Specialize based on your goals:
    • Web Development: Learn Flask or Django for backend
    • Data Science: Explore TensorFlow, Scikit-learn, and Kaggle
    • Automation: Work with tools like Selenium for Web Scraping

This timeline is flexible—adapt it to your pace and priorities.

Top Tips for Effective Learning

  1. Choose Your Focus
    • Decide what interests you most—web development, data science, or automation. A clear focus helps you navigate the vast world of Python.
  2. Practice Regularly
    • Dedicate time daily or weekly to coding. Even short, consistent practice sessions on platforms like HackerRank will build your skills over time.
  3. Work on Real Projects
    • Apply your learning to practical problems. Train an ML model, automate a task, or analyze a dataset. Projects reinforce knowledge and make learning fun.
  4. Join a Community
    • Engage with Python communities online or locally. Networking with others can help you learn faster and stay motivated.
  5. Take Your Time
    • Don’t rush through concepts. Understanding the basics thoroughly is essential before moving to advanced topics.
  6. Revisit and Improve
    • Go back to your old projects and refine them. Optimization teaches you new skills and helps you see your progress.

Best Ways to Learn Python in 2025

1. Online Courses

Platforms like YouTube, Coursera, and Udemy offer structured courses for all levels, from beginners to advanced learners.

2. Tutorials

Hands-on tutorials from sites like Real Python and Python.org are great for practical, incremental learning.

3. Cheat Sheets

Keep cheat sheets for quick references to libraries like pandas, NumPy, and Matplotlib. These are invaluable when coding.

4. Projects

Start with simple projects like to-do list apps. Gradually take on more complex projects such as web apps or machine learning models.

5. Books

For beginners, Automate the Boring Stuff with Python by Al Sweigart simplifies learning. Advanced learners can explore Fluent Python by Luciano Ramalho.

To understand how Python is shaping careers in tech, read The Rise of Python and Its Impact on Careers in 2025.

Conclusion

Python is more than just a programming language; it’s a gateway to countless opportunities in tech. With a solid plan, consistent practice, and real-world projects, anyone can master it. Whether you’re a beginner or looking to advance your skills, Python offers something for everyone.

If you’re ready to fast-track your learning, consider enrolling in OpenCV University’s 3-Hour Python Bootcamp, designed for beginners to get started quickly and efficiently.

Start your Python journey today—your future self will thank you!

The post Python Launchpad 2025: Your Blueprint to Mastery and Beyond appeared first on OpenCV.

New Product Post – Vision AI Sensor, Supporting LoRaWAN and RS485

By: violet
15 January 2025 at 10:55

Hello everyone, welcome to this blog!

The main topic is NEW PRODUCTS! And the keywords are LoRaWAN and Vision AI. You might wonder, “Your device has a camera and it is LoRaWAN?” Yes indeed! I know, I know! You would tell me “You can’t send images or videos via LoRaWAN because the ‘message’ is too large”, or someone else would say “Technically you can, but you need to ‘slice’ the large images or videos, send them ‘bit by bit’, and ‘piece them together’ afterward. Then why bother?”

What if I told you that you don’t have to send the images or videos? Think about it: the goal is to get the results captured by the camera, not tons of irrelevant footage. As long as you can get the results, there is no need to send large images or videos.

Without further ado, here comes the main cast of this post: the SenseCAP A1102, an IP66-rated LoRaWAN® Vision AI Sensor, ideal for low-power, long-range TinyML edge AI applications. It comes with three pre-deployed models (human detection, people counting, and meter reading) by default. Meanwhile, with the SenseCraft AI platform [Note 1], you can use pre-trained models or train your own customized models within a few clicks. Of course, the SenseCAP A1102 also supports TensorFlow Lite and PyTorch.

This device consists of two main parts: the AI camera and the LoRaWAN data logger. While many technologies are integrated into this nifty device, I would like to highlight four key aspects:

  • Advanced AI Processor

As a vision AI sensor, the SenseCAP A1102 adopts the advanced Himax WiseEye2 HX6538 processor, featuring a dual-core Arm Cortex-M55 and an integrated Arm Ethos-U55 (the same AI processor as in the Grove Vision AI V2 kit). This ensures high performance in vision processing.

Meanwhile, please keep in mind that lighting and distance will affect the performance, which is common for applications that involve cameras. According to our testing, SenseCAP A1102 can achieve 70% confidence for results within 1 ~ 5 meters in normal lighting.

  • Low-Power Consumption, Long-Range Communication

The SenseCAP A1102 is built with the Wio-E5 module featuring an STM32WLE5JC (Arm Cortex-M4 MCU with a Semtech SX126x radio). This ensures low power consumption and long-range communication, as in the SenseCAP S210X LoRaWAN environmental sensors. Supporting a wide 863MHz – 928MHz frequency range, you can order the same device for different stages of your projects across multiple continents, saving time, manpower, and costs in testing, inventory management, shipment, and so on.

The SenseCAP A1102 opens up new possibilities to perceive the world. With the same hardware and different AI models, you get different sensors for detecting “objects” (fruits, poses, and animals) or reading meters (dials or digits), and many more. With its IP66 rating (waterproof and dustproof), it can endure long-term deployment in harsh outdoor environments.

We understand interoperability is important. As a standard LoRaWAN device, SenseCAP A1102 can be used with any standard LoRaWAN gateways. When choosing SenseCAP Outdoor Gateway or SenseCAP M2 Indoor Gateway, it is easier for configuration and provisioning.

  • Easy Set-up and Configure with SenseCraft App

We provide the SenseCraft App and the SenseCAP Web Portal so you can set up, configure, and manage your devices and data more easily.

In the SenseCraft App, you can change settings with a few taps, such as choosing the platform, setting the frequency plan for your region, changing the data upload interval (5–1440 min), or selecting the packet policy (2C+1N, 1C, 1N), among other settings.

You can easily view live and historical data from your devices on both the SenseCraft App and the SenseCAP Web Portal.

When using SenseCAP sensors and SenseCAP gateways, you can also choose the SenseCAP cloud platform, which is free for 6 months for each device and then costs US$0.99 per device per month, or you can use your own platform or another third-party platform. We offer an API supporting MQTT and HTTP.

  • Wi-Fi Connectivity for Transmitting Key Frames

Inside the AI camera part of this device sits a tiny-yet-powerful XIAO ESP32C3, powered by the RISC-V architecture. This adds Wi-Fi connectivity to the SenseCAP A1102. In your applications, you can get the results via LoRaWAN and, at the same time, get the key frames via Wi-Fi for validation or further analysis.

While we always demonstrate how much we love LoRaWAN for its ultra-low power consumption and ultra-long-range communication by continuously adding more products to our LoRaWAN family, we also understand that some people might prefer other communication protocols. Rest assured, we have options for you. As mentioned above, we value interoperability a lot. Here comes the intro of another important cast member of this post: the RS485 Vision AI Camera!

The RS485 Vision AI Camera is a robust vision AI sensor that supports the MODBUS RS485 [Note 2] protocol and Wi-Fi connectivity. Simply put, it is the camera part of the SenseCAP A1102, adopting the same Himax AI processor for AI performance. Its IP66 rating makes it suitable for both indoor and outdoor applications.

You can use the RS485 Vision AI Camera with the SenseCAP Sensor Hub 4G Data Logger to transmit the results via 4G. If your existing devices or systems support MODBUS RS485, you can connect them to this RS485 Vision AI Camera for your applications (a rough polling sketch follows).
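The camera's actual register map is not given in this post, so the values below are placeholders; here is a hedged polling sketch with the pymodbus 3.x serial client (port, baud rate, unit ID, and register block are all assumptions).

```python
from pymodbus.client import ModbusSerialClient  # pip install pymodbus

client = ModbusSerialClient(port="/dev/ttyUSB0", baudrate=9600)  # assumed port and baud rate
if client.connect():
    # Hypothetical register block holding the latest inference result
    rr = client.read_holding_registers(address=0x00, count=4, slave=1)
    if not rr.isError():
        print("Raw registers:", rr.registers)
    client.close()
```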

In this post, we introduced the SenseCAP A1102 and the RS485 Vision AI Camera. I hope you like them and will get your hands on these devices for your projects soon! I can already envision them used in applications ranging from smart home, office, and building management to smart agriculture, biodiversity conservation, and many others. We look forward to seeing your applications!

If you’ve been using Seeed products or following our updates, you might have noticed that Seeed’s expertise lies in smart sensing and edge computing. We’ve developed a rich collection of products in (1) sensor networks that collect different kinds of real-world data and transmit them via various communication protocols, and (2) edge computing devices that bring computing power and AI capabilities to the edge. I think we can say that Seeed is strong in smart sensing, communication, and edge computing.

There are many more new products on the Seeed roadmap for us to get a deeper perception of the world with AI-powered insight and actions. Stay tuned!

Last but not least, we understand that you might have specific requirements for product features, functionality, or form factors for your applications, so we offer a wide range of customization services based on our existing standard products. Please do not hesitate to reach out to share your experience, your thoughts about new products, wishes for new features, or ideas for cooperation! Reach us at iot[at]seeed[dot]cc. Thank you!

[Note 1: If you do not know it yet, SenseCraft AI is a web-based platform for AI applications. No-code. Beginner-friendly. I joined the livestream on YouTube with dearest Meilily to introduce this platform last week. If you are interested, check the recording here.]

[Note 2: MODBUS RS485 is a widely-used protocol for many industrial applications. You can learn more here about MODBUS and RS485, and you can explore the full range of all RS485 products here.]

The post New Product Post – Vision AI Sensor, Supporting LoRaWAN and RS485 appeared first on Latest Open Tech From Seeed.

Announcing the OpenCV Perception Challenge for Bin-Picking

11 January 2025 at 02:35

We’re excited to announce a different type of competition from those we’ve run in the past: The Perception Challenge For Bin-Picking, sponsored by Intrinsic, a new computer-vision-themed competition designed to tackle real-world robotics problems. This is your opportunity to compete and showcase your skills to a global audience, all while competing for a share of $60,000 in prizes! Read on to find out the rules and deadlines.

What is bin-picking? Bin picking is a robotics task that involves a camera and robotic arm or manipulator selecting individual objects from a container (often referred to as a bin) and moving them to a specific location or orientation. It is widely used in industries such as manufacturing, logistics, and warehousing to automate processes like sorting, assembly, and packaging.

Why Join?

This challenge is about putting your computer vision, AI, and robotics expertise to the test in real-world scenarios. Unlike previous OpenCV competitions, the goal here is well defined: your task is to produce the most accurate results. Using Intrinsic’s open-source datasets and a remotely operated robotic workcell powered by the Intrinsic platform, you’ll develop cutting-edge solutions for the challenging problem of 6DoF pose estimation.

Unlike traditional competitions, we’re evaluating your algorithms live with a robot in the loop, offering a more realistic, industrial-style assessment of their success.
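To make the task concrete: a 6DoF pose is a 3D rotation plus a 3D translation of an object relative to the camera. The short sketch below, with made-up example values, simply shows how such a pose can be packed into a 4x4 transform and applied to an object's model points; the actual submission format will be defined by the challenge organizers.

```python
import numpy as np

# A 6DoF pose = rotation R (3x3) + translation t (3,). Example values only.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])      # 90-degree rotation about the Z axis
t = np.array([0.10, 0.02, 0.45])      # translation in metres, camera frame

T = np.eye(4)                         # pack into a 4x4 homogeneous transform
T[:3, :3] = R
T[:3, 3] = t

# Map a few object-model points from the object frame into the camera frame
model_points = np.array([[0.00, 0.00, 0.00],
                         [0.01, 0.00, 0.00],
                         [0.00, 0.01, 0.00]])
homogeneous = np.hstack([model_points, np.ones((len(model_points), 1))])
camera_points = (T @ homogeneous.T).T[:, :3]
print(camera_points)
```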

  • Don’t worry if you aren’t familiar with this type of vision task. We’ll explain more and provide example code to people who register for the competition, ahead of the February 1st official opening for submissions. Don’t miss out: register today!

The Prizes

  • 1st (Grand Prize Winner): $18,000
  • 2nd (Grand Prize Runner-Up): $12,000
  • 3rd Place: $8,000
  • 4th Place: $4,000
  • 5th Place: $3,000
  • 6th to 10th Place: $1,000
  • Special Prize for Best One-shot Solution: $10,000

Key Dates

All deadlines are 11:59 pm (Pacific Time)

  • Today — Competition Registration Begins
  • February 1, 2025 — Competition Begins
  • May 1, 2025 — Top 5 Teams Announced (Elite Phase Begins)
  • May 22, 2025 — Live Bin-Picking Challenge on OpenCV Webinar
  • June 10, 2025 — Final Winners Awarded

How It Works

Participants will submit their solutions as Docker images for evaluation in test environments. Leaderboards will update in real time via EvalAI, tracking accuracy and efficiency metrics.

This challenge is open to teams of up to 10 members. Please note: participation is restricted to countries not subject to sanctions from the United States of America.

Watch The Announcement Webinar

Be Part of the Action

Looking for teammates to join the OpenCV Perception Challenge For Bin-Picking (sponsored by Intrinsic)? Connect with fellow participants on the official OpenCV Slack channel and OpenCV forum.

Whether you’re a seasoned professional or a new student in robotics and AI, this is your chance to make a mark. Join the competition, engage with the OpenCV community, and bring your ideas to life.

Visit bpc.opencv.org to learn more and register your team. The challenge runs in partnership with the BOP benchmark and we encourage participants to submit their results to the associated BOP-Robotics challenge, which will be announced soon.

The post Announcing the OpenCV Perception Challenge for Bin-Picking appeared first on OpenCV.

Pharmaceutical Production with AI-Powered Process Monitoring

31 December 2024 at 14:55

Hardware: reServer Industrial J4012, powered by NVIDIA Jetson Orin NX 16GB

Use Case Provider: NeuroSYS

Application: Abnormal Detection in Production Line

Industrial: Pharmaceutical

Deployment Location: Norway

A global pharmaceutical company sought to enhance the efficiency and accuracy of its production process by integrating advanced AI technologies. Their challenge was to detect anomalies, such as tipped vials on conveyor belts, without disrupting existing workflows or requiring extensive reconfiguration of equipment. That is how the solution got started: the Seeed reServer Industrial J4012 edge device was combined with the NeuroSYS AI software platform to transform their production monitoring with an AI-enabled camera system.

Background

Pharmaceutical production lines often handle vials of varying materials (plastic and glass) and capacities (7 to 100 ml), totaling seven distinct types. Traditionally, the process relied on manual tuning of sensors and equipment reconfiguration for each vial type. These limitations resulted in inefficiencies and downtime, especially when tipped vials reached the filling machine.

The company required a versatile solution to address this challenge—one capable of real-time anomaly detection and performance monitoring while minimizing physical intervention.

Initial Challenge

The key challenges included:

  • Detecting tipped vials on a moving conveyor belt.
  • Minimizing false alarms caused by obstructions or unanticipated scenarios.
  • Ensuring the system can adapt to various vial types without retooling machinery.
  • Developing a scalable, non-intrusive solution that could integrate seamlessly with existing workflows.

Solution

The whole system implemented an advanced vision AI pipeline leveraging a combination of industrial hardware and AI-driven software.

Components

  1. Hardware: An industrial-grade camera integrated with the reServer industrial J4012 edge computing unit, powered by NVIDIA Jetson Orin NX 16GB.
  2. Software: Machine learning models trained on production data to detect anomalies and gather insights.
  3. Dashboard: Custom visualizations for real-time monitoring and historical analysis.

Implementation Process

  1. Optimized Setup:
    • The camera and lens were calibrated to capture high-quality images of vials as they moved along the conveyor belt.
    • During the Proof of Concept (PoC) stage, the system operated behind a plastic curtain to validate functionality without disrupting production.
  2. Real-Time Analysis:
    • Frames captured by the camera were processed on the reServer Jetson device in real-time using convolutional neural networks (CNNs).
    • The system determined vial positions (standing or tipped) and triggered alerts for anomalies (a minimal sketch of this capture-classify-alert loop follows this list).
  3. Data Processing and Visualization:
    • Data was stored in a local database and visualized on dashboards, providing insights into machine performance.
    • A vial counting module tracked both standing and tipped vials for statistical analysis.
  4. Enhanced Alert Mechanisms:
    • Detection of a tipped vial activated a signal tower with visual (light) and auditory (buzzer) alerts.
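As referenced in item 2 above, the heart of the pipeline is a capture-classify-alert loop running on the Jetson device. The following is only a minimal sketch under stated assumptions: the classify_frame() model call, the label names, and the trigger_alert() signal-tower hook are hypothetical placeholders, not the NeuroSYS implementation.

```python
import time
import cv2
import numpy as np

LABELS = ("standing", "tipped")

def classify_frame(frame: np.ndarray) -> str:
    """Placeholder for the CNN inference call (e.g. an ONNX or TensorRT model)."""
    # resized = cv2.resize(frame, (224, 224))
    # ... run inference and map the output to one of LABELS ...
    return LABELS[0]

def trigger_alert() -> None:
    """Placeholder for driving the signal tower (light + buzzer)."""
    print("ALERT: tipped vial detected")

cap = cv2.VideoCapture(0)              # industrial camera exposed as a video device
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if classify_frame(frame) == "tipped":
        trigger_alert()
    time.sleep(0.01)                   # pace the loop to the conveyor speed
cap.release()
```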

Challenges Encountered During Implementation

During implementation, the system faced challenges specific to the deployment environment. For example, false alarms were triggered when operators’ hands entered the camera’s field of view; this was resolved by retraining the machine learning model with additional images containing such obstructions. Vials that got stuck or tipped at curved conveyor segments were sometimes misclassified, revealing gaps in the dataset for scenarios involving occlusions and bends. All of these issues were addressed by enriching the dataset to improve robustness, ensuring accurate performance in diverse conditions.

Results and Achievements

The system delivered remarkable results, transforming the client’s production process:

  1. Anomaly Detection: Achieved 99.89% accuracy in detecting tipped vials, regardless of material or capacity.
  2. Downtime Monitoring: Enabled precise downtime tracking, counting a delay once no vial movement was detected for 10 seconds (see the sketch after this list).
  3. Statistical Insights:
    • Counted tipped vials for quality monitoring and standing vials to assess machine efficiency.
    • Provided metrics for machine cycle optimization.
  4. Scalability:
    • The same hardware setup was enhanced with new functionalities, requiring only software updates.
    • Features like snapshot saving of anomalies allowed for deeper analysis of system performance and false positives.
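For illustration, the 10-second downtime rule from item 2 above can be written as a small state machine; this is only a sketch of one plausible implementation, with a hypothetical vial_detected() signal standing in for the vision pipeline's per-frame output.

```python
import time

DOWNTIME_THRESHOLD_S = 10.0

def monitor(vial_detected, poll_interval: float = 0.5) -> None:
    """vial_detected: callable returning True when any vial is seen in the current frame."""
    last_seen = time.monotonic()
    in_downtime = False
    while True:
        if vial_detected():
            if in_downtime:
                gap = time.monotonic() - last_seen
                print(f"Downtime ended after {gap:.1f}s without vial movement")
                in_downtime = False
            last_seen = time.monotonic()
        elif not in_downtime and time.monotonic() - last_seen >= DOWNTIME_THRESHOLD_S:
            in_downtime = True
            print("Downtime started: no vial movement for 10 seconds")
        time.sleep(poll_interval)
```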

Additional Benefits

  • Flexibility: Eliminated the need for retooling and physical modifications by enabling remote software updates.
  • Future-Proofing: New scenarios and events can be incorporated into the model as they occur, ensuring continuous improvement.
  • Scalable Solution: With minimal hardware adjustments, the system can evolve to handle additional tasks or integrate with advanced analytics platforms.

This project showcases the transformative potential of AI in industrial automation, setting the stage for smarter and more efficient manufacturing processes. The reServer Jetson edge device not only addressed the pharmaceutical company’s immediate challenges but also provided a robust platform for continuous innovation. By leveraging the Jetson edge vision AI technology, the client gained real-time anomaly detection, improved machine efficiency, and actionable insights into their production process—all while reducing manual intervention and hardware dependency.

By implementing this solution, the client not only resolved their immediate challenges but also gained a robust and scalable system for ongoing innovation and operational excellence.


Seeed NVIDIA Jetson Ecosystem

Seeed is an Elite partner for edge AI in the NVIDIA Partner Network. Explore more carrier boards, full system devices, customization services, use cases, and developer tools on Seeed’s NVIDIA Jetson ecosystem page.

Join the forefront of AI innovation with us! Harness the power of cutting-edge hardware and technology to revolutionize the deployment of machine learning in the real world across industries. Be a part of our mission to provide developers and enterprises with the best ML solutions available. Check out our successful case study catalog to discover more edge AI possibilities!

Discover infinite computer vision application possibilities through our vision AI resource hub!

Take the first step and send us an email at edgeai@seeed.cc to become a part of this exciting journey! 

Download our latest Jetson Catalog to find one option that suits you well. If you can’t find the off-the-shelf Jetson hardware solution for your needs, please check out our customization services, and submit a new product inquiry to us at odm@seeed.cc for evaluation.

The post Pharmaceutical Production with AI-Powered Process Monitoring appeared first on Latest Open Tech From Seeed.

ESP32-AIVoice-Z01 is an ESP32-S3 AI voice kit with dual microphones, wake word detection, noise reduction and echo cancellation

30 December 2024 at 14:06
ESP32 AIVoice Z01 Development Kit

The ESP32-AIVoice-Z01 is an affordable ESP32-S3-powered AI voice kit designed for creating voice-controlled AI applications. It features Wi-Fi and Bluetooth connectivity through the ESP32-S3 SoC, a dual digital microphone array for accurate voice recognition, and an onboard amplifier. The system also implements audio algorithms for noise reduction and echo cancellation. The ESP32-AIVoice-Z01 board supports Espressif’s WakeNet voice wake-up framework and integrates with the AiLinker open-source backend service framework to enable the connection to various large AI model services like OpenAI, ZhiPu QingYan, TongYi QianWen, and DouBao. These features make this device suitable for developing AI-powered toys, IoT devices, mobile devices, and smart home applications.

ESP32-AIVoice-Z01 ESP32 AI voice kit specifications:

  • Wireless module – ESP32-S3-WROOM-1U
    • SoC – Espressif Systems ESP32-S3 dual-core Xtensa LX7 processor
    • Memory – 8MB PSRAM
    • Storage – 16MB flash
    • Wireless – WiFi 4 and Bluetooth 5.0 connectivity with external antenna
  • Storage – MicroSD card slot
  • Audio – Dual digital microphone array [...]

The post ESP32-AIVoice-Z01 is an ESP32-S3 AI voice kit with dual microphones, wake word detection, noise reduction and echo cancellation appeared first on CNX Software - Embedded Systems News.
