Arm
Igniting a New Era of Cloud Computing for AI 20 November 2024 at 23:54

By: Dermot O'Driscoll

We’re living in a generation of compute that is being defined by AI – a transformation that is happening at a pace unlike anything we’ve seen before. Arm remains on the critical path to enabling this AI-accelerated future in a sustainable and scalable way, providing new engineering innovation and developments to make it happen. It’s clear to me that this vision is shared across our ecosystem, including at this week’s Microsoft Ignite event.

Across the many AI advancements announced by Microsoft, it’s evident they are on the path to building a sustainable, scalable, and secure platform for AI and that they’re dedicated to changing the way developers build, deploy, and scale their applications in the cloud. Arm’s collaboration with Microsoft on Azure Cobalt 100 has already shifted the landscape of cloud data centers and the services offered by Microsoft in just one year since its launch in 2023. By leveraging the flexibility and power efficiency of Arm Neoverse Compute Subsystems (CSS), Microsoft is pushing the boundaries of compute with Cobalt 100, establishing a capable and flexible infrastructure supporting a wide variety of mission-critical modern applications, from media servers and open-source databases to CI/CD pipelines.

AI has not only opened the world’s eyes to the power challenge in the data center, but has also put greater emphasis on the need for more specialized silicon. Every watt counts, and for change-makers like Microsoft, this means taking greater control over the entire infrastructure stack from silicon to cloud service deployment with sustainability in focus.

As mentioned in the Microsoft keynote, 100% of Microsoft Teams’ media processing capabilities now run on Cobalt 100, which is a testament to purpose-built compute delivering the required performance as efficiently as possible. This is the mission that Neoverse CSS was built for. Through tailored solutions like Cobalt 100, Microsoft is setting the stage for a future-ready cloud, capable of handling the growing demands of AI-enabled workloads without pushing energy consumption to unsustainable levels. To dig in on the impressive performance gains delivered by Cobalt 100-powered VMs to date, I encourage you to check out this week’s Arm Viewpoints podcast with Arpita Chatterjee, Senior Product Manager for Azure Platforms. And if you happen to tune into the Microsoft Ignite digital event, check out Arm’s virtual booth.

In addition to the impressive Cobalt 100 momentum to date, Microsoft announced they will be the first cloud vendor to make instances based on Nvidia’s Grace Blackwell platform available. Consisting of 72 Arm Neoverse V2 cores connected through a high-bandwidth coherent link to Nvidia’s latest Blackwell accelerator, Grace Blackwell is a great example of the kind of specialized silicon the Arm platform enables our partners to build, in this case targeting the most demanding AI training and inference workloads.

The groundwork for an AI-powered future

Arm’s longstanding partnership with Microsoft has been instrumental in our mission to enable a modern AI-enabled data center with specialized silicon, but silicon is not the limit of our work together. We’re partnering to make it as easy as possible for developers to transition their workloads to optimized, Arm-based platforms. With tools like the Arm Software Ecosystem Dashboard and a robust library of Azure-specific tutorials and resources, developers are getting access to a comprehensive view of software packages supported on Arm and hands-on instructions to seamlessly migrate and run their applications on Arm-based Microsoft Azure instances. One example I’m particularly excited about is the new Arm extensions for GitHub Copilot, which will offer specialized tools for AI and standard code development, such as code migration, containerization, CI/CD workflows, and performance optimization. We’ll be releasing them in the GitHub Marketplace this year, so watch this space for more updates on availability!

Cobalt 100 is only one example of a movement toward Arm-based purpose-built computing solutions that is happening across the broader data center landscape. The Arm architecture is becoming the foundation for specialized silicon needed to achieve the performance and efficiency required to succeed in the AI era. Alongside decades of investment in a robust software ecosystem to help developers bring their AI innovations to life, this is the groundwork for an AI-powered future that brings innovative advances in sciences, commerce, productivity and more.

The post Igniting a New Era of Cloud Computing for AI appeared first on Arm Newsroom.

Arm
Building the Future of AI on Arm at AI Expo Africa 2024 19 November 2024 at 21:00

By: Arm Editorial Team

At AI Expo Africa 2024, Arm brought together AI developers, enthusiasts, and industry leaders through immersive workshops, insightful talks, exclusive networking opportunities, and an engaging booth experience. The event is Africa’s largest AI conference and trade show, with over 2,000 delegates from all over the African continent.

Arm has been attending AI Expo Africa for the past three years, and this year we noted a significant uptick in AI applications running on Arm and a definite thirst for knowledge about how best to deploy and accelerate AI on Arm. Held at the Sandton Convention Centre in Johannesburg, South Africa, Arm’s presence at the event left a strong impact on the AI developer ecosystem, fostering connections and sparking innovation, with a range of expert insights from Arm tech leaders and Ambassadors from the Arm Developer Program.

Arm Ambassadors are a group of experts and community leaders developing on Arm who support and help lead the Developer Program through a host of Arm-endorsed activities like the various talks, workshops and engagements at AI Expo Africa. At the event, there were Arm Ambassadors from Ghana, Kenya, Switzerland and, of course, South Africa in attendance.

Day 1: Workshops and live demos

Arm kicked off with a high-energy workshop that saw an incredible turnout. Shola Akinrolie, Senior Manager for the Arm Developer Program, opened the session with a keynote introduction, setting the stage for a deep dive into Arm’s AI technology and its community-driven initiatives.

Distinguished Arm Ambassador Peter Ing then took the spotlight, showing how to run AI models at the edge on the Arm Compute Platform. He demonstrated the Llama 3.2 1B model running on a Samsung mobile device, showcasing real-time AI inference capabilities and illustrating how Arm is creating new opportunities for running small language models on the edge. The live demo left the audience captivated by the performance and efficiency of the Arm Compute Platform.

Arm’s keynote introduction at the AI Expo Africa 2024

Another standout session was led by Distinguished Arm Ambassador Dominica Abena Oforiwaa Amanfo, who shared her expertise on the Grove Vision AI V2 microcontroller (MCU), which is powered by a dual-core Arm Cortex-M55 CPU and an Ethos-U55 NPU. Dominica highlighted the board’s TinyML capabilities, as well as its compatibility with PyTorch and ExecuTorch. This showcased the reach and versatility of low-power, high-impact AI innovations that are powered by Arm.

Developer session led by Distinguished Arm Ambassador Dominica Abena Oforiwaa Amanfo

The Arm booth: A hub of innovation

At AI Expo Africa, the Arm booth was bustling with energy, drawing hundreds of developers eager to experience Arm’s technology first-hand. The team engaged with visitors in discussions and hands-on demos. The booth was packed with excitement, from insightful tech exchanges to exclusive SWAG giveaways, including a highly sought-after Raspberry Pi MCU!

To end the day, Arm hosted an exclusive Arm Developer Networking Dinner. The evening was filled with lively discussions led by Arm’s Director of Software Technologies Rod Crawford and Arm Developer Program Ambassadors, as they shared their insights on AI’s future and the impact of edge computing across various industries.

Day 2: Inspiring talks and networking

On day two of the event, Arm’s Rod Crawford captivated the audience with a powerful talk on “Empowering AI from Cloud to Edge.” Rod shared how Arm supports developers in harnessing the full potential of AI, from efficient cloud computing to high-performance, edge-based AI solutions. This means developers can create more powerful applications that work better and faster.

The talk demonstrated how both generative AI and classic AI workloads could run across the entire spectrum of computing on Arm, from powerful cloud services to mobile and IoT devices. Through Arm Kleidi, Arm is engaging with leading AI frameworks, like MediaPipe, ExecuTorch and PyTorch, to ensure developers can seamlessly take advantage of AI acceleration on Arm CPUs without any changes to their code. Rod’s insights were met with enthusiasm as developers learned how Arm’s technologies accelerate AI deployment, even for the most demanding applications.

The final day wrapped up with a high-spirited “Innovation Coffee” session, offering attendees a relaxed environment to connect and reflect on Arm’s advancements. Stay tuned for highlights of this session on the Arm Software Developers YouTube channel.

A heartfelt thanks

Arm extends its deepest gratitude to everyone who contributed to and joined us at AI Expo Africa. Special thanks to the Arm team—Rod Crawford, Gemma Platt, and Stephen Ozoigbo—as well as the incredible Arm Developer Program Ambassadors Peter Ing, Dominica Amanfo, Derrick Sosoo, Brenda Mboya, and Tshega Mampshika for their hard work and passion. We also appreciate Marvin Rotermund, Nomalungelo Maphanga, Stephania Obaa Yaa Bempomaa, and Mia Muylaert for their energy and support at the booth.

Here is what some of the Arm Developer Program Ambassadors had to say about the event:

Brenda Mboya: “One of my favorite moments at the event was seeing the lightbulb go off for attendees who visited the Arm booth and realized how integral Arm has been in their lives. It was an honor to engage with young people interested in utilizing Arm-based technology in their school initiatives and I am glad that I was able to direct them to sign-up to be part of the Arm Developer Program.”

Derrick Sosoo: “Arm’s presence at AI Expo Africa 2024 marked a significant shift towards building strong connections with developers through immersive experiences. Our engaging workshops, insightful talks, Arm Developer meetup, and interactive booth showcase left an indelible mark on attendees.”

Dominica Amanfo: “We witnessed overwhelming interest from visitors eager to learn about AI on Arm and our Developer Program. I’m particularly grateful for the opportunity to collaborate with fellow Arm Ambassadors alongside our dedicated support team at the booth, which included students from the DUT Arm (E³)NGAGE Student Club.”

The future of AI is built on Arm

By uniting innovators, developers, and enthusiasts, Arm is leading the charge in shaping the future of AI. Together, we’re building a community that will drive the future of AI on Arm, empowering developers worldwide to innovate and bring cutting-edge technology to life.

Learn more about Arm’s developer initiatives and join the journey at Arm Developer Program.

The post Building the Future of AI on Arm at AI Expo Africa 2024 appeared first on Arm Newsroom.

Arm
How is AI Being Used in Cars? 18 November 2024 at 21:00

By: Arm Editorial Team

Artificial intelligence (AI) in the automotive industry is no longer a future-looking buzzword. From smart navigation that learns from every journey to intelligent interactions between the driver and car, AI has been consistently revolutionizing the driving experience.

Moreover, AI is helping to save lives. It’s making roads safer with predictive safety features and driver assistance systems that feel like having a co-pilot with superhuman reflexes. But, contrary to popular belief, AI is not a recent phenomenon in the automotive sector and has been integrated into automotive applications for over two decades.

As Masashige Mizuyama, Representative Director, Vice President and CTO at Panasonic Automotive Systems, highlights in the recent Arm Viewpoints podcast: “AI has been integrated into automotive applications for over 20 years, evolving from simple voice commands to advanced deep learning models that understand natural language.”

This evolution goes beyond just adding new features; it’s about fundamentally transforming the driving experience. Advanced driver assistance systems (ADAS), Human Machine Interface (HMI), and in-vehicle infotainment (IVI) are prime examples of how AI enhances vehicle safety and user interaction. Moreover, the fusion of sensor data using AI improves safety and provides meaningful insights to both drivers and passengers.

AI in ADAS

One of the most prominent applications of AI in cars is ADAS. These systems enhance vehicle safety by providing real-time data processing and decision-making capabilities. According to a report by the Partnership for Analytics Research in Traffic Safety (PARTS), by 2023 five ADAS features had achieved market penetration rates higher than 90% in new vehicles: forward collision warning, automatic emergency braking (AEB), pedestrian detection warning, pedestrian AEB, and lane departure warning.

AI in HMI

Another significant advancement is the HMI. AI-powered voice recognition systems allow drivers to keep their eyes on the road and hands on the wheel while interacting with their vehicles. This technology is rapidly evolving, making in-car interactions more seamless and enhancing overall driving safety.

AI enables cars to perceive and infer the intentions of drivers and passengers, allowing for smarter and more autonomous responses. For instance, if a driver expresses a desire for coffee, the AI can recommend a nearby coffee shop, set the navigation route, and even place an order—all while minimizing distractions.

Moreover, AI’s ability to process vast amounts of data from various sensors is essential for ensuring safety and enhancing the overall driving experience. By leveraging AI, vehicles can provide a more comfortable and creative environment, allowing occupants to make the most of their time on the road.

Voice technology continues to mature alongside a broader wave of innovation: according to GlobalData’s report, the automotive industry has seen over 720,000 patents filed and approved in the past three years. That activity reflects the increasing reliance on technologies such as voice recognition for in-car interactions.

AI-based IVI systems

AI-powered IVI systems are set to transform the driving experience by integrating multiple advanced technologies that continuously adapt to the habits of drivers. Voice recognition will be used for seamless, hands-free interaction, enhancing safety and convenience. Natural language processing will enable intuitive communication, making interactions feel more human-like.

AI-based data analytics, meanwhile, will provide real-time, relevant updates for drivers to create a more enjoyable, efficient, and personalized driving environment and experience. In fact, according to ABI Research, by 2030 consumers are expected to spend over 500 million hours annually using in-car video-on-demand apps.

The fusion of sensor data

Integrating various sensor data using AI is a critical advancement in automotive technology. This process, known as sensor fusion, combines data from multiple sensors to create a comprehensive understanding of the vehicle’s environment. This fusion not only improves the efficiency of data processing, but also enhances safety by providing meaningful insights to both drivers and passengers.

Sensor fusion technology allows autonomous vehicles to build a detailed model of their surroundings using data from RADAR, LiDAR, cameras, and ultrasonic sensors.
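As a rough illustration of the idea, the sketch below fuses distance estimates for a single obstacle using inverse-variance weighting, a textbook technique. It is not taken from any production ADAS stack: the sensor labels reuse those named above, while the `Reading` type, the readings, and the variances are made-up values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor: str        # label only, e.g. "radar"
    distance_m: float  # estimated distance to the obstacle, in metres
    variance: float    # assumed measurement variance in m^2 (hypothetical)


def fuse(readings):
    """Combine readings by inverse-variance weighting.

    Returns (fused_distance, fused_variance). Noisier sensors get less weight.
    """
    if not readings:
        raise ValueError("need at least one reading")
    weights = [1.0 / r.variance for r in readings]
    total = sum(weights)
    fused = sum(w * r.distance_m for w, r in zip(weights, readings)) / total
    return fused, 1.0 / total


# Hypothetical readings for the same obstacle from three sensors.
readings = [
    Reading("radar", 25.0, 0.5),
    Reading("lidar", 24.6, 0.1),
    Reading("camera", 26.0, 2.0),
]
distance, variance = fuse(readings)
print(f"fused distance: {distance:.2f} m, variance: {variance:.3f} m^2")
```

The fused variance is always smaller than that of the best single sensor, which is one way to see why combining sensors can give a more reliable picture than any one of them alone.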

Meanwhile, AI-driven sensor fusion also enables personalized in-car experiences. By analyzing data from various sensors, vehicles can adjust settings, such as seat position, climate control, and infotainment preferences based on the driver’s habits and preferences.

Simon Teng, Senior Director of Automotive Partnerships at Arm, emphasizes the importance of integrating various sensor data using AI: “This fusion of information not only improves the efficiency of data processing but also enhances safety by providing meaningful insights to both drivers and passengers. The ability to process complex instructions and deliver personalized experiences marks a significant leap in automotive technology.”

By leveraging AI, vehicles can offer a more intuitive and seamless driving experience. For instance, AI can analyze data from in-cabin cameras to detect driver drowsiness and issue alerts or adjust the vehicle’s settings to keep the driver alert. This proactive approach to safety and comfort is a testament to the transformative potential of AI in the automotive industry.

The future of AI in cars

Looking ahead, AI advancements promise to significantly improve the in-vehicle experience. Mizuyama-san envisions a future where AI enhances comfort and hospitality, allowing cars to proactively offer suggestions and controls based on the inferred needs of drivers and passengers.

Vehicles will transform into versatile spaces that can adapt to various needs, such as becoming a mobile office or a relaxing environment. By leveraging AI, cars can create personalized experiences that make time spent in the vehicle more enjoyable and productive.

How Arm and Panasonic Automotive Systems are pioneering innovations

Both Arm and Panasonic Automotive Systems are at the forefront of automotive innovation, working together to push the boundaries of what is possible. Mizuyama-san shared Panasonic Automotive Systems’ vision to become the best “Joy in Motion” design company, focusing on eliminating the pains of mobility and enhancing the overall user experience.

To that effect, Arm and Panasonic Automotive Systems announced a partnership to help build a standardized architecture for software-defined vehicles (SDVs). This collaboration focuses on creating a flexible software stack to meet current and future automotive needs.

Overall, the integration of AI in the automotive industry is not just about adding new features; it’s about transforming the entire driving experience. As AI technology continues to advance, we can expect even greater innovations that will redefine how we interact with our vehicles.

For a deeper dive into these insights, be sure to listen to the full podcast episode here: AI in the Car: A Look to the Future.

The post How is AI Being Used in Cars? appeared first on Arm Newsroom.

Arm
Arm Ethos-U85 NPU: Unlocking Generative AI at the Edge with Small Language Models 13 November 2024 at 16:30

By: Arm Editorial Team

As artificial intelligence evolves, there is increasing excitement about executing AI workloads on embedded devices using small language models (SLM).

Arm’s recent demo, inspired by Microsoft’s “Tiny Stories” paper and Andrej Karpathy’s TinyLlama2 project, where a small language model trained on 21 million stories generates text, showcases endpoint AI’s potential for IoT and edge computing. In the demo, a user inputs a sentence, and the system generates an extended children’s story based on it.

Our demo featured Arm’s Ethos-U85 NPU (Neural Processing Unit) running a small language model on embedded hardware. While large language models (LLMs) are more widely known, there is growing interest in small language models due to their ability to deliver solid performance with significantly fewer resources and lower costs, making them easier and cheaper to train.

Implementing A Transformer-based Small Language Model on Embedded Hardware

Our demo showcased the Arm Ethos-U85 as a small, low-power platform capable of running generative AI, highlighting that small language models can perform well within narrow domains. Although TinyLlama2 models are simpler than the larger models from companies like Meta, they are ideal for showcasing the U85’s AI capabilities. This makes them a great fit for endpoint AI workloads.

Developing the demo involved significant modeling efforts, including the creation of a fully integer int8 (and int8x16) TinyLlama2 model, which was converted to a fixed-shape TensorFlow Lite format suitable for the Ethos-U85’s constraints.

Our quantization approach has shown that fully integer language models can maintain strong accuracy and output quality while reducing compute cost. By quantizing the activation functions, normalization functions, and matrix multiplications, we eliminated the need for floating-point computations, which are more costly in terms of silicon area and energy, both key concerns for constrained embedded devices.
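To make the idea of integer-only arithmetic concrete, here is a minimal sketch of symmetric int8 quantization applied to a single dot product, the building block of a matrix multiplication. It is a generic illustration, not the scheme used in the demo: the function names, the per-tensor scale choice, and the sample values are all assumptions.

```python
def scale_for(values):
    """Pick a per-tensor scale so the largest magnitude maps to 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak else 1.0


def quantize(values, scale):
    """Symmetric quantization to the int8 range: q = round(x / scale)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int_dot(qa, qb):
    """Integer-only dot product; acc stands in for an int32 accumulator."""
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc


# Made-up activations and weights for illustration.
activations = [0.12, -0.50, 0.33, 0.90]
weights = [0.25, 0.10, -0.40, 0.05]

s_a, s_w = scale_for(activations), scale_for(weights)
q_a, q_w = quantize(activations, s_a), quantize(weights, s_w)

acc = int_dot(q_a, q_w)
# The rescale is done in floating point here only for comparison; a fully
# integer pipeline would typically use a fixed-point multiplier and shift.
approx = acc * s_a * s_w
exact = sum(a * w for a, w in zip(activations, weights))
print(f"exact={exact:.4f} quantized={approx:.4f}")
```

The multiply-accumulate loop touches only small integers, which is the part that maps onto MAC hardware; the quantization error shows up as the small gap between the two printed values.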

The Ethos-U85 ran a language model on an FPGA platform at only 32 MHz, achieving text generation speeds of 7.5 to 8 tokens per second—matching human reading speed—while using just a quarter of its compute capacity. In a real system-on-chip (SoC), performance could be up to ten times faster, significantly enhancing speed and energy efficiency for AI processing at the edge.

The children’s story-generation feature used an open-source version of Llama2, with the demo running on TFLite Micro with an Ethos-U NPU back-end. Most of the inference logic was written in C++ at the application level. Adjusting the context window enhanced narrative coherence, ensuring smooth, AI-driven storytelling.
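The role of the context window can be sketched with a small autoregressive loop. This is written in Python for brevity and is not the demo’s C++ code; `toy_next_token` is a hypothetical stand-in for the model, and the sliding-window policy is an assumption, since the source does not say how the window was managed.

```python
from collections import deque


def toy_next_token(context):
    """Stand-in for the model: derives a token id from the context.

    A real deployment would run the quantized network here.
    """
    return (sum(context) + len(context)) % 50


def generate(prompt_tokens, max_new_tokens, context_window):
    """Autoregressive generation with a fixed-size sliding context window."""
    # deque(maxlen=...) drops the oldest token once the window is full.
    window = deque(prompt_tokens, maxlen=context_window)
    output = list(prompt_tokens)
    for _ in range(max_new_tokens):
        token = toy_next_token(list(window))
        window.append(token)
        output.append(token)
    return output, list(window)


tokens, window = generate([3, 14, 15], max_new_tokens=10, context_window=8)
print(tokens)
print(window)
```

A larger window lets each new token depend on more of the story so far, at the cost of more memory and compute per step, which is the trade-off being tuned when the window size is adjusted.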

The team’s adaptation of the Llama2 model to run efficiently on the Ethos-U85 NPU required careful consideration of performance and accuracy due to the hardware limitations. Using mixed int8 and int16 quantization demonstrates the potential of fully integer models, encouraging the AI community to optimize generative models for edge devices and expand neural network accessibility on power-efficient platforms like the Ethos-U85.

Showcasing the Power of the Arm Ethos-U85

Scalable from 128 to 2048 MAC units (multiply-accumulate units), the Ethos-U85 achieves a 20% power efficiency improvement over its predecessor, the Ethos-U65. A standout feature of the Ethos-U85 is its native support for transformer networks, which earlier Ethos-U NPUs lacked.

The Ethos-U85 enables seamless migration for partners using previous Ethos-U NPUs, allowing them to capitalize on existing investments in Arm-based machine learning tools. Developers are increasingly adopting the Ethos-U85 for its power efficiency and high performance.

The Ethos-U85 can reach 4 TOPS (trillions of operations per second) with a 2048 MAC configuration in silicon. In the demo, however, a smaller configuration of 512 MACs on an FPGA was used to run the TinyLlama2 small language model with 15 million parameters at just 32 MHz.
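The relationship between these figures can be checked with some back-of-the-envelope arithmetic. The sketch below assumes the common convention of counting each multiply-accumulate as two operations; that convention, and the helper names, are assumptions rather than details from the source. The results are theoretical peaks, not sustained throughput.

```python
def peak_ops_per_second(mac_units, clock_hz, ops_per_mac=2):
    """Theoretical peak: each MAC counts as one multiply plus one add."""
    return mac_units * ops_per_mac * clock_hz


def clock_for_target(target_ops, mac_units, ops_per_mac=2):
    """Clock frequency needed to reach a target peak throughput."""
    return target_ops / (mac_units * ops_per_mac)


# Clock implied by 4 TOPS on the 2048-MAC configuration.
silicon_clock_hz = clock_for_target(4e12, 2048)

# Peak throughput of the demo configuration: 512 MACs at 32 MHz.
demo_peak = peak_ops_per_second(512, 32e6)

print(f"implied silicon clock: {silicon_clock_hz / 1e9:.2f} GHz")
print(f"demo peak: {demo_peak / 1e9:.1f} GOPS")
```

Under that convention the 4 TOPS figure corresponds to a clock of roughly 1 GHz, which puts the 32 MHz FPGA setup in perspective.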

This capability highlights the potential for embedding AI directly into devices. The Ethos-U85 effectively handles such workloads even with limited memory (320 KB of SRAM for caching and 32 MB for storage), paving the way for small language models and other AI applications to thrive in deeply embedded systems.

Bringing Generative AI to Embedded Devices

Developers need better tools to navigate the complexities of AI at the edge, and Arm is addressing this with the Ethos-U85 and its support for transformer-based models. As edge AI becomes more prominent in embedded applications, the Ethos-U85 is enabling new use cases, from language models to advanced vision tasks.

The Ethos-U85 NPU delivers the performance and power efficiency required for innovative, cutting-edge solutions. Like the “Tiny Stories” paper, our demo represents a significant advancement in bringing generative AI to embedded devices, demonstrating the ease of deploying small language models on the Arm platform.

Arm is opening new possibilities for edge AI across a wide range of applications, positioning the Ethos-U85 to power the next generation of intelligent, low-power devices.

Read how Arm is accelerating real-time processing for edge AI applications in IoT with ExecuTorch.

The post Arm Ethos-U85 NPU: Unlocking Generative AI at the Edge with Small Language Models appeared first on Arm Newsroom.

Arm
Equal1’s Quantum Computing Breakthrough with Arm Technology 13 November 2024 at 15:00

By: Arm Editorial Team

When you’re driving hard to disrupt quantum computing paradigms, sometimes it’s smart to chill out.

That’s Equal1’s philosophy. The Ireland-based company has notched another milestone on its journey deeper into the rapidly evolving field of quantum computing. Building on its success as winners of the “Silicon Startups Contest” in 2023, Equal1 has successfully tested the first chip incorporating an Arm Cortex processor at an astonishing temperature of 3.3 Kelvin (-269.85°C). That’s just a few degrees warmer than absolute zero, the theoretical lowest possible temperature where atomic motion nearly stops.

Equal1’s achievement is a crucial step in integrating classical computing components within the extremely power-constrained environment of a quantum cryo chamber. This brings the world closer to practical, scalable quantum computing systems. Cold temperatures reduce the thermal noise that can cause errors in quantum computations and help preserve quantum “coherence”, the ability of qubits to hold a superposition of multiple states over time.

The Importance of Cryogenic Temperatures in Quantum Computing

What sets Equal1 apart in the quantum computing landscape is its pragmatic approach to quantum integration. Rather than creating entirely new infrastructure, Equal1’s vision was to build upon the foundation of the well-established semiconductor industry. This strategy became viable with the emergence of fully depleted silicon-on-insulator (FDSOI) processes, which the company’s founders recognized as having the potential to support quantum operations.

“Our thesis is that rather than tear up everything we’ve done and start anew, let’s try to build on top of what we’ve already built,” said Jason Lynch, CEO of Equal1. This philosophy has led to partnerships with industry leaders like Arm and NVIDIA, leveraging existing semiconductor expertise while pushing into quantum territory.

Cryo-Temperature Breakthrough

What makes this accomplishment particularly remarkable is the extensive engineering required to make it possible.

“There is no such thing as a Spice Kit that works, that predicts what silicon is going to do at 3 Kelvin,” said Brendan Barry, Equal1’s CTO. “In fact, there’s no such thing as a methodology, no libraries you can get to make it happen.”

Over five years, Equal1, which is part of the Arm Flexible Access program, developed its own internal Process Design Kit (PDK) and methodologies to predict and optimize logic behavior at cryogenic temperatures.

Equal1’s approach uses electrons or holes (the absence of electrons) as qubits, making their technology uniquely compatible with standard CMOS manufacturing processes. This choice wasn’t accidental; it’s fundamental to the company’s vision of creating practical, manufacturable quantum computers.

Arm silicon startup spotlight: Equal1

Working with commercial CMOS Fabs, Equal1 uses a standard process with proprietary design techniques developed over six years of research. These techniques enable operation at cryogenic temperatures while maintaining manufacturability.

“We’re not changing anything in the process itself, but we are certainly pushing the limits of what the process can do,” Barry said.

Integrating the Arm Cortex-A55 Processor

Building on this success, Equal1 is now setting its sights even higher. The company plans to incorporate the more powerful Arm Cortex-A55 processor into its next-generation Quantum System-on-Chip (QSoC). This ambitious project aims to have silicon available by mid-2025, the company said.

The integration of Arm technology is crucial not just for processing power, but for power efficiency. At cryogenic temperatures, power management becomes critical as any heat generated can affect the quantum states. Arm’s advanced power-management features make it an ideal choice for this challenging environment.

Equal1’s technology targets three primary application areas:

Chemistry and drug discovery, potentially reducing the current average of 15 years and $1.3 billion needed to bring a new drug to market.
Optimization problems in finance, logistics, and other fields requiring complex variable management.
Quantum AI applications, where quantum computing could dramatically improve efficiency.

Perhaps most revolutionary is Equal1’s approach to deployment. Unlike traditional quantum computers that require specialized facilities, Equal1 envisions rack-mounted quantum computers that can be installed in standard data centers at a fraction of the cost of current solutions.

“They just rack in like any other standard high-performance compute,” said Patrick McNally, Equal1’s marketing lead.

The Road Ahead for Quantum Computing and Equal1

Equal1’s progress brings the world closer to the reality of compact, powerful quantum computers that can be deployed in standard high-performance computing environments. The company’s integration of Arm technology at cryogenic temperatures opens new possibilities for quantum-classical hybrid systems, potentially creating increased demand for Arm adoption across the quantum computing industry.

As quantum computing continues to evolve, Equal1’s practical approach to integration with existing semiconductor technology and infrastructure could prove to be a game-changer. With applications ranging from drug discovery to financial modeling and beyond, the future of quantum computing is looking increasingly accessible and practical.

And that’s pretty cool.

The post Equal1’s Quantum Computing Breakthrough with Arm Technology appeared first on Arm Newsroom.

Arm
A New Game-Changer for Arm Linux Development in Automotive Applications 12 November 2024 at 23:30

By: Jason Andrews

The rising adoption of advanced driver-assistance systems (ADAS), autonomous driving (AD) features, and software capabilities in software-defined vehicles (SDVs) is leading to growing computing complexities, particularly for software and developers. This has created a demand for more efficient, reliable, and powerful tools that streamline and strengthen the automotive development experience.

System76 and Ampere have responded to this need with Thelio Astra, an Arm64 developer desktop designed to revolutionize the Arm Linux development process for automotive applications. This innovative desktop offers developers the performance, compatibility, and reliability to push the boundaries of new and advancing automotive technologies.

Unlocking the potential of automotive software with Thelio Astra

Designed to meet the rigorous demands of ADAS, AD, and SDVs, the Thelio Astra uses the same architecture as Arm-based automotive electronic control units (ECUs). The architectural consistency ensures that the software developed for automotive applications runs efficiently on Arm-based systems without additional modifications.

This native-development environment provides faster, more cost-effective, and more power-efficient software testing, promoting safer roads with smarter prototypes. Moreover, by leveraging the same architecture in build and deployment environments, developers can streamline their processes by avoiding cross-compilation, which simplifies the build, test, and deployment environments.
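As a rough illustration of the point above, a build script can check whether the host and the deployment target already share an architecture before reaching for a cross-compilation toolchain. The sketch below is a hypothetical helper, not part of any System76, Ampere, or Arm tooling; the function name `needs_cross_compile` and the default target of `aarch64` are assumptions made for the example.

```python
import platform

# Machine names commonly reported for 64-bit Arm hosts
# ("aarch64" on Linux, "arm64" on macOS).
ARM64_NAMES = {"aarch64", "arm64"}


def _normalize(machine: str) -> str:
    """Map the different spellings of 64-bit Arm onto one canonical name."""
    name = machine.lower()
    return "arm64" if name in ARM64_NAMES else name


def needs_cross_compile(host_machine: str, target_machine: str = "aarch64") -> bool:
    """Return True when the build host and the target differ in architecture."""
    return _normalize(host_machine) != _normalize(target_machine)


if __name__ == "__main__":
    host = platform.machine()
    if needs_cross_compile(host):
        print(f"Host is {host}: a cross toolchain or emulation would be needed.")
    else:
        print(f"Host is {host}: build and test natively.")
```

On an Arm64 desktop the check takes the native branch, which is the situation the article describes: the same compiler output can be built, run, and tested on the development machine.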

Key benefits of Thelio Astra

Access to native performance: Developers can execute build and test cycles directly on Arm Neoverse processors, eliminating the performance overhead and complexities associated with instruction emulation and cross-compilation.

Improved virtualization: Familiar virtualization and container tools on Arm simplify the development and test process.

Better cost-effectiveness: Developers benefit from the ease of use and cost savings of having a local computer with a high core count, large memory, and plenty of storage.

Enhanced compatibility: Out-of-the-box support for Arm64 and NVIDIA GPUs eliminates the need for Arm emulation, which simplifies the developer process and overall experience.

Built for power efficiency: The system is engineered to prevent thermal throttling, ensuring reliable, sustained performance during the most intensive workloads, like AI-based AD and ADAS.

Advanced AI: Developers can build AI-based applications using frameworks, such as PyTorch on Arm, enabling powerful AI capabilities for automotive.

Optimized developer process: The development process can be optimized by enabling developers to run large software stacks on their local machine, making it easier to fix issues and improve performance.

Unrivaled ecosystem support: The robust and dynamic Arm software ecosystem for automotive offers a comprehensive range of tools, libraries, and frameworks to support the development of high-performance, secure, and reliable automotive software.

Accelerated time-to-market: Developers can create advanced software solutions without waiting for physical silicon, accelerating innovation and reducing development cycles.

Cutting-edge configuration for efficient automotive workloads

Thelio Astra is designed to handle intensive workloads. This is achieved through an advanced configuration with up to a 128-core Ampere® Altra® processor (3.0 GHz), 512GB of 8-channel DDR4 ECC memory (3200 MHz), an NVIDIA RTX 6000 Ada GPU, 8TB of PCIe 4.0 M.2 NVMe storage, and dual 25 Gigabit Ethernet SFP28. This setup guarantees that developers can tackle the most demanding tasks with ease, providing the performance and reliability that are essential for cutting-edge automotive development.
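To put the memory configuration in perspective, the theoretical peak DRAM bandwidth can be estimated from the channel count and transfer rate. This is a back-of-the-envelope sketch, not a measured figure: it assumes the quoted 3200 MHz refers to DDR4-3200 (3200 million transfers per second) and a 64-bit data path per channel, with ECC bits excluded. The function name is hypothetical.

```python
def peak_dram_bandwidth_gb_per_s(channels: int,
                                 mega_transfers_per_s: float,
                                 bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    Each transfer moves bus_width_bits / 8 bytes on every channel.
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Assumed inputs: 8 channels of DDR4-3200, 64 data bits per channel.
per_channel = peak_dram_bandwidth_gb_per_s(1, 3200)   # about 25.6 GB/s
total = peak_dram_bandwidth_gb_per_s(8, 3200)         # about 204.8 GB/s
print(f"{per_channel:.1f} GB/s per channel, {total:.1f} GB/s across 8 channels")
```

Sustained bandwidth in real workloads is lower than this ceiling, so the number is best read as an upper bound for sizing build and simulation jobs.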

Driving Innovation with SOAFEE and Arm Neoverse V3AE

Thelio Astra will play a crucial role in the Scalable Open Architecture for Embedded Edge (SOAFEE) initiative, which aims to standardize automotive software development. By providing a native Arm64 development environment, Thelio Astra supports the SOAFEE reference stack, EWAOL, alongside other automotive software frameworks, helping to accelerate innovation and shorten development cycles.

Thelio Astra also capitalizes on the momentum from the introduction of the Arm Neoverse V3AE, the first server-class CPU designed for the automotive market. The Neoverse V3AE delivers robust performance and reliability, making it essential for AI-accelerated AD and ADAS workloads.

Pioneering the future of automotive software development

Thelio Astra represents a significant leap forward in Arm Linux development for the automotive industry. By addressing the growing complexities of ADAS, AD, and SDVs, System76 and Ampere have created an indispensable tool with Thelio Astra. This will provide the compatibility needed for automotive target hardware, while delivering the performance developers expect from a developer desktop.

As the automotive landscape continues to evolve, tools like Thelio Astra will be essential in ensuring that developers have the resources they need to create the next generation of automotive applications and software.

Access the new learning path

Looking for more information? Here’s an introductory learning path for automotive developers interested in local development using the System76 Thelio Astra Linux desktop computer.

The post A New Game-Changer for Arm Linux Development in Automotive Applications appeared first on Arm Newsroom.

Arm
Arm Founding CEO Inducted into City of London Engineering Hall of Fame 8 November 2024 at 22:00

Arm Founding CEO Inducted into City of London Engineering Hall of Fame

Arm

By: Arm Editorial Team

8 November 2024 at 22:00

Sir Robin Saxby, the founding CEO and former chairman of Arm, has been inducted into the City of London Engineering Hall of Fame. The ceremony, which took place on the High Walkway of Tower Bridge in London on October 31, 2024, announced the induction of seven iconic engineers who are from or connected to the City of London.

As Professor Gordon Masterton, Past Master Engineer, said: “The City of London Engineering Hall of Fame was launched in 2020 and now has 14 inductees whose lives tell the story of almost 500 years of world-beating engineering innovations that have created huge improvements in the quality of life and economy of the City of London, the United Kingdom and the world. Our mission is to celebrate these role models of exciting and inspirational engineering careers.”

(Left to right) Sir Robin Saxby, The Lord Mayor of London Michael Mainelli, Professor Gordon Masterton

Saxby joined Arm full-time as its first CEO in February 1991 and led the transformation of the company from a 12-person startup to one of the most valuable tech companies in the UK, with a market capitalization of over $10 billion.

As CEO, Saxby was the visionary behind Arm’s highly successful business model, which has been adopted by many other companies across the tech industry. Through this innovative business model, the Arm processor can be licensed to many different companies for an upfront license fee, with Arm receiving royalties based on the amount of silicon produced.

This paved the way for Arm to become the industry’s highest-performing and most power-efficient compute platform, with unmatched scale today touching 100 percent of the connected global population.

Under Saxby’s tenure at Arm, power-efficient technology became the foundation of the world’s first GSM mobile phones that achieved enormous commercial success during the 1990s, including the Arm-powered Nokia 6110. Today, more than 99 percent of the world’s smartphones are based on Arm technology. The success in the mobile market gave the company the platform to expand into other technology markets that require leading power-efficient technology from Arm, including IoT, automotive and datacenter.

Saxby stepped down as CEO in 2001 and as chairman of Arm in 2006. He was knighted in the 2002 New Year Honors List. Saxby is a visiting professor at the University of Liverpool, a fellow of the Royal Academy of Engineering and an honorary fellow of the Royal Society.

Thanks to Saxby’s work, Arm has grown to be a global leader in technology, with nearly 8,000 employees worldwide today. Just as Saxby and the 12 founding members had originally envisioned, Arm remains committed to developing technology that will power the future of computing.

The post Arm Founding CEO Inducted into City of London Engineering Hall of Fame appeared first on Arm Newsroom.

Arm
Pioneering People-centric Leadership at Arm 7 November 2024 at 22:59

Pioneering People-centric Leadership at Arm

Arm

By: Kirsty Gill

7 November 2024 at 22:59

The dynamic world of technology has experienced incredible change in the last decade alone, and in this new era of AI it is moving faster than ever. The pace of innovation is unprecedented, driven by companies like Arm, which employs nearly 8,000 people who are doing inspiring, innovative and important work to deliver the foundational Arm compute platform. In the two decades I’ve been at Arm, including eight years as Chief People Officer (CPO), I’ve had the opportunity to navigate the complexities of a rapidly evolving industry, a dynamic geo-political landscape, widening inequity across the globe, an increasing climate crisis and shifting expectations around DEI, ESG and the fundamentals of what work means. All the while, I have championed a people-centric approach to ensure everyone at Arm can do their best work.

Now, it is time to step away and pursue the next phase of my life. I am delighted to be passing the baton to Charlotte Eaton, who is returning to Arm as the next CPO.

A passion for people and purpose

I’ve always believed that people do their best work when driven by a sense of purpose, shared values, a sense of community and the chance to solve challenging and important problems. I’ve had the opportunity to help people navigate some of the most pivotal moments in Arm’s history and define a culture built around high engagement and high performance embedded into how we work, our workspaces and how we operate as a responsible and sustainable business.

I’ve also had the privilege of building a world-leading People Group focused on ensuring, across our business, that people are seen as more than resources, and our culture, organization, processes, technology, workspaces and approach to sustainability reflect this. Transparency and authenticity are at the core of how our company communicates and engages, and I firmly believe that putting people at the center of our decisions is not only the right way to treat people, but drives a stronger and more valuable business outcome. People do extraordinary things when we create the environment for them to do so.

Transformational leadership

I have served as CPO through some of Arm’s biggest changes: taking the company from public to private, then public again, significant political and global challenges, the changing perception of what work means, a change in leadership and a step change in culture to enable a transformed business strategy. These transitions mean that two decades have been filled with new and interesting opportunities and the ability to deliver progressive people and workplace practices that ensure our people remain engaged, motivated and aligned with the company’s evolving goals and ambitions.

Together with my team we developed a shared culture and way of working, navigated periods of high growth, delivered thoughtful organizational changes, defined compelling reward propositions and implemented progressive policies around well-being, time-off and critical life events, so that people can perform to the highest level while also navigating life’s important moments.

And the success of these people strategies is shown across the organization. We currently have a growth rate of 15% per annum¹, attrition at an annual rate of under 5%¹, employee engagement at 84%², with 95%² of people being proud to work for Arm and 93%² feeling their work is valued, having an impact and aligned with the business strategy.

Putting people first and creating a culture that is inclusive, supportive, challenging and that cares about the world we inhabit has propelled Arm forward. There is an incredible opportunity ahead for the company and I know Charlotte will bring her passion, expertise and commitment as an extraordinary business and people leader to ensure our teams are ready and able to deliver.

I love Arm, and it has been a truly extraordinary place to work. I will remain Arm’s biggest supporter and look forward to seeing what more extraordinary things are to come as our people build the future of computing on Arm.

¹Data from Oct. 1, 2023-Sept. 30, 2024

²Data from Life at Arm Survey completed in October 2024.

The post Pioneering People-centric Leadership at Arm appeared first on Arm Newsroom.

Arm
What are the Latest Tech Innovations from Arm in October 2024? 1 November 2024 at 19:20

What are the Latest Tech Innovations from Arm in October 2024?

Arm

By: Arm Editorial Team

1 November 2024 at 19:20

As we move further into the era of advanced computing, Arm is continuing to lead the charge with groundbreaking tech innovations. October 2024 has been a month of significant strides in technology, particularly in AI, machine learning (ML), security, and system-on-chip (SoC) architecture.

The Arm Editorial Team has highlighted the cutting-edge tech innovations that happened at Arm in October 2024 – all to shape the next generation of intelligent, secure, and high-performing compute systems.

Enhancing AI, ML, and Security for Next-Gen SoCs with Armv9.6-A

Arm’s latest CPU architecture, Armv9.6-A, introduces key enhancements to meet evolving computing needs, focusing on AI, ML, security, and chiplet-based systems-on-chip (SoCs). Martin Weidmann, Director Product Management, discusses the latest features in the Arm A-Profile architecture for 2024.

The 2024 updates enhance Scalable Matrix Extension (SME) with structured sparsity and quarter-tile operations for efficient matrix processing while improving memory management, resource partitioning, secure data handling, and multi-chip system support.
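For readers unfamiliar with the term, structured sparsity means that zeros in a matrix follow a fixed, hardware-friendly pattern rather than appearing anywhere. The summary above does not spell out the pattern used by SME, so the sketch below assumes a generic N:M scheme (here 2:4, where at most two of every four consecutive values are non-zero), which is one common form in ML hardware. The function name `prune_n_of_m` is hypothetical and the code is a plain-Python illustration, not Arm's implementation.

```python
def prune_n_of_m(row, n=2, m=4):
    """Keep the n largest-magnitude values in each group of m; zero the rest.

    Assumes len(row) is a multiple of m.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n entries with the largest absolute value.
        keep = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.1, -0.9, 0.5, 0.05, 1.0, 2.0, -3.0, 0.0]
print(prune_n_of_m(weights))
```

Because every group has the same number of non-zero entries, hardware can store only the kept values plus small position indices and skip the multiplications by zero in a predictable way.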

Streamlining PyTorch Model Deployment on Edge Devices with ExecuTorch on Arm

Arm’s collaboration with Meta has led to the introduction of ExecuTorch, enhancing support for deploying PyTorch models on edge devices, particularly with the high-performing Arm Ethos-U85 NPU. Robert Elliott, Director of Applied ML, highlights how this collaboration enables developers to significantly reduce model deployment time and run advanced AI inference workloads with better scalability.

With an integrated GitHub repository providing a fully supported development environment, ExecuTorch simplifies compiling and running models, allowing users to create intelligent IoT applications efficiently. 

Accelerating AI with Quantized Llama 3.2 Models on Arm CPUs

Arm and Meta have partnered to empower the AI developer ecosystem by enabling the deployment of quantized Llama 3.2 models on Arm CPUs with ExecuTorch and KleidiAI. Gian Marco Iodice, Principal Software Engineer, details how this integration allows quantized Llama 3.2 models to run up to 20% faster on Arm Cortex-A CPUs, while maintaining model quality and reducing memory usage.

With the ExecuTorch beta release and support for lightweight quantized Llama 3.2 models, Arm is simplifying the development of AI applications for edge devices, resulting in notable performance gains in prefill and decode phases. 
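To make the idea of quantization concrete, the sketch below shows a generic symmetric scheme that maps floating-point weights onto a small signed integer range and back. It is an assumption-laden illustration only: it is not the scheme used by ExecuTorch, KleidiAI, or the Llama 3.2 quantized models, and the function names are hypothetical.

```python
def quantize_symmetric(values, bits=4):
    """Quantize floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0  # avoid dividing by zero
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [1.4, -0.6, 0.0, 0.2]
q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)
print(q, [round(r, 3) for r in restored])
```

Storing 4-bit integers plus one scale per group takes roughly an eighth of the space of 32-bit floats, which is where the memory saving comes from; the rounding step is the source of the small accuracy loss that quantization-aware methods try to limit.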

Optimizing Shader Performance with Arm Performance Studio 2024.4

Arm’s latest Frame Advisor enhancement helps mobile developers identify inefficient shaders, improving performance, memory usage, and power efficiency. Julie Gaskin, Staff Developer Evangelist, details the new features in Arm Performance Studio 2024.4, including support for new CPUs, improved Vulkan and OpenGL ES integration, and expanded RenderDoc debugging tools.

This update provides detailed shader metrics – like cycle costs, register usage, and arithmetic precision – enabling developers to optimize performance and lower costs. 

Boosting Performance and Security for Arm Architectures with LLVM 19.1.0 

LLVM 19.1.0, released in September 2024, introduces nearly 1,000 contributions from Arm, including new architecture support for Armv9.2-A cores and performance improvements for data-center CPUs like Neoverse-V3. Volodymyr Turanskyy, Principal Software Engineer, highlights the features of LLVM 19.1.0, which deliver better performance and enhanced security.  
 
The update optimizes shader performance and Fortran intrinsics, adds support for Guarded Control Stack (GCS), security mitigations for Cortex-M Security Extensions (CMSE), enhancements for OpenMP reduction, function multi-versioning, and new command-line options for improved code generation.

Introducing System Monitoring Control Framework (SMCF) for Neoverse CSS

Arm’s System Monitoring Control Framework (SMCF) streamlines sensor and monitor management in complex SoCs with a standardized software interface. Marc Meunier, Director of Ecosystem Development, highlights how it supports seamless integration of third-party sensors, flexible data sampling, and efficient data collection through DMA, reducing processor overhead.
 
The SMCF enables distributed power management and improves system telemetry, offering insights for profiling, debugging, and remote management while ensuring secure, standards-compliant data handling.  

Achieving Human-Readable Speeds with Llama 3 70B on AWS Graviton4 CPUs  

AWS’s Graviton4 processors, built with Arm Neoverse V2 CPU cores, are designed to boost cloud performance for high-demand AI workloads. Na Li, ML Solutions Architect, explains how deploying the Llama 3 70B model on Graviton4 leverages quantization techniques to achieve token generation rates of 5-10 tokens per second.  

This innovation enhances cloud infrastructure, enabling more powerful AI applications and improving performance for tasks requiring advanced reasoning.  
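One way to read the 5-10 tokens per second figure above is to convert it into words per minute and compare it with how fast people read. The conversion below is a rough sketch: the ratio of 0.75 words per token is a common rule of thumb for English text, not a number from the article, and the function name is hypothetical.

```python
def tokens_to_words_per_minute(tokens_per_second: float,
                               words_per_token: float = 0.75) -> float:
    """Convert a token generation rate into an approximate words-per-minute rate."""
    return tokens_per_second * 60 * words_per_token


for rate in (5, 10):
    wpm = tokens_to_words_per_minute(rate)
    print(f"{rate} tokens/s is roughly {wpm:.0f} words per minute")
```

Under that assumption, 5 tokens per second is about 225 words per minute and 10 tokens per second is about 450, which is in the range of, or faster than, typical silent reading speeds of a few hundred words per minute.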

Superior Performance on Arm CPUs with Pardiso Sparse Linear Solver

Panua Technologies optimized the Pardiso sparse linear solver for Arm CPUs, delivering significant performance gains over Intel’s MKL. David Lecomber, Senior Director Infrastructure Tools, highlights how Pardiso on Arm Neoverse V1 processors outperforms MKL, demonstrating superior efficiency and scalability for large-scale scientific and engineering computations.

This breakthrough positions Pardiso as a top choice for industries like automotive manufacturing and semiconductor design, offering unmatched speed and performance.  

Built on Arm Partner Stories

Vince Hu, Corporate Vice President, MediaTek, talks about the Arm MediaTek partnership, which drives ongoing tech innovation and delivers transformative technologies to enhance everyday life.

Eben Upton, CEO of Raspberry Pi, shares how the company has evolved from an educational tool to a key player in industrial and embedded applications, all powered by Arm technology. He highlights the development of new tools over the past decade and his personal journey with the BBC Microcomputer.

Clay Nelson, Industry Solutions Strategy Lead at GitHub, discusses the partnership between GitHub and Arm, which combines GitHub Actions with Arm native hardware to revolutionize software development, leading to faster development times and reduced costs.

Sy Choudhury from Meta Platforms Inc. explains how the collaboration with Arm is optimizing AI on the Arm Compute Platform, enhancing digital interactions through devices like AR smart glasses, and impacting everyday experiences with advanced AI applications.

Highlights from PyTorch Conference 2024

To accelerate the development of custom silicon solutions, Arm partners are tapping into the latest industry expertise and resources. Principal Software Engineer Gian Marco Iodice discusses this in “Democratizing AI: Powering the Future with Arm’s Global Compute Ecosystem,” from PyTorch Conference 2024.

Iodice highlights KleidiAI-accelerated demos, key AI tech innovations from cloud to edge, and the latest Learning Paths for developers.

The post What are the Latest Tech Innovations from Arm in October 2024? appeared first on Arm Newsroom.

Arm
Key Takeaways from OCP Global Summit 2024 18 October 2024 at 20:27

Key Takeaways from OCP Global Summit 2024

Arm

By: Arm Editorial Team

18 October 2024 at 20:27

One key message emerged from Open Compute Project (OCP) Global Summit 2024: AI scalability won’t progress unless we reduce design and development friction.

As the annual Silicon Valley event drew to a close, this crucial insight reverberated through the halls and informed discussions on the future of AI infrastructure and sustainable computing.

While everyone understands the enormous opportunities, they also know the AI scalability challenges:

Unprecedented demands on computing infrastructure with AI model parameter counts doubling every 3.4 months.
Global datacenter energy consumption is projected to triple by 2030.
The traditional approach to chip design and development is struggling to keep pace with these demands, both in terms of performance and energy efficiency.
As we push towards increasingly smaller process nodes, such as the current 3nm and the upcoming 2nm, the complexity and cost of manufacturing skyrocket. Design costs for a 2nm chip are estimated at a staggering $725 million.
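The doubling figure in the list above is easier to grasp as an annual growth factor. The short calculation below is a worked example, assuming the 3.4-month doubling period holds steadily over the whole interval; the function name is hypothetical.

```python
def growth_factor(doubling_period_months: float, elapsed_months: float) -> float:
    """Growth multiple after elapsed_months, given a fixed doubling period."""
    return 2 ** (elapsed_months / doubling_period_months)


one_year = growth_factor(3.4, 12)    # roughly 11.5x
two_years = growth_factor(3.4, 24)   # roughly 133x
print(f"1 year: {one_year:.1f}x, 2 years: {two_years:.0f}x")
```

A quantity that doubles every 3.4 months grows by more than an order of magnitude per year, which is why infrastructure planned on multi-year chip development cycles struggles to keep up.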

These challenges are compounded by the long development cycles of monolithic chips, which can take years from concept to production – a timeframe that’s increasingly out of sync with the rapid evolution of AI workloads. Additionally, the one-size-fits-all approach of traditional chip design is ill-suited to the diverse and specialized computing needs of modern AI applications, leading to inefficiencies in both performance and energy use.

Arm, as a key player in the semiconductor industry, is at the forefront of addressing this challenge, and at OCP, the company showcased innovative solutions and collaborative approaches to overcome the hurdles in AI scalability.

Eddie Ramirez (far left, image below), VP of Go to Market, Infrastructure at Arm, highlighted this challenge in his executive session, emphasizing that “the insatiable demand for AI performance is putting immense pressure on today’s datacenters. We need a paradigm shift in how we approach silicon design and development to meet these growing needs sustainably.”

Bottom image, from left: Eddie Ramirez (VP of Go to Market, Arm), Melissa Massa (Lenovo Global Sales Lead for cloud service providers), Thomas Gardens (VP of Solutions, Supermicro), Moderator Bill Carter (OCP).

Arm’s Vision: Reducing Friction Through Innovation

At OCP 2024, Arm presented a comprehensive strategy to address the AI scalability challenge, centered around two key pillars: chiplet technology and ecosystem collaboration.

Chiplets, the next frontier in silicon innovation, were a central theme in Arm’s OCP presence, representing a revolutionary approach to chip design that promises to reduce development friction significantly. By breaking down complex chip designs into smaller, modular components, chiplets offer several advantages:

Lower development costs: Chiplets reduce manufacturing costs and improve yields compared to traditional monolithic designs.
Faster time-to-market: The modular nature of chiplets allows for quicker iteration and product launches.
Scalable performance: Companies can mix and match chiplets to create customized solutions for specific AI workloads.
Improved power efficiency: Chiplet designs enable more granular power management, crucial for sustainable AI infrastructure.
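The yield argument in the list above can be illustrated with the simple Poisson yield model, in which the fraction of defect-free dies falls exponentially with die area. This is a textbook approximation, not Arm's or any foundry's cost model: the defect density and die sizes below are made-up inputs, and packaging, testing, and die-to-die interface costs are ignored.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under the Poisson yield model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def silicon_per_good_die(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Average wafer area consumed to obtain one working die."""
    return die_area_mm2 / poisson_yield(die_area_mm2, defects_per_mm2)


DEFECT_DENSITY = 0.001  # assumed: 0.1 defects per square centimetre

# One 800 mm^2 monolithic die versus four 200 mm^2 chiplets (assumed sizes).
monolithic = silicon_per_good_die(800, DEFECT_DENSITY)
chiplets = 4 * silicon_per_good_die(200, DEFECT_DENSITY)

print(f"Monolithic yield: {poisson_yield(800, DEFECT_DENSITY):.0%}")
print(f"Chiplet yield:    {poisson_yield(200, DEFECT_DENSITY):.0%}")
print(f"Silicon per good product: {monolithic:.0f} mm^2 vs {chiplets:.0f} mm^2")
```

With these assumed inputs, a defect ruins a whole 800 mm² die in the monolithic case but only one 200 mm² chiplet in the modular case, so far less silicon is discarded per working product.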

In Arm’s Expo Hall session, “Accelerating AI Innovation with Arm Total Design: A Case Study,” Arm experts collaborated with Samsung to demonstrate how Arm Neoverse Compute Subsystems (CSS) rapidly brought an AI chiplet to market. This real-world example showcased the practical benefits of chiplet technology in reducing design and development friction.

Arm Total Design: Fostering Ecosystem Collaboration

Recognizing that no single company can solve the AI scalability challenge alone, Arm touted its Arm Total Design initiative, which has doubled in size in just one year. This collaborative approach brings together over 50 industry partners – foundries, third-party IP and EDA tools, design services, and OEMs – to create a standardized Chiplet System Architecture, ensuring interoperability and reusability of chiplet components across the ecosystem.

At OCP 2024, Arm highlighted new and expanding elements of Arm Total Design:

The introduction of new partners such as Egis, PUFsecurity, GUC, and Marvell, expanding the range of available chiplet solutions.
A showcase of diverse chiplet applications, from AI accelerators to networking and edge computing solutions.
The unveiling of a new AI CPU Chiplet Platform in collaboration with Samsung Foundry, ADTechnology, and Rebellions, promising approximately 3x power and performance efficiency compared to existing solutions.

Taken together, these efforts are meant to shorten the time to innovation.

Building a Sustainable AI Infrastructure

As we address the friction in AI development, sustainability remains a core focus. Arm’s approach to silicon design, particularly through chiplets and advanced process nodes, is crucial for building energy-efficient AI systems that can scale to meet future demands.

Arm’s initiatives for sustainable AI include demonstrating how Arm-based solutions can be optimized for specific AI workloads to improve overall system efficiency, and collaborating on system firmware and chiplet standards to ensure interoperability and reduce development complexity. Both contribute to AI scalability.

The Path Forward: Scaling AI Through Collaboration

The future of AI infrastructure lies in collaborative innovation. The challenges of scaling AI capabilities while maintaining energy efficiency are significant, but the advancements in chiplet technology and ecosystem-wide collaboration are paving the way for a more sustainable and scalable AI future.

Arm is committed to leading this transformation, providing the foundation for sustainable AI infrastructure through our Neoverse technology, Arm Total Design initiative, and extensive ecosystem partnerships.

So, what’s next?

Expanded ecosystem collaboration: We’ll continue to grow the Arm Total Design network, bringing more partners into the fold to accelerate chiplet innovation and reduce development friction.
Advanced AI optimizations: Expect to see more Arm-based solutions specifically tailored for AI training and inference workloads, designed to scale efficiently with growing demands.
Sustainability-first approach: Energy efficiency will remain at the forefront of our design philosophy, helping our partners build more sustainable datacenter solutions that can support the exponential growth of AI.

OCP Global Summit 2024 served as a powerful reminder that the future of AI depends on our ability to innovate at the silicon level. By reducing design and development friction through chiplet technology and ecosystem collaboration, we’re opening new possibilities for scalable, sustainable AI infrastructure.

Eddie Ramirez put it best in his summit presentation: “The AI datacenter of tomorrow is being built today, and it’s being built on Arm.” With our technology at its core and our partners by our side, we’re not just scaling AI – we’re shaping a more efficient, sustainable future for computing.

The post Key Takeaways from OCP Global Summit 2024 appeared first on Arm Newsroom.

Arm
XR, AR, VR, MR: What’s the Difference in Reality? 17 October 2024 at 16:56

XR, AR, VR, MR: What’s the Difference in Reality?

Arm

By: Arm Editorial Team

17 October 2024 at 16:56

eXtended Reality (XR) is a term for technologies that enhance or replace our view of the world. This is often done by overlaying or immersing digital information and graphics into real-world and virtual environments, or even a combination of both, a process also known as spatial computing.

XR encompasses augmented reality (AR), virtual reality (VR), and mixed reality (MR). While all three ‘realities’ share overlapping features and requirements, each technology has different purposes and underlying technologies.

XR is set to play a fundamental role in the evolution of personal devices and immersive experiences, from being a companion device to smartphones to standalone Arm-powered wearable devices, such as a VR headset or pair of AR smartglasses where real, digital and virtual worlds converge into new realities for the user.

Video: What is XR?

While XR devices vary based on the type of AR, MR, and VR experiences and the complexity of the use cases that they are designed to enable, the actual technologies share some fundamental similarities. A core part of all XR wearable devices is the ability to use input methods, such as object, gesture, and gaze tracking, to navigate the world and display context-sensitive information. Depth perception and mapping are also enabled through the depth and location features.

What are the advantages and challenges with XR?

XR technologies offer several advantages compared to other devices, including:

Enhanced interaction through more natural and intuitive user interfaces;
More realistic simulations for training and education;
Increased productivity with virtual workspaces and remote collaboration;
More immersive entertainment experiences; and
Alternative interaction methods for people with disabilities.

However, there are still challenges that need to be overcome with XR devices. These include:

The initial perception around bulky and uncomfortable hardware.
Limited battery life for untethered devices.
Complex and resource-intensive content creation.
The need for low latency and high performance.
Ensuring data privacy and security for the end-user.

What is augmented reality (AR)?

Augmented Reality enhances our view of the real world by overlaying what we see with computer-generated information. Today, this technology is prevalent in smartphone AR applications that require the user to hold their phone in front of them. By taking the image from the camera and processing it in real time, the app can display contextual information or deliver gaming and social experiences that appear to be rooted in the real world.

Smartphone AR has improved significantly in the past decade. Notable examples include Snapchat, where users can apply real-time face filters, and IKEA Place, where users can visualize furniture in their homes before making a purchase. However, the breadth of these applications remains limited. Increasingly, the focus is on delivering a more holistic AR experience through wearable smart glasses. These devices must combine an ultra-low-power processor with multiple sensors, including depth perception and tracking, all within a form factor that is light and comfortable enough to wear for long periods.

AR smart glasses need always-on, intuitive, and secure navigation while users are on the move. This requires key advancements in features such as depth, 3D SLAM, semantics, location, orientation, position, pose, object recognition, audio services, and gesture and eye tracking.

All these advancements and features will require supporting AI and machine learning (ML) capabilities on top of traditional computer vision (CV). In fact, new compact language models, which are designed to run efficiently on smaller devices, are becoming more influential across XR wearable technologies. These models enable real-time language processing and interaction, which means XR wearable devices can understand and respond to natural language in real time, allowing users to interact with XR applications using real-time voice commands.

Since 2021, several smart glasses models have arrived on the market, including the Spectacles smartglasses from Snap, Lenovo ThinkReality A3, and in 2024, the Ray-Ban Meta Smart Glasses, Amazon Echo Frames, and, most recently, Meta’s Orion smartglasses. All of these devices are examples of how XR wearables are evolving to provide enhanced capabilities and features, like advanced AR displays and real-time AI video processing.

What is virtual reality (VR)?

VR completely replaces a user’s view, immersing them within a computer-generated virtual environment. This type of XR technology has existed for a while, with gradual improvements over the years. It is used primarily for entertainment experiences, such as gaming, concerts, films, or sports but it’s also accelerating into the social domain. For VR, the immersive entertainment experiences will require capabilities like an HD rendering pipeline, volumetric capture, 6DoF motion tracking, and facial expression capture.

VR is also used as a tool for training and in education and healthcare, such as rehabilitation. To make these experiences possible (and seamless) for the end-user, the focus of VR technology is often on high-quality video and rendering and ultra-low latency.

Finally, VR devices have started enhancing video conferencing experiences through platforms like Meta’s Horizon Workrooms, which enable virtual meet-ups in different virtual worlds.

Standalone VR devices, such as the latest Meta Quest 3, can deliver AAA gaming and online virtual worlds experiences. Powered by high-end Arm processors, these standalone VR devices can be taken anywhere.

What is mixed reality (MR)?

MR sits somewhere between AR and VR, as it merges the real and virtual worlds. There are three key scenarios for this type of XR technology. The first is through a smartphone or AR wearable device with virtual objects and characters superimposed into real-world environments, or potentially vice versa.

The Pokémon Go mobile game, which took the world by storm back in 2016, overlays virtual Pokémon in real-world environments via a smartphone camera. This is often touted as a revolutionary AR game, but it’s actually a great example of MR – blending real-world environments with computer-generated objects.

MR is revolutionizing the way we experience video games by enabling the integration of real-world players into virtual environments. This technology allows VR users to be superimposed into video games, creating a seamless blend of physical and digital worlds. As a result, real-world personalities can now interact within the game itself, enhancing the immersive experience for both players and viewers.

This innovation is particularly impactful for streaming platforms like Twitch and YouTube. Streamers can now bring their unique personalities directly into the game, offering a more engaging and interactive experience for their audience. Viewers can watch their favorite streamers navigate virtual worlds as if they were part of the game, blurring the lines between reality and the digital realm. By incorporating MR, streamers can create more dynamic and visually captivating content, attracting larger audiences and fostering a deeper connection with their fans. This technology not only enhances the entertainment value but also opens up new possibilities for creative expression and storytelling in the gaming community.

XR is becoming more mainstream

With more XR devices entering the market, XR is becoming more affordable, with the technology transitioning from tech enthusiasts to mainstream consumers.

XR is becoming more immersive by adding more sensory inputs, integrating with more wearable technologies, and using generative AI to create virtual environments faster and make them more realistic and interactive. These environments include collaboration and meeting spaces, such as the workspaces that Meta Horizon OS will provide for regular work. This makes XR technologies more accessible and more widely adopted in markets beyond gaming, including:

Education: e.g. immersive simulation, exploring historical events virtually, virtual experiments, and augmenting museum tours.
Healthcare: e.g. more realistic medical training, removing the need for physical consultations, therapeutic applications, and AR-assisted surgeries.
Retail: e.g. virtual clothes fittings, product visualization, and virtual shopping tours.
Industrial: e.g. interactive training programs.

XR is continuously evolving, at an increasingly fast pace thanks to the support of AI. This offers new ways to blur the frontier between virtual and physical worlds.

Advancing XR Experiences

Arm focuses on developing technology innovations that power the next generation of XR devices. Arm CPU and GPU technology delivers many benefits:

Efficient performance: Arm’s leadership in high-performance, low-power specialized processors is ideal for XR experiences. This includes the Cortex-X and Cortex-A CPUs as part of the new Arm Compute Subsystem (CSS) for Client that can be used in silicon solutions for wearables and mobile devices, providing the necessary compute capabilities for immersive XR experiences. These can also be combined with accelerator technologies like Ethos-U NPUs that can run with Cortex-A-based systems to deliver accelerated AI performance.
Graphics capabilities: Arm’s Immortalis and Mali GPUs deliver exceptional graphics performance and efficiency for XR gaming experiences.
Real-Time 3D Technology: The pivot to more visually immersive real-time 3D mobile gaming content is at the heart of the Immortalis GPU. This technology brings ray tracing and variable rate shading to deliver more realistic mobile real-time 3D experiences.
Security and AI: Arm’s built-in technology features for security and AI are essential for the ongoing development of next-generation XR devices.
Strong software foundations: In many instances, companies that are developing XR applications are using versions of the Android Open Source Project as their base software. This ensures that they benefit from years of software investment from Arm, allowing their software enablement efforts to scale across a wide range of XR devices.

Through the Arm Compute Platform, we are providing a performant, efficient, secure, and highly accessible solution with advanced AI capabilities that meet the needs of XR devices and experiences, now and in the future. Recent technology developments have shown that mainstream XR could be coming soon, with Arm’s technologies ideally placed to deliver truly immersive experiences for this next era of computing.


The post XR, AR, VR, MR: What’s the Difference in Reality? appeared first on Arm Newsroom.

Arm
Cloud Efficiency and Performance: Arm Neoverse-powered Microsoft Azure Cobalt 100 VMs Now Available 17 October 2024 at 01:00

Cloud Efficiency and Performance: Arm Neoverse-powered Microsoft Azure Cobalt 100 VMs Now Available

Arm

By: Bhumik Patel

17 October 2024 at 01:00

Cloud users and developers are constantly seeking efficient and sustainable compute solutions that can scale for modern, cloud-native applications, including AI. Microsoft, a leader in this space, has been addressing these needs through system-to-software optimization across their Azure offerings. Today, we’re thrilled to see Microsoft announce the general availability of new Azure Virtual Machines (VMs) powered by the Arm Neoverse Compute Subsystems (CSS) N2-based Azure Cobalt 100 processor. This marks a significant milestone in our long-standing collaboration with Microsoft, offering a robust platform that delivers exceptional performance, scalability, and a thriving software ecosystem for diverse workloads.

Microsoft has played a key part in the adoption of Arm Neoverse. As an early adopter of Arm-based platforms in its Microsoft Azure VM offerings, it has showcased their price-performance and power efficiency gains. The availability of Azure Cobalt 100-powered VMs based on CSS N2 brings industry-leading price-performance compute instances to millions of users. Azure Cobalt 100 leverages the benefits delivered through the Neoverse CSS platform and a robust software ecosystem developing on Arm, allowing Microsoft more time to focus on adding unique innovation and optimization while saving significant development effort.

The new VMs outlined in this announcement blog from Microsoft include the Dpsv6 and Dplsv6 series general-purpose VMs and the Epsv6 memory-optimized VMs and are available in multiple regions.

Arm Neoverse Software Ecosystem Advantage

Cloud developers are building their applications on Arm Neoverse platforms and benefiting from performance, efficiency, cost savings and sustainability gains. For developers, this is made possible by mature ecosystem support across the software stack and strong open source and ISV enablement.

The Arm software ecosystem has native support across all major Linux OS distributions, container runtimes and orchestration, languages and libraries, CI/CD and AI/ML frameworks. For example, cloud native developers can natively build, test and deploy their Linux and Windows based applications with GitHub Actions achieving 37% better cost efficiencies. The broad range of ISVs and open source projects including cloud workloads supported is listed on the Software Ecosystem Dashboard.

Running cloud workloads such as web servers, Java-based applications, databases, HPC and EDA workloads on Cobalt 100 VMs provides significant price-performance benefits, as showcased below:

High Performance Computing (HPC) Workloads on Azure Cobalt 100

With every major application and framework in the open-source HPC ecosystem and the leading commercial simulations from companies such as Altair, ANSYS and Siemens supporting Arm architecture, the Azure Cobalt 100 is ready to deploy HPC workloads.

Across every HPC workload, the improvement on the Azure Cobalt 100 over the previous generation Neoverse N1 instances is impressive, with performance uplifts of 50-90% and performance per dollar increases of 60-110%.

Rescale, a pioneer of HPC solutions in the cloud, provides their leading platform for engineers, scientists, and innovators to accelerate product development and deliver engineering breakthroughs through high performance computing, R&D data cloud, and enabling applied AI. Rescale is an early adopter of Azure Cobalt 100, enabling them to support their customers with the broadest selection of R&D applications fully integrated with Azure Cobalt 100. Here is Joris Poort, CEO of Rescale explaining the benefits of HPC workloads on Azure Cobalt 100.

EDA Workloads on Azure Cobalt 100

Today, we are happy to share that Arm is also an early adopter of Azure Cobalt 100 and has already seen significant performance and efficiency gains when deploying internal EDA workloads. In testing, we’ve seen a 1.5-2x acceleration compared to the previous generation Neoverse N1-based Azure VMs for key workloads. By harnessing the combined strengths of Arm’s processor technology and Azure’s cloud infrastructure, we are committed to empowering our customer ecosystem to accelerate innovation and bring groundbreaking products to market faster.

Arm Resources: Empowering Your Cloud Journey

To help you get the most out of Azure Cobalt 100, we’ve prepared a number of resources:

Migration to Cobalt 100 with Arm Learning Paths: Streamline your transition to Cobalt 100 instances with comprehensive guides and best practices.
Arm Software Ecosystem Dashboard: Stay up-to-date on the latest software support for Arm.
Arm Developer Hub: Whether you are just getting started on Arm or looking for resources to help you create top-performing software solutions, Arm Developer Hub has everything you need to start building better software and deliver rich experiences for billions of devices. Download, learn, connect, and question within our growing global developer community.

The general availability of Azure Cobalt 100-based cloud instances marks a pivotal moment in the evolution of cloud computing. Embrace the power, efficiency, and flexibility of Arm Neoverse CSS and experience a new level of performance for your workloads. Visit the Microsoft Azure Portal to launch Cobalt 100 VMs for your workloads today!

The post Cloud Efficiency and Performance: Arm Neoverse-powered Microsoft Azure Cobalt 100 VMs Now Available appeared first on Arm Newsroom.

Arm
Why Arm is the Global Semiconductor Ambassador 15 October 2024 at 16:00

Why Arm is the Global Semiconductor Ambassador

Arm

By: anonymous

15 October 2024 at 16:00

By Vince Jesaitis, Head of Global Government Affairs, and Peter Stephens, Director, Government Partnerships (UK), Arm

As the foundational computing platform for much of the world’s technology, Arm has a unique understanding of the global, interconnected and highly specialized semiconductor supply chain. Given this central role in building the future of computing, Arm has become a critical partner and advisor for governments worldwide. Many governments are implementing policies and support for advancing areas like artificial intelligence (AI), cybersecurity and more power-efficient computing infrastructure, as well as education and training to ensure students have the knowledge and skills to enter the future semiconductor workforce.

At a recent G7 Semiconductor Experts Working Group that Arm was asked to host by the UK government, these topics were discussed. Representatives from Canada, the European Union, France, Germany, Italy, Japan, the United Kingdom, and the United States came to Arm’s global headquarters in Cambridge, along with industry representatives from those countries and regions. Following opening remarks from Lord Patrick Vallance and Arm Chief Architect Richard Grisenthwaite, the group deliberated on common challenges the industry is facing, and ultimately how we can work together with the industry to advance those areas.

The G7 Semiconductor Experts Working Group at Arm’s global headquarters in Cambridge, UK

As Richard previewed in his opening remarks, these challenges need collective action, particularly around the provenance of intellectual property (“IP”) used in the design of semiconductors and its impact on trustworthiness.

Arm’s security DNA

Security has always been in Arm’s DNA, with over 30 years of innovation focused on technologies to mitigate the ongoing evolution of security threats. Richard has written previously about the importance of computing security in the age of AI.

He said “[s]ecurity is the greatest challenge computing needs to address to meet its full potential”. The challenge is compounded by the optimism bias we find among end users who assume that devices are “100% secure” simply because they are on the market. As our lives become more connected, policy makers have identified the challenge of articulating what security looks like to the mass market and how to “deliver a world that many feel we already live in.”

There are examples of governments worldwide implementing various security initiatives. Singapore introduced a security labelling scheme for consumers, the US implemented laws that use federal procurement as a lever for change, and international statements were published. The UK government developed the Product Security and Telecommunications Infrastructure Act to prohibit the sale of consumer IoT products that do not meet basic security requirements. We expect more to come in this area.

Policymakers always need to strike a balance between driving innovation and protecting people from harm by driving wider adoption of better security practices. The government can have a strong role enforcing these security practices if the industry is unable to encourage more active adoption. At the same time, technology innovation from the industry can be used to improve and advance government security policies.

These include the foundational security technologies that are already available, like Arm’s Memory Tagging Extension (MTE), which is built into the latest Armv9 architecture. MTE is important because it allows companies and developers to mitigate memory safety issues, which account for 70 percent of all serious security bugs. Memory safety is a significant security issue that has been recognized by the US government, with the White House Office of the National Cyber Director releasing a report on this class of vulnerabilities entitled “Back to the Building Blocks”. The report cited existing solutions that require broader industry uptake to be effective, calling out MTE among others. MTE is already being embraced by the mobile market, with Google enabling MTE in Android 14.

While plenty of impressive, thoughtful policies and initiatives exist across individual countries, there remains no “silver bullet” for security improvements worldwide. It is a complex space, requiring a lot of work across many different areas. However, initiatives like PSA Certified do a great job at ensuring global partnership around building security best practices and democratizing the adoption of security across the industry. It has quickly become the fastest growing ecosystem, uniting industry, standards bodies and policy makers to build the foundational trust required in connected devices.

Power-efficient AI

Another ongoing topic at the forefront of every government worldwide is how to manage the increasing power demands in the age of more advanced compute and AI. Looking ahead, AI workloads will continue to challenge long-term sustainability – both for businesses and the planet – unless the world embraces power-efficient processing technologies.

Just like security, Arm has a heritage in power efficiency. This dates back to the world’s first GSM cell phones of the mid-90s that were powered by Arm processors. Now and in the future, we believe that it’s vital for power-efficiency to be applied across the spectrum of technologies, from smartphones used by the world’s consumers to data centers processing vast amounts of AI workloads.

Finding ways to limit the power consumption of these large data centers is paramount, especially when you consider that the world’s data centers require 460 terawatt-hours (TWh) of electricity annually, which is equivalent to the electricity consumption of the entire country of Germany. Managing these energy costs will require significant investments into power-efficient AI computing from governments worldwide.

Fortunately, we know that much of industry is already demanding efficient AI. The world’s leading technology companies, including Amazon, Google, Microsoft and Oracle are turning to Arm to adopt our power-efficient technology for their silicon designs for cloud data centers. This is crucial as governments worldwide back the building of new data centers for advanced AI processing in their own countries.

The global semiconductor skills shortage

Finally, the semiconductor industry is experiencing a global skills shortage. In the US alone, industry estimates show that there will be a shortage of around 30,000 engineers and 90,000 technical workers by 2030. There are key challenges: we need to ensure science and computer science are accessible and that those with an interest in computing are inspired to pursue careers designing the brains that drive the building blocks of technology.

This is part of the reason why Arm announced the Semiconductor Education Alliance in July 2023. The initiative brings together key stakeholders worldwide across industry, academia and government, with the intention of creating the semiconductor workforce of the future through educating a new generation of talent and upskilling the existing workforce. Arm has also published a competency framework, outlining the knowledge, skills and attributes needed to build this future semiconductor workforce.

The epicenter of the global technology ecosystem

In our role at the epicenter of the global technology ecosystem, Arm has a unique perspective that can help to steer conversations about the future of technology. Our heritage in security and power efficiency is today more relevant than ever, as the new age of AI presents increasingly complex challenges for users, governments and industry. Collective action among industry, government and researchers will lead to sustainable, safe and secure AI that will benefit us all.

The post Why Arm is the Global Semiconductor Ambassador appeared first on Arm Newsroom.

Arm
Why Developers are Migrating to Arm 14 October 2024 at 22:00

Why Developers are Migrating to Arm

Arm

By: Arm Editorial Team

14 October 2024 at 22:00

As a developer, you know how crucial it is to build applications that scale efficiently while keeping costs down. As the cloud landscape evolves, so does the technology running behind the scenes. In recent years, more and more companies are discovering the advantages of migrating their applications from x86-based architectures to Arm. With significant performance gains and lower total costs of ownership, Arm is quickly becoming the go-to architecture for companies looking to future-proof their workloads.

Discover how to easily migrate your applications to Arm for better performance and cost savings. Our Arm Migration Guide will help you ensure a smooth transition for containerized workloads, cloud-managed services, and Linux applications.

The Power of Arm: Performance and Efficiency

Arm processors, like those in AWS Graviton, Google Axion, and Microsoft Azure’s Ampere-based offerings, are designed to deliver superior performance at lower costs. With up to 60% energy savings and a 50% boost in performance, migrating to Arm-based cloud instances opens new opportunities for developers looking to optimize their workloads. Arm also offers a higher density of cores, which translates into improved scalability and the ability to handle more tasks simultaneously.

Moreover, Arm’s architecture is designed with flexibility in mind, allowing you to future-proof your development. Once you migrate your workloads to Arm, they are compatible across multiple cloud providers, giving you the agility to scale your applications on any Arm-based cloud platform, including AWS, Google Cloud, and Microsoft Azure.

Port to Arm Once, Open up our Entire Cloud Ecosystem and Workflows

The growing adoption of Arm-based solutions by major cloud providers has spurred increased software compatibility and optimization, making it easier for developers to leverage Arm’s strengths. For AI workloads specifically, Arm’s focus on specialized processing elements and heterogeneous computing allows for efficient execution of machine learning algorithms. This combination of power efficiency, scalability, and AI acceleration capabilities positions the Arm ecosystem as a compelling choice for organizations looking to optimize their cloud infrastructure and AI applications.

Customer Success on Arm

Honeycomb.io and FusionAuth both demonstrate how easy and beneficial it is to migrate to an Arm-based infrastructure.

Honeycomb.io Reduces Infrastructure Costs by 50%
A leader in the observability space, Honeycomb transitioned from a legacy architecture to Arm-based AWS Graviton processors to handle its massive data processing needs. The results were immediate and striking. Honeycomb realized a 50% reduction in infrastructure costs while maintaining high performance and using fewer instances. This migration allowed Honeycomb to focus on what they do best—providing deep insights into system behavior—without worrying about spiraling infrastructure costs.
FusionAuth Increases Logins Per Second up to 49%
Migrating to Arm wasn’t just an experiment; it was a breakthrough. After load testing on Arm-based AWS Graviton instances, FusionAuth saw a 26% to 49% increase in logins per second compared to legacy systems. The transition was seamless, and the company also achieved 8% to 10% cost savings along the way. FusionAuth now runs the majority of its cloud infrastructure on Arm-based instances, enabling them to support a wide range of use cases from IoT to high-performance cloud platforms.

The Path to Migration: It’s Easier Than You Think

Migrating from legacy architectures to Arm is a smooth process that doesn’t require a complete code overhaul. Companies like Honeycomb and FusionAuth successfully made the transition using Arm’s strong ecosystem of developer tools and support for adapting code, testing, debugging, and optimizing performance. Whether you’re running Java, Golang, or other popular languages, Arm provides compatibility with your existing tech stack. The flexibility of Arm’s architecture ensures that your applications perform better with fewer resource demands, leading to improved price-performance ratios.

Developers should start by assessing their current software stack, including operating systems, programming languages, development tools, and dependencies. Next, they should set up a development environment that supports Arm architecture, which can be done using emulation, remote hardware, or physical Arm hardware. The migration process typically involves recompiling applications written in compiled languages like C/C++, Go, and Rust, while languages that run on an interpreter or managed runtime, such as Python, Java, and Node.js, may require minimal changes.

Developers should also ensure that all necessary libraries and dependencies are available for Arm. Testing and validation are crucial steps to identify and resolve any compatibility issues. Finally, developers can deploy their Arm-compatible workloads to cloud platforms like AWS, Google Cloud, and Microsoft Azure, which offer robust support for Arm-based instances.
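As a first pass at the assessment step described above, a short script can report which architecture a machine or container is running on and which broad bucket a language falls into. This is a minimal sketch, not part of Arm’s migration guide: the helper names normalize_arch, is_arm64, and migration_action are hypothetical, and the language grouping simply restates the paragraphs above.

```python
import platform
from typing import Optional

# Different operating systems report the same ISA under different names.
_ARCH_ALIASES = {
    "aarch64": "arm64",  # Linux on 64-bit Arm
    "arm64": "arm64",    # macOS and Windows on 64-bit Arm
    "x86_64": "x86_64",
    "amd64": "x86_64",   # Windows spelling of x86-64
}

# Assumption: grouping taken from the text above, not an exhaustive list.
_RECOMPILE = {"c", "c++", "go", "rust"}
_MINIMAL_CHANGES = {"python", "java", "node.js"}


def normalize_arch(machine: str) -> str:
    """Map a platform.machine()-style string to a canonical name."""
    key = machine.strip().lower()
    return _ARCH_ALIASES.get(key, key)


def is_arm64(machine: Optional[str] = None) -> bool:
    """Return True if the given (or current) machine is 64-bit Arm."""
    if machine is None:
        machine = platform.machine()
    return normalize_arch(machine) == "arm64"


def migration_action(language: str) -> str:
    """Suggest the broad migration step for a language (hypothetical helper)."""
    lang = language.strip().lower()
    if lang in _RECOMPILE:
        return "recompile for Arm"
    if lang in _MINIMAL_CHANGES:
        return "check runtime and native dependencies"
    return "assess manually"


if __name__ == "__main__":
    print("running on arm64:", is_arm64())
    for lang in ("Rust", "Python", "Fortran"):
        print(lang, "->", migration_action(lang))
```

Running the same script on the existing x86 hosts and on a candidate Arm instance gives a quick check that the test environment is really exercising Arm before deeper validation begins.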

Whether you are working on battery-powered devices, embedded systems, or IoT applications, migrating to Arm is a strategic decision that provides cost savings, superior performance, and sustainability. Developers around the world are choosing Arm to build more reliable, scalable, and power efficient applications.

Ready to make the move?

Learn how you can migrate your workloads seamlessly with our Arm Migration Guide and start building for a better future on Arm.

Access Guide

The post Why Developers are Migrating to Arm appeared first on Arm Newsroom.

Arm
How Arm Neoverse can Accelerate Your AI Data Center Dreams 12 October 2024 at 04:30

How Arm Neoverse can Accelerate Your AI Data Center Dreams

Arm

By: Arm Editorial Team

12 October 2024 at 04:30

In the rapidly evolving landscape of cloud computing and AI, businesses need ways to optimize performance, reduce costs and stay ahead of the competition. Enter Arm Neoverse – a game-changing architecture that’s reshaping the future of AI data centers and AI infrastructure.

Arm Neoverse has emerged as the go-to choice for industry leaders looking to drive innovation while minimizing total cost of ownership (TCO) in their AI data centers. With its unmatched performance, scalability, and power efficiency, Arm Neoverse is redefining what’s possible in modern computing environments.

Cloud Giants Lead the Way

It’s not just hype – major cloud service providers are already harnessing the power of Arm Neoverse:

These trail-blazers recognize that Arm Neoverse delivers the critical performance and efficiency needed to meet today’s most demanding workloads. But the benefits of Arm aren’t limited to the cloud.

Bringing Arm to Your Data Center

Enterprise customers can now leverage Arm technology on-premises, thanks to trusted OEM partners like HPE and Supermicro. As AI capabilities become increasingly central to enterprise applications, Arm-based solutions offer a path to consolidate outdated x86 servers onto fewer, high-performance, power-efficient machines.

We understand that considering and adopting new technology can seem daunting. That’s why we’ve created a comprehensive guide to help you navigate your Arm Neoverse journey. Our guide addresses common concerns, showcases real-world success stories, and provides a clear path forward for enterprises looking to harness the power of Arm for their AI data centers.

Click to read Accelerate Your AI Data Center Dreams

Ready to accelerate your AI data center dreams? Learn how Arm Neoverse can transform your infrastructure and propel your business into the future of AI-driven computing.

Don’t just keep pace – lead the AI revolution with Arm Neoverse.

Arm in the data center

Learn more about how Arm technology drives AI data center innovation.

Download the guide

The post How Arm Neoverse can Accelerate Your AI Data Center Dreams appeared first on Arm Newsroom.

Arm
Why Arm is the Compute Platform for All AI Workloads 10 October 2024 at 02:25

Why Arm is the Compute Platform for All AI Workloads

Arm

By: Arm Editorial Team

10 October 2024 at 02:25

For AI, no individual piece of hardware or computing component will be the “one-size-fits-all” solution for all workloads. AI needs to be distributed across the entire modern topography of computing, from cloud to edge, and that requires a heterogeneous computing platform that offers the flexibility to use different computational engines, including the CPU, GPU and NPU, for different AI use cases and demands.

The Arm CPU already provides a foundation for accelerated AI everywhere, from the smallest embedded device to the largest datacenter. This is due to its performance and efficiency capabilities, pervasiveness, ease of programmability and flexibility.

Focusing on flexibility, there are three key reasons why this is hugely beneficial to the ecosystem. Firstly, it means the Arm CPU can process a broad range of AI inference use cases, many of which are commonly used across billions of devices, like today’s smartphones, and in cloud and data centers worldwide. Beyond inference, the CPU is also often used for additional tasks in the stack, such as data pre-processing and orchestration. Secondly, developers can run a broader range of software in a greater variety of data formats without needing to build multiple versions of the code. And, thirdly, the CPU’s flexibility makes it the perfect partner for accelerated AI workloads.

Delivering diversity and choice to enable the industry to deploy AI compute their way

Alongside the CPU portfolio, the Arm compute platform includes AI accelerator technologies, such as GPUs and NPUs, which are being integrated with the CPU across various markets.

In mobile, Arm Compute Subsystems (CSS) for Client features the Armv9.2 CPU cluster integrated with the Arm Immortalis-G925 GPU to offer acceleration capabilities for various AI use cases, including image segmentation, object detection, natural language processing, and speech-to-text. In IoT, the Arm Ethos-U85 NPU is designed to run with Cortex-A-based systems that require accelerated AI performance, such as factory automation.

Also, in addition to Arm’s own accelerator technologies, our CPUs give our partners the flexibility to create their own customized, differentiated silicon solutions. For example, NVIDIA’s Grace Blackwell and Grace Hopper superchips for AI-based infrastructure both incorporate Arm CPUs alongside NVIDIA’s AI accelerator technologies to deliver significant uplifts in AI performance.

The Grace Blackwell superchip combines NVIDIA’s Blackwell GPU architecture with the Arm Neoverse-based Grace CPU. Arm’s unique offering enabled NVIDIA to make system-level design optimizations, reducing energy consumption by 25 times and providing a 30 times increase in performance per GPU compared to NVIDIA H100 GPUs. Specifically, NVIDIA was able to implement their own high-bandwidth NVLink interconnect technology, improving data bandwidth and latency between the CPU, GPU and memory – an optimization made possible thanks to the flexibility of the Arm Neoverse platform.


Arm is committed to bringing these AI acceleration opportunities across the ecosystem through Arm Total Design. The program provides faster access to Arm’s CSS technology, unlocking hardware and software advancements to drive AI and silicon innovation and enabling the quicker development and deployment of AI-optimized silicon solutions.

The Arm architecture: Delivering the unique flexibility AI demands

Central to the flexibility of the Arm CPU designs is our industry-leading architecture. It offers a foundational platform that can be closely integrated with AI accelerator technologies and supports various vector lengths, from 128 bit to 2048 bit, which allows for multiple neural networks to be executed easily across many different data points.
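To make the vector-length range concrete, the arithmetic below counts how many elements of a given width fit in one vector. It is an illustrative sketch only: lanes_per_vector is a hypothetical helper, the bounds are the 128-bit and 2048-bit figures quoted above, and the actual vector length is chosen by each silicon implementation.

```python
MIN_VECTOR_BITS = 128   # smallest vector length cited in the text
MAX_VECTOR_BITS = 2048  # largest vector length cited in the text


def lanes_per_vector(vector_bits: int, element_bits: int) -> int:
    """Return how many elements of element_bits fit in one vector."""
    if not MIN_VECTOR_BITS <= vector_bits <= MAX_VECTOR_BITS:
        raise ValueError("vector length outside the 128-2048 bit range")
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector length")
    return vector_bits // element_bits


if __name__ == "__main__":
    for bits in (128, 512, 2048):
        print(bits, "bit vector:",
              lanes_per_vector(bits, 8), "x int8,",
              lanes_per_vector(bits, 32), "x fp32")
```

For example, a 128-bit vector holds 16 8-bit values, while a 2048-bit vector holds 64 32-bit values, which is why wider implementations can process more data points per instruction.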

The flexibility of the Arm architecture enables diverse customization opportunities for the entire silicon ecosystem, with our heritage built on enabling partners to build their own differentiated silicon solutions as quickly as possible. This unique flexibility also allows Arm to continuously innovate the architecture, introducing critical instructions and features on a regular cadence that accelerate AI computation to benefit the entire ecosystem, from leading silicon partners to the 20 million plus software developers building on the Arm compute platform.

This started with the Armv7 architecture, which introduced advanced Single Instruction Multiple Data (SIMD) extensions, such as NEON technology, as Arm’s initial venture into machine learning (ML) workloads. The architecture has been enhanced over the past few years, with additions focused on vector dot product and matrix multiplication as part of Armv8. Armv9 then introduced Scalable Vector Extension 2 (SVE2) and the new Arm Scalable Matrix Extension (SME), key elements that drive higher compute performance and reduced power consumption for a range of generative AI workloads and use cases.

Seamless integration with AI accelerator technologies

Arm is the compute platform for the age of AI, driving ongoing architectural innovation that directly corresponds with the evolution of AI-based applications that are becoming faster, more interactive, and more immersive. The Arm CPU can be seamlessly augmented and integrated with AI accelerator technologies, such as GPUs and NPUs, as part of a flexible heterogeneous computing approach to AI workloads.

While the Arm CPU is the practical choice for processing many AI inference workloads, its flexibility means it is the perfect companion for accelerator technologies where more powerful and performant AI is needed to deliver certain use cases and computation demands. For our technology partners, this helps to deliver endless customization options to enable them to build complete silicon solutions for their AI workloads.

The post Why Arm is the Compute Platform for All AI Workloads appeared first on Arm Newsroom.

Arm
Revolutionizing High-Performance Silicon: Alphawave Semi and Arm Unite on Next-Gen Chiplets 9 October 2024 at 03:30

Revolutionizing High-Performance Silicon: Alphawave Semi and Arm Unite on Next-Gen Chiplets

Arm

By: anonymous

9 October 2024 at 03:30

By Shivi Arora, Director, ASIC IP Solutions and Sue Hung Fung, Principal Product Marketing and Management

As 5G wireless communications systems continue to be deployed, enterprises are busy planning for 6G, the next generation of wireless communications set to transform our lives. Poised to merge communication and computing, 6G promises to create a hyperconnected world that blends digital and physical experiences with ultra-fast speeds and low latency as a starting point.

Building on the foundations laid by 5G, 6G will continue supporting improved data latency, security, reliability, and the ability to process massive volumes of data in real-time. It will also challenge what’s possible by bringing new, groundbreaking capabilities to the forefront, including expanded ubiquitous connectivity, integrated sensing and communication, and advanced artificial intelligence.

The Need for Faster, Smarter Networks

In today’s technology-driven era, we rely on our handhelds, smartphones, and mobile devices to fulfill day-to-day tasks, most of which are driven by on-device or cloud-based AI and ML. Connectivity and compute power are the most important factors enabling on-cloud large language models (LLMs) to process and respond to human interaction. The communication infrastructure currently operates over 5G networks. It started not long ago with bandwidth in the kilobits per second (Kbps) range in 2G and has now evolved to gigabits per second (Gbps) in 5G. On the horizon, the existing 5G wireless communication infrastructure will soon evolve to 6G, offering bandwidths of terabits per second (Tbps). A much higher network bandwidth is needed with the increasing number of devices and complex AI workloads. Network infrastructure giants are already looking to update their hardware to support speeds 50-100 times faster than 5G, with air latency under 100 microseconds, and even wider network coverage and reliability.

With this new infrastructure for 6G, carrier support and hardware/software support will require new RF designs and chipsets capable of supporting higher communication frequencies, possibly up to 1 THz. Although newer networks may be designed for more data bits per kilowatt of power efficiency, the increase in density, traffic, and processing speeds tends to negate these savings.

The wireless technology trend of the existing 5G network is built around innovation in processors and wireless technology on mobile devices, and in wireless base stations and cells. Base stations are replaced by RUs (radio units), DUs (distributed units), and CUs (centralized units). The radio units manage antennas in real-time through multicore processor chips. The distributed and centralized units provide support for the lower and upper layers of the protocol stack, respectively. These protocol stacks operate on compute chiplets, which are mounted on hardware acceleration cards to handle protocol processing. Radio, distributed, and centralized units need to handle a lot of radio processing and traffic data. With even higher throughput and extremely complex workloads in the new 6G infrastructure, network architecture, software, and hardware accelerator card equipment will need an upgrade or redesign to process and handle much larger amounts of data. The processor compute chiplets on the accelerator cards manage up to dozens of antennas simultaneously and will need to grow in compute power as requirements become more complex with the move to 6G.

Alphawave Semi and Arm: Partnering for 6G Readiness

To fulfill the needs of this rapidly advancing semiconductor industry, Alphawave Semi is collaborating with Arm on a sophisticated chiplet that uses Arm’s Neoverse Compute Subsystems (CSS). These compute chiplets are vital for supporting the demanding requirements of 6G/5G infrastructure, cloud and edge compute applications and for handling enterprise networking, server, and AI/ML markets. This partnership integrates the silicon-proven IP portfolio from Alphawave with Arm’s Neoverse CSS N3 to handle intensive workload efficiency, performance optimization and power savings in both compute and accelerator chiplets.

The Arm Neoverse ecosystem of hardware and software is targeted for new generations of wireless mobile communications equipment and the wireless infrastructure’s cloud-based deployment. Software developers continue to port operating systems and tools to support Arm’s Neoverse class compute subsystems. These tools allow developers to scale their code using SVE (Scalable Vector Extension) to different vector lengths, reflecting new or updated hardware architecture, whereas traditional processors only handle vectors of specific widths. Traditional architectures require code to be rebuilt to handle additional vector bandwidth updates. SVE allows scalable vector performance on 5G RAN (Radio Access Network), a wireless communication architecture that uses 5G radio frequencies to provide wireless connectivity to devices. RANs perform complex processing when voice and data are converted to digital signals and transmitted as radio waves to RAN transceivers, to the core network, and onto the internet. The radio spectrum requirements for performance, capacity, speed, and latency will be redefined to support 6G, along with the connections of millions of devices per square kilometer. Base stations, antenna units, edge data servers, and cells will need to migrate to support an upgraded network architecture.
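
The vector-length-agnostic idea described above can be sketched in plain Python. This is only a simulation under stated assumptions: `vla_add` and its `vector_bits` parameter are hypothetical names, lanes are assumed to be 32 bits wide, and a Python list stands in for a hardware vector register. The point is that the same loop runs unchanged whether the simulated vector is 128 or 2048 bits wide.

```python
def vla_add(a, b, vector_bits=128, lane_bits=32):
    """Add two equal-length lists in chunks of one simulated vector.

    vector_bits is the simulated hardware vector length. This sketch
    accepts multiples of 128 bits from 128 to 2048 (an assumption of
    the sketch); the loop itself never hard-codes the width.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector_bits must be a multiple of 128 in [128, 2048]")

    lanes = vector_bits // lane_bits  # elements processed per step
    out = [0] * len(a)
    i = 0
    while i < len(a):
        # Predicate: which lanes are still inside the array (handles the tail).
        active = [i + lane < len(a) for lane in range(lanes)]
        for lane, on in enumerate(active):
            if on:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
    return out


data_a = list(range(10))
data_b = [10 * x for x in range(10)]
# The same code is run for every simulated vector length.
results = {
    bits: vla_add(data_a, data_b, vector_bits=bits)
    for bits in (128, 256, 512, 2048)
}
```

Every entry in `results` holds the same sums, which is the property that lets code written once scale across hardware with different vector widths.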

Arm’s Neoverse platform is delivering technology for building seamless networks as wireless technology continues to move to next-generation core designs. The Neoverse platform offers higher performance per watt than traditional processors in clouds and 5G networks. The Arm CMN-700 integrates the Arm Neoverse CPU Cluster, high-speed L1/L2/L3 cache, DDR/LPDDR memory, and high-speed I/O, along with other management IP elements through an on-chip interconnect. It uses Memory System Resource Partitioning and Monitoring (MPAM) to share system-level resources like cache and DRAM memory bandwidth. CMN-700 supports CCIX, CXL, and CHI-C2C protocols for multi-die use cases and provides a low-latency path to DDR and CXL-attached memory pools.

The Future of Wireless: Chiplets and System-in-Package Solutions

By utilizing a high-performance process node, Alphawave Semi, working with Arm and its technologies, can offer compute chiplets that enable faster development, low risk and reduced time-to-market by packaging known good dice with customers’ accelerator chiplets. This collaboration is part of Arm’s Total Design initiative, which aims to create an ecosystem that speeds up the development of specialized silicon solutions based on Arm Neoverse CSS. This ecosystem-centric approach simplifies the design and development of complex computing solutions and addresses the increasingly sophisticated demands of modern digital infrastructures and wireless networks.

The compute chiplet leverages standard packaging as a modular chiplet and can optionally be implemented as a monolithic ASIC. The chiplet portfolio is complemented by a connectivity suite of standards-based, silicon-proven technologies, including PCIe, CXL, Universal Chiplet Interconnect Express (UCIe), and memory subsystems. Alphawave Semi’s compute chiplet is built using Neoverse CSS N3, which is based on Arm’s Neoverse N3 CPU and the CMN S3 Coherent Mesh Network to create a compute system-in-package (SiP) that excels in scalability, modularity, and power efficiency.

Alphawave Semi’s unique chiplet-based design platform includes a variety of chiplets: an Arm-based Neoverse class compute chiplet, a multi-protocol I/O chiplet, and memory chiplets for various application spaces. The compute chiplet is the latest inclusion that expands the chiplet portfolio capabilities by adding more modularity and functionality for memory and I/O components to be attached to this chiplet. This compute chiplet addition provides yet another chiplet option to customers who wish to enhance the overall performance of their SiP by choosing from a variety of off-the-shelf chiplet products from Alphawave Semi’s portfolio. This broad suite of technologies allows for the flexible development of custom-tailored SiP solutions that meet specific customer requirements with different criteria for advanced packaging choices, IP connectivity, or data bandwidth performance.

Alphawave Semi’s partnership with Arm represents a significant stride in enabling greater performance in new innovative technologies, such as wireless network platforms. This long-term vision will provide greater performance for the complex compute requirements needed by advancements in modern 6G/5G network infrastructure as well as cloud and edge-based applications. This unique collaboration highlights Alphawave Semi’s role in leading the development of high-speed, energy-efficient computing platforms.

The post Revolutionizing High-Performance Silicon: Alphawave Semi and Arm Unite on Next-Gen Chiplets appeared first on Arm Newsroom.

Arm
Samsung Galaxy Tab S10 Ultra Brings Arm Immortalis GPU to Tablets for the Very First Time 4 October 2024 at 16:00

Samsung Galaxy Tab S10 Ultra Brings Arm Immortalis GPU to Tablets for the Very First Time

Arm

By: Alon Or-bach

4 October 2024 at 16:00

In the ever-evolving world of technology, Samsung has made a significant leap forward with its new high-end tablets, the Galaxy Tab S10 Ultra and S10+. Both are built on the Arm-based MediaTek Dimensity 9300+ SoC, marking the first time that tablet devices feature Arm’s Immortalis GPU.

Alongside the cutting-edge Arm Immortalis-G720 GPU, Dimensity 9300+ also adopts the high-performance Arm CPU cluster featuring the Cortex-X4. This combination delivers an unparalleled user experience, making the devices game-changers in the tablet market.

Immersive graphics and visual experiences with Arm Immortalis

Already featured in flagship smartphones such as vivo’s X100 and X100 Pro, Immortalis-G720 beats the competition in most graphics benchmarks for peak performance (frames per second (fps)) and sustained performance for longer gameplay. As a result, there is a performance upgrade from the previous generation Arm Immortalis-G715 GPU without sacrificing battery life. This provides exceptional GPU performance, especially for high-demand gaming and video applications.

Furthermore, thanks to its 12-core 5th Gen GPU architecture with hardware ray tracing support, Immortalis-G720 delivers stunning visuals for high-quality, immersive visual experiences. The GPU also helps to support seamless multi-tasking on mobile devices, like gaming and video streaming at the same time.

Game developer boost

The adoption of Immortalis-G720 by the Samsung Galaxy Tab S10 Ultra and S10+ tablets is likely to provide a boost for game developers, opening up more high-end gaming applications to new devices. Arm’s GPUs provide the largest target base for their applications with over 9 billion shipped to date. This is 1 billion more than last year and more than 1 GPU for every person on Earth. Alongside the GPUs, Arm’s industry-leading graphics features, optimizations and development tools help developers create the very best application experiences.

Recently, we introduced Arm Accuracy Super Resolution (Arm ASR), which is our best-in-class open-source solution for upscaling visual and gaming content on mobile devices. With Arm ASR on, there is a significant uplift in frames-per-second (FPS) through improved performance on the GPU. This translates into power savings on mobile devices and longer gameplay. We have demonstrated the performance benefits of Arm ASR on Immortalis-G720 via a Dimensity 9300-based mobile device, as highlighted in this blog.

Game developers can check out developer.arm.com for resources on Immortalis-G720, and those who are new to Arm GPUs can view this comprehensive video series to get started.

Enhanced user experience with Cortex-X4 CPU

Immortalis-G720 is complemented by the big-core Arm CPU cluster in the Dimensity 9300+. Together, they enable mobile devices to deliver PC and console-level illumination effects at 60 frames per second (FPS) for smoother gaming experiences on mobile.

Part of this cluster is the Arm Cortex-X4 CPU, which provides a significant boost to compute performance. Cortex-X4 offers up to 14 percent more single-thread performance compared to its predecessor, the Cortex-X3, and is designed to handle both performance benchmarks and real-world workloads with ease.

Additionally, the Cortex-X4 is up to 40 percent more power-efficient compared to its predecessor, which means users can enjoy sustained performance without compromising battery life. This not only makes the Samsung Galaxy Tab S10 Ultra and S10+ tablets ideal for long gaming sessions, but also leads to performance improvements in productivity tasks, faster web browsing and quicker app loading.

Tablets for every need

The adoption of Immortalis-G720 promises to be a game-changer for the tablet experience. Whether you’re a gamer, professional, or a student, the Samsung Galaxy Tab S10 Ultra and S10+ tablets are designed to cater to a wide range of needs.

While the focus will be on the enhanced gaming and visual experience thanks to Immortalis-G720, the powerful hardware, which also includes Arm’s high-performance big-core CPU cluster, makes the tablets ideal for productivity, entertainment, web browsing and everything in between. Through the Arm and MediaTek partnership, we are helping Samsung deliver a step-change in tablet computing.

The post Samsung Galaxy Tab S10 Ultra Brings Arm Immortalis GPU to Tablets for the Very First Time appeared first on Arm Newsroom.

Arm
Arm-native Google Chrome Enhances Windows on Arm Performance  2 October 2024 at 22:00

Arm-native Google Chrome Enhances Windows on Arm Performance 

Arm

By: Dawid Borycki

2 October 2024 at 22:00

Microsoft Windows 10 and Windows 11 incorporate Arm-native support, warranting the development of even more Arm-native apps for Windows. This support features additional tools to simplify app porting, enhance app performance, and reduce power consumption. As a result, many companies are now investing in Arm-native apps for Windows.  

Previously Arm talked about the excellent momentum behind the ecosystem of Windows on Arm applications, with Google Chrome being one of the prime examples. However, we wanted to put this to the test by exploring the range of improvements that native Arm support is delivering for Google Chrome.

AArch64 Support for Google Chrome 

The recent release of Google Chrome to add native AArch64 support for Windows provides a range of benefits for users, including:

Enhanced performance: Arm-native support made web browsing on Google Chrome faster and more efficient with a significant performance boost compared to its emulated x86 version.

Quicker web page loading: Websites with slow loading times now load much quicker in the Arm-native build. This improvement is due to optimized scripting, system tasks, and rendering processes.

Improved JavaScript execution: JavaScript execution becomes notably faster when Arm-native, enhancing the responsiveness of web applications and interactive elements.

Better battery life: The efficient power consumption of the Arm-native code allows users to engage more with their devices without needing to recharge regularly.

Superior rendering speeds: Rendering times are drastically reduced, making web pages appear faster and smoother.

Performance comparison: Emulated x86 vs. Arm-native

To illustrate these benefits, we installed two Google Chrome releases on Windows on Arm: the x86_64 release, designated “Win64” (Version 125.0.6422.61 (Official Build)), which runs emulated, and the native AArch64 release, designated “Arm64.” We then used both to analyze the performance of a popular news website.

Using the “Performance” tab in Google Chrome’s Developer Tools, we quantified load and rendering speeds.  

Emulated x86 Version: The website took nearly 16 seconds to load, with significant time spent on scripting (4.4 seconds), system tasks (1.7 seconds), and rendering (0.9 seconds). 

Arm-native Version: Scripting time was reduced to 1.5 seconds (almost 3x shorter), system time to 0.4 seconds (4.25x shorter) and rendering to 0.18 seconds (5x shorter), indicating markedly faster loading and rendering due to native Arm execution.  
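
The ratios quoted above follow directly from the two sets of timings. A minimal sketch of that arithmetic, using only the figures reported in this post (the dictionary names are just for illustration):

```python
# Timings in seconds, copied from the two measurements above.
emulated = {"scripting": 4.4, "system": 1.7, "rendering": 0.9}
native = {"scripting": 1.5, "system": 0.4, "rendering": 0.18}

# Speedup = emulated time / native time for each phase.
speedup = {phase: emulated[phase] / native[phase] for phase in emulated}

for phase, ratio in speedup.items():
    print(f"{phase}: {ratio:.2f}x shorter")
```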

Performance tests on other news websites showed similar results. 

Speedometer 3.0 Benchmark

To further highlight the performance benefits of the Arm-native version of Google Chrome, we utilized the Speedometer 3.0 web browser benchmark. This is an open-source benchmark that measures the responsiveness of web applications by timing simulated user interactions across various workloads.

The benchmark tasks are designed to reflect practical web use cases, though some specifics pertain to Speedometer and should not be used as generic app development practices. This benchmark is created by the teams behind the major browser engines—Blink, Gecko, and WebKit—and has received significant input from companies including Google, Intel, Microsoft, and Mozilla.

Speedometer 3.0 benchmark results for Google Chrome with emulated x86 version

Speedometer 3.0 benchmark results for Google Chrome with Arm-native version

Upon running the Speedometer 3.0 benchmark on the emulated x86 and Arm-native versions of Google Chrome (tested on Windows Dev Kit 2023), Arm-native support was found to significantly enhance the responsiveness of the web applications. This advantage is demonstrated above showing an Arm-native performance score that is more than three times better than on emulated x86. This further underscores the superior efficiency and performance of native Arm applications on Windows on Arm.  

Running Inference with TensorFlow.js and MobileNet

TensorFlow.js is a JavaScript implementation of Google’s widely acclaimed TensorFlow library. It allows developers to use AI and machine learning (ML) when building interactive and dynamic browser-based applications. With TensorFlow.js, users can train and deploy AI models directly in the client-side environment, facilitating real-time data processing and analysis without the need for extensive server-side computation.

MobileNet is a class of efficient architectures designed specifically for mobile and embedded vision applications. It stands out due to its lightweight structure, enabling fast and efficient performance on devices with limited computational power and memory resources.

In Python applications that use TensorFlow, using MobileNet is straightforward:

Python:

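# MobileNet ships with the Keras applications module.
from tensorflow.keras.applications.mobilenet import MobileNet

model = MobileNet(weights='imagenet')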

Next, you perform predictions on your input image:

predictions = model.predict(input_image)

Please refer to the tutorial for a better example of training and inference.

These predictions can then be converted to actual labels:

print('Predicted:', decode_predictions(predictions, top=3)[0])

Here, decode_predictions stands for a function that converts the model scores (probabilities) into labels describing the image content.
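
A function of that kind can be sketched in a few lines of standard-library Python. This is an illustration only, not the implementation used by TensorFlow: `top_k_labels` is a hypothetical name, and the labels and scores below are made-up values rather than real model output.

```python
import heapq


def top_k_labels(scores, labels, k=3):
    """Return the k highest-scoring (label, score) pairs, best first."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Indices of the k largest scores, in descending score order.
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(labels[i], scores[i]) for i in best]


# Hypothetical labels and probabilities, for illustration only.
labels = ["tabby cat", "golden retriever", "laptop", "coffee mug"]
scores = [0.62, 0.05, 0.25, 0.08]
top3 = top_k_labels(scores, labels, k=3)
```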

TensorFlow.js provides a similar interface:

const model_tfjs = await tf.loadGraphModel(MOBILENET_MODEL_PATH);

Predictions can run after any required image pre-processing:

const predictions = model_tfjs.predict(image);

Afterward, you convert them to labels or classes:

const labels = await getTopKClasses(predictions, 3);

For a better sample web application, please refer to this example.

We ran the above web application in the emulated x86 Chrome web browser, as well as in the Arm-native version. Arm-native Chrome for Windows is available for anyone to download here.

The image below demonstrates the web app running in the Chrome web browser. The user interface of this application contains three core elements: a description section; a status indicator; and a model output display. The description section explains how the application was created. Upon uploading an image, the application springs into action, with the status component updating in real time to show the computation time. Once the image processing concludes, the model output takes center stage, revealing the recognized labels along with their corresponding scores.

*The AArch64 version of the Google Chrome Web Browser demonstrates improved performance in the TensorFlow.js application.*

The total processing time, including image preprocessing and AI inference, was almost 100 ms on the emulated x86 Chrome. The same operations took only 35 ms (roughly 35 percent as long) on the Arm-native version of Google Chrome. The inference results (recognized labels and scores) were identical, as the same image was used as input.

Delivering real performance for real needs

The integration of native Arm support in Google Chrome for Windows has led to significant performance enhancements, making web browsing faster, more efficient, and more responsive. These improvements are evident in both general web browsing and specific applications like TensorFlow.js with MobileNet, highlighting the growing importance of Arm-native support in the broader computing landscape. As more companies invest in Arm-native applications for Windows, users can anticipate continued advancements in efficiency and performance across a wide range of devices and applications.

To help you get started with migrating your application to Arm, we offer a wealth of educational resources at Arm’s Developer Hub.

At Arm, we are dedicated to driving innovation and delivering cutting-edge technology that empowers developers and enhances user experiences. The success of Arm-native support in Google Chrome exemplifies the transformative potential of Arm architecture in shaping the future of computing.

Check out Arm Learn for a series of step-by-step guides on Windows on Arm Development.

The post Arm-native Google Chrome Enhances Windows on Arm Performance  appeared first on Arm Newsroom.

Arm
Beyond the Newsroom: 10 Latest Innovations from Arm in September 2024 1 October 2024 at 21:00

Beyond the Newsroom: 10 Latest Innovations from Arm in September 2024

Arm

By: Arm Editorial Team

1 October 2024 at 21:00

September 2024 has been another innovative month for us, showcasing Arm innovations across a range of domains. From enhancing CPU performance and efficiency to revolutionizing AI software development and optimizing automotive microcontrollers, we continue to push the boundaries of technology and ensure the future of computing is built on Arm.

Here’s a summary of the top 10 technological developments at Arm this month:

Setting new performance and efficiency standards with Armv9 CPUs and SVE2

Each new generation of Arm CPUs gets faster and better, meeting the needs of modern computing tasks. Poulomi Dasgupta, Senior Manager of Consumer Computing, highlights how Armv9 CPUs and their exclusive SVE2 optimizations are part of the latest Arm technological advancements. They help enhance performance and efficiency for mobile devices and boost HDR video decoding by 10% and image processing by 20%. This helps improve battery life and app performance for popular apps like YouTube and Netflix.

Likewise, Yibo Cai, Principal Software Engineer, explains how the SVMATCH instruction introduced with SVE2 speeds up multi-token searches, simplifying tasks like parsing CSV files. This further helps reduce the number of operations needed, leading to better performance, as seen in the optimized Sonic JSON decoder. A highlight of SVMATCH is that it helps enhance various software engineering tasks, making data processing faster and more efficient.
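
The idea of a multi-token search can be sketched in plain Python. The sketch below is an assumption-laden illustration, not the instruction's actual semantics: `match_mask` is a hypothetical helper that loops byte by byte, whereas the hardware compares many bytes against the token set in a single operation, and vector segments and predicates are not modeled here.

```python
def match_mask(chunk: bytes, tokens: bytes):
    """Return one boolean per byte of chunk: True if it equals any token."""
    token_set = set(tokens)
    return [byte in token_set for byte in chunk]


# Find every CSV structural character (comma, quote, newline) in one pass.
line = b'id,"name",42\n'
mask = match_mask(line, b',"\n')
positions = [i for i, hit in enumerate(mask) if hit]
```

A parser can then jump straight between the flagged positions instead of testing each byte against each token in turn.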

How Arm and Meta are transforming AI software development

Sy Choudhury, Director, AI Partnerships at Meta, explains how Arm and Meta are accelerating AI software development through open innovation and optimizing large language models (LLMs), like Llama, across data centers, smartphones, and IoT devices.

Learn more about the latest Kleidi integrations for PyTorch, ExecuTorch, and more, as well as how to unlock the true performance potential of LLMs with Arm’s cutting-edge innovations – all while simplifying model customization and deployment.

Faster PyTorch Inference using Kleidi on Arm Neoverse

PyTorch is a popular open-source library for machine learning. Ashok Bhat, Senior Product Manager, explains how Arm has improved PyTorch’s inference performance using Kleidi technology, integrated into the Arm Compute Library and KleidiAI library. This includes optimized kernels for machine learning (ML) tasks on Arm Neoverse CPUs.

These optimizations lead to significant performance improvements. For example, using torch.compile can achieve up to 2x better performance compared to Eager mode for various models. Additionally, new INT4 and INT8 kernels can enhance inference performance by up to 18x for specific models like Llama and Gemma. These advancements make PyTorch more efficient on Arm hardware, potentially reducing costs and energy consumption for machine learning tasks.

Advancing ASR Technology with Kleidi on Arm Neoverse N2

Automatic Speech Recognition (ASR) technology is widely used in applications like voice assistants, transcription services, call center analytics, and speech-to-text translation. Willen Yang, Senior Product Manager, and Fred Jin, Senior Software Engineer, introduce FunASR, an advanced toolkit developed by Alibaba DAMO Academy.

FunASR supports both CPU and GPU, with a focus on efficient performance on Arm Neoverse N2 CPUs. It excels in accurately understanding various accents and speaking styles. Using bfloat16 fastmath kernels on Arm Neoverse N2 CPUs, FunASR achieves up to 2.4 times better performance compared to other platforms, making it a cost-effective solution for real-world deployments.
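
For readers unfamiliar with bfloat16: the format keeps the sign bit and 8-bit exponent of a 32-bit float but shortens the mantissa to 7 bits. The sketch below shows the precision trade-off using only the Python standard library. It is an illustration under stated assumptions: `to_bfloat16` is a hypothetical helper, and it truncates the low bits rather than rounding, which may differ from what a given kernel or hardware does.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Truncate a value to bfloat16 precision and return it as a Python float."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Keep the top 16 bits: sign, 8 exponent bits, 7 mantissa bits.
    bits &= 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]


approx = to_bfloat16(3.14159)
```

Values such as 1.0 or -2.5 survive unchanged, while 3.14159 loses its lower digits; that loss is the price paid for halving the data moved per value.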

Optimizing LLMs with Arm’s Kleidi Innovation

As Generative AI (GenAI) transforms business productivity, enterprises are integrating Large Language Models (LLMs) into their applications on both cloud and edge. Nobel Chowdary Mandepudi, Graduate Solutions Engineer, discusses how Arm’s Kleidi technology enhances PyTorch for running LLMs on Arm-based processors. This integration simplifies access to Kleidi technology within PyTorch, boosting performance.

The demo application demonstrates significant improvements, such as faster token generation and reduced costs. For example, the time to generate the first token is less than 1 second, and the decode rate is 33 tokens per second, meeting industry standards for interactive chatbots.

These optimizations make running LLMs on CPUs practical and effective for real-time applications like chatbots, leading to more efficient and cost-effective AI solutions. This benefits businesses and developers by reducing latency and operational costs.
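
To put the quoted figures in context, a rough response-time estimate can be derived from the time to first token and the decode rate. This is a back-of-the-envelope sketch: `response_time_s` is a hypothetical helper, the 1-second first-token time is the upper bound quoted above, and a constant decode rate is assumed.

```python
def response_time_s(num_tokens, first_token_s=1.0, decode_rate_tps=33.0):
    """Estimate seconds to produce num_tokens: first token, then steady decode."""
    if num_tokens < 1:
        return 0.0
    # The first token costs first_token_s; the rest arrive at the decode rate.
    return first_token_s + (num_tokens - 1) / decode_rate_tps


estimate = response_time_s(100)  # a 100-token reply under these assumptions
```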

Demonstrating AI performance uplifts with KleidiAI

Arm’s KleidiAI, integrated with ExecuTorch, enhances AI inference on edge devices. In a demo by Gian Marco Iodice, real-time inference of the Llama 3.1 8B parameter model is showcased on a mobile phone. This demo highlights KleidiAI’s capability to accelerate various AI models across billions of Arm-based devices globally.

KleidiAI uses optimized micro-kernels and advanced Arm CPU instructions like SMMLA and FMLA to deliver efficient AI performance without compromising speed or accuracy.
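
To give a feel for what an 8-bit matrix multiply-accumulate instruction such as SMMLA computes, here is a plain-Python emulation of a single step. Treat it as a sketch under assumptions: `smmla` is a hypothetical function name, the operand layout (two rows of eight signed 8-bit values per source, a 2x2 block of 32-bit accumulators) reflects our reading of the instruction and should be checked against the Arm architecture reference, and 32-bit wraparound is not modeled.

```python
def smmla(acc, a, b):
    """Emulate one signed 8-bit matrix multiply-accumulate step.

    a and b each hold 16 signed 8-bit values, read as two rows of eight.
    acc holds a 2x2 matrix of sums in row-major order.
    Returns acc + a @ transpose(b).
    """
    if len(a) != 16 or len(b) != 16 or len(acc) != 4:
        raise ValueError("expected 16, 16 and 4 elements")
    if any(not -128 <= v <= 127 for v in list(a) + list(b)):
        raise ValueError("inputs must fit in signed 8 bits")
    out = list(acc)
    for i in range(2):
        for j in range(2):
            # Dot product of row i of a with row j of b (eight products).
            out[2 * i + j] += sum(a[8 * i + k] * b[8 * j + k] for k in range(8))
    return out


a = [1] * 8 + [2] * 8
b = [3] * 8 + [-1] * 8
result = smmla([0, 0, 0, 0], a, b)
```

One such step performs 32 multiplications and folds them into four accumulators, which is why instructions of this kind suit the quantized matrix products at the heart of LLM inference.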

Meanwhile, Nobel Chowdary Mandepudi demonstrates how KleidiAI boosts AI performance in the cloud using AWS Graviton 4 instances. This demo highlights KleidiAI’s ability to drive AI performance on Arm-powered cloud servers while maintaining energy efficiency.

This side-by-side demo compares PyTorch inference with and without KleidiAI optimizations, showcasing significant improvements in efficiency and speed with the Llama 3.1 8B model. KleidiAI leverages advanced Arm instructions to accelerate generative AI workloads, enhancing text generation and prompt evaluation.

Arm’s growing ecosystem and server integration

The Arm ecosystem is expanding rapidly across all sectors, including Microsoft Copilot+ PCs, cloud services (AWS, Google, Microsoft), and automotive innovations like in-vehicle infotainment (IVI) and advanced driver assistance systems (ADAS). Steve Demski, Director of Product Marketing, highlights the integration of several hundred HPE ProLiant RL300 Gen11 servers, powered by Ampere Altra Max CPUs, into Arm’s Austin datacenter.

*Comparison of core density per rack using Arm or x86-based CPUs*

These high-performance, power-efficient servers support various workloads and align with Arm’s goal to run at least 50% of its on-premises EDA cluster infrastructure on Arm by 2024. This transition boosts productivity, frees up space and power for future workloads like generative AI, and offers a lower cost-per-core, enabling more efficient budget allocation.

Simplifying automotive microcontrollers with EB tresos Embedded Hypervisor

The EB tresos Embedded Hypervisor by Elektrobit allows multiple virtual machines (VMs) to run on a single automotive microcontroller, supporting various operating systems and applications. Dr. Bruno Kleinert from Elektrobit explains how this technology optimizes resource use, enhances safety, and cuts costs.

The hypervisor technology is crucial for software-defined vehicles (SDVs), enabling flexible updates to vehicle functions. It also supports sustainable design by reducing hardware, cabling, weight, and energy use, with the technology touted to be ready for mass production in October 2024, and safety-approved versions expected in early 2025.

Boosting performance for AI and beyond with OpenRNG

OpenRNG is an open-source Random Number Generator (RNG) library that boosts performance for AI, scientific, and financial applications. Kevin Mooney, Staff Software Engineer, explains how it can replace Intel’s Vector Statistics Library (VSL) and supports various random number generators, including pseudorandom, quasirandom, and true random generators.

Bar chart showing the performance benefit of using OpenRNG instead of the C++ standard library. The height of each bar is the ratio of time spent generating random numbers with both libraries. Greater than 1 means OpenRNG was faster.

OpenRNG significantly improves performance, enhancing PyTorch’s dropout layer by up to 44 times and speeding up the C++ standard library by 2.7 times. OpenRNG is crucial for applications needing fast and reliable random number generation, like AI, gaming, and financial modeling, ensuring consistent results across systems.
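
A dropout layer shows why random number throughput matters: it draws one random number per element on every pass. The sketch below illustrates the technique with Python's standard `random` module, not with OpenRNG or PyTorch; `dropout` is a hypothetical helper written for this example.

```python
import random


def dropout(values, p=0.5, rng=None):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # One random draw per element: this is where RNG throughput matters.
    return [v * scale if rng.random() >= p else 0.0 for v in values]


# The same seed reproduces the same mask.
first = dropout([1.0] * 8, p=0.25, rng=random.Random(42))
second = dropout([1.0] * 8, p=0.25, rng=random.Random(42))
```

Seeding the generator makes the mask reproducible, which is the kind of consistency across runs that a dedicated RNG library aims to preserve while generating numbers faster.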

Pioneering automotive safety with Arm Software Test Libraries

Integrating Arm’s Software Test Libraries (STLs) into automotive systems boosts safety and reliability, meeting ISO 26262 standards. Andrew Coombes, Principal Automotive Software Product Manager, and ETAS explain how Arm STL can be used with Classic AUTOSAR to improve diagnostics, detect faults early, and offer flexible integration. Using Arm STLs with microcontroller hypervisors based on Arm architecture supports mixed-criticality systems and enhances fault mitigation. Meanwhile, achieving Functional Safety certification requires comprehensive strategies, as detailed in a joint white paper by Exida and Arm.

Optimizing video calls with artificial intelligence

While video conferencing is a ubiquitous tool for communication, it is not always a straightforward plug-and-play experience, as adjustments may be needed to ensure a good audio and video setup. Ayaan Masood, a Graduate Engineer, has developed a demo mobile app that uses a neural network model to improve video lighting in low-light conditions.

This app processes video frames in real time, providing smooth and clear visuals. It ensures a professional appearance during video calls, which is essential for remote work and social interactions. The success of this app highlights the potential of AI to solve everyday problems, paving the way for more AI-driven solutions in various fields.

The post Beyond the Newsroom: 10 Latest Innovations from Arm in September 2024 appeared first on Arm Newsroom.