Adol Blog


Microsoft Azure ND GB300 Sets New Record: 1.1 Million Tokens per Second Inference Speed

2025-11-04
Microsoft recently announced that its Azure ND GB300v6 virtual machines have set a new record for inference throughput: 1.1 million tokens per second on Meta's Llama 2 70B model. Satya Nadella, CEO of Microsoft, stated on social media: "This achievement is the result of our long-standing partnership with NVIDIA and our expertise in running AI at production scale."
The Azure ND GB300 virtual machines run on the NVIDIA GB300 NVL72, a rack-scale system that combines 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs in a single domain. Optimized for inference workloads, the platform offers 50% more GPU memory and a 16% higher Thermal Design Power (TDP) than the previous generation.
To validate the performance improvement, Microsoft ran the Llama 2 70B model at FP4 precision on 18 ND GB300v6 virtual machines within a single NVIDIA GB300 NVL72 domain, using NVIDIA TensorRT-LLM as the inference engine. Microsoft stated: "An Azure ND GB300v6 cluster with one NVL72 rack achieved a total inference speed of 1.1 million tokens per second." The result surpasses Microsoft's previous record of 865,000 tokens per second on the NVIDIA GB200 NVL72 rack.
With 72 GPUs in the rack, this works out to approximately 15,200 tokens per second per GPU. Microsoft has also published a detailed description of the benchmark procedure, along with all log files and results. The performance record has been verified by Signal65, an independent performance validation and benchmarking company.
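The per-GPU figure follows directly from the rack totals. A minimal Python sketch of that arithmetic, using only the rounded numbers quoted in this article; the variable names are illustrative, and the even split of four GPUs per virtual machine is an assumption inferred from 72 GPUs spread across 18 VMs:

```python
# Rounded figures quoted in the article; all names here are illustrative.
TOTAL_TOKENS_PER_S = 1_100_000   # GB300 NVL72 rack result
PREV_TOKENS_PER_S = 865_000      # earlier GB200 NVL72 record
GPUS_PER_RACK = 72
VMS_PER_RACK = 18

# Assumption: GPUs are divided evenly among the virtual machines.
gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK

tokens_per_gpu = TOTAL_TOKENS_PER_S / GPUS_PER_RACK
tokens_per_vm = TOTAL_TOKENS_PER_S / VMS_PER_RACK

# Relative gain of the new rack-level record over the previous one.
gain_over_prev = TOTAL_TOKENS_PER_S / PREV_TOKENS_PER_S - 1

print(f"GPUs per VM:        {gpus_per_vm}")
print(f"Tokens/s per GPU:   {tokens_per_gpu:,.0f}")
print(f"Tokens/s per VM:    {tokens_per_vm:,.0f}")
print(f"Gain over GB200:    {gain_over_prev:.1%}")
```

Because the inputs are rounded, the per-GPU result lands slightly under 15,300 tokens per second, in line with the approximate figure above, and the rack-level gain over the GB200 record comes out at roughly 27%.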
Russ Fellows, Vice President of Labs at Signal65, noted in a blog post: "This milestone not only breaks the million-tokens-per-second barrier but does so on a platform that meets the dynamic usage and data governance needs of modern enterprises." He added that the Azure ND GB300 offers a 27% improvement in inference performance over the previous-generation NVIDIA GB200 while raising the power specification by only 17%. Compared to the NVIDIA H100 generation, the GB300 delivers nearly 10x higher inference performance and almost 2.5x better power efficiency at the rack level.
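The ratios quoted by Signal65 can be combined into a rough performance-per-watt estimate. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes actual power draw scales with the quoted power specification, treats "nearly 10x" and "almost 2.5x" as exactly 10 and 2.5, and uses a hypothetical helper named efficiency_gain:

```python
def efficiency_gain(perf_ratio: float, power_ratio: float) -> float:
    """Return the performance-per-watt ratio implied by a performance
    ratio and a power ratio (both relative to the same baseline)."""
    return perf_ratio / power_ratio

# GB300 vs. GB200: 27% more performance for 17% more rated power.
gb300_vs_gb200 = efficiency_gain(1.27, 1.17)

# GB300 vs. H100 at rack level: ~10x performance at ~2.5x efficiency
# implies roughly 10 / 2.5 times the rack power (assumed exact ratios).
implied_power_vs_h100 = 10.0 / 2.5

print(f"Perf/W gain over GB200:      {gb300_vs_gb200 - 1:.1%}")
print(f"Implied rack power vs. H100: {implied_power_vs_h100:.1f}x")
```

Under these assumptions, the generational gain in performance per watt over the GB200 is on the order of 8 to 9 percent, while the H100 comparison implies a rack that draws about four times the power.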
Last updated: 2025-11-04

Adol


COPYRIGHT © 2025 adolnb. ALL RIGHTS RESERVED.
