January 30, 2025

‘A virtual DPU within a GPU’: Could clever hardware hack be behind DeepSeek’s groundbreaking AI efficiency?

By Wayne Williams | usagoldmines.com


  • A new approach called DualPipe seems to be the key to DeepSeek’s success
  • One expert describes it as an on-GPU virtual DPU that maximizes bandwidth efficiency
  • While DeepSeek has used Nvidia GPUs only, one wonders how AMD’s Instinct would fare

China’s DeepSeek AI chatbot has stunned the tech industry, representing a credible alternative to OpenAI’s ChatGPT at a fraction of the cost.

A recent paper revealed DeepSeek-V3 was trained on a cluster of 2,048 Nvidia H800 GPUs – crippled versions of the H100 (we can only imagine how much more powerful it would be running on AMD Instinct accelerators!). It reportedly required 2.79 million GPU-hours in total, covering pretraining on 14.8 trillion tokens and the fine-tuning that followed, and cost – according to calculations made by The Next Platform – a mere $5.58 million.
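The headline figures can be sanity-checked with some quick arithmetic. The sketch below assumes a rental rate of $2 per GPU-hour, which is an illustrative assumption and not a number given in this article. It also assumes all 2,048 GPUs ran in parallel for the whole job.

```python
# Back-of-the-envelope check of the reported training figures.
GPU_HOURS = 2.79e6              # total GPU-hours quoted in the article
NUM_GPUS = 2048                 # H800 GPUs in the cluster
USD_PER_GPU_HOUR = 2.00         # ASSUMPTION: illustrative rental rate


def training_cost_usd(gpu_hours, usd_per_gpu_hour):
    """Total rental cost if every GPU-hour is billed at a flat rate."""
    return gpu_hours * usd_per_gpu_hour


def wall_clock_days(gpu_hours, num_gpus):
    """Elapsed days if all GPUs run in parallel for the whole job."""
    return gpu_hours / num_gpus / 24


cost = training_cost_usd(GPU_HOURS, USD_PER_GPU_HOUR)
days = wall_clock_days(GPU_HOURS, NUM_GPUS)

print(f"Estimated cost: ${cost / 1e6:.2f} million")
print(f"Wall-clock time on {NUM_GPUS} GPUs: about {days:.0f} days")
```

Under those assumptions the estimate lands on the $5.58 million quoted above, and the run works out to roughly two months of continuous use of the cluster.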

But how DeepSeek’s developers managed this feat likely comes down to a clever hack.

A virtual DPU on the GPU itself

First, some background. DeepSeek is an advanced Mixture-of-Experts (MoE) language model designed to optimize performance by selectively activating only the most relevant parts of its architecture for each token it processes. The third version of the model, DeepSeek-V3, features a total of 671 billion parameters, with only 37 billion activated for any given token prediction. This selective activation massively reduces computational costs while maintaining high performance and accuracy – which you’ll see if you try it.
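To make selective activation concrete, here is a minimal sketch of generic top-k expert routing. It is not DeepSeek's actual gating function. The toy experts, the softmax router and the names `route_token` and `moe_layer` are all hypothetical, chosen only to show that a token touches a small subset of the experts.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(token, experts, router_scores, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](token) for i, w in route_token(router_scores, k))


# Hypothetical toy setup: 8 experts, 2 active per token.
experts = [lambda x, scale=s: scale * x for s in range(1, 9)]
rng = random.Random(0)
scores = [rng.gauss(0, 1) for _ in experts]

chosen = route_token(scores, k=2)
output = moe_layer(3.0, experts, scores, k=2)

print(f"active experts: {[i for i, _ in chosen]} of {len(experts)}")
print(f"share of DeepSeek-V3 parameters active per token: {37 / 671:.1%}")
```

With 37 billion of 671 billion parameters active, each token exercises roughly 5.5% of the model, which is where the compute saving comes from.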

It’s easy to be skeptical of DeepSeek and the claims made regarding its training, but the paper reveals some of the magic the developers came up with to make the most of the crippled hardware they had to work with. This includes the creation of the DualPipe algorithm for efficient pipeline parallelism.

According to the information published by DeepSeek, DualPipe overlaps forward and backward computation, reduces latency, and optimizes data movement across GPUs. By efficiently managing communication, it minimizes idle time (pipeline bubbles) and dynamically balances GPU compute cores (Streaming Multiprocessors) between computation and communication, preventing data transfer bottlenecks as the model scales.
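For a sense of how much time pipeline bubbles can waste, the sketch below computes the idle fraction of a plain synchronous pipeline schedule (the textbook GPipe-style baseline, not DualPipe itself). It assumes every stage takes the same time per micro-batch. The 8 ranks and 20 micro-batches match the example schedule shown in DeepSeek's paper.

```python
def bubble_fraction(num_stages, num_microbatches):
    """Idle share of a simple synchronous pipeline with equal-cost stages.

    Each stage waits (num_stages - 1) slots while the pipeline fills and
    drains, out of (num_microbatches + num_stages - 1) slots in total.
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("stages and micro-batches must be positive")
    return (num_stages - 1) / (num_microbatches + num_stages - 1)


STAGES = 8  # pipeline-parallel ranks in the paper's example schedule

for microbatches in (8, 20, 80):
    idle = bubble_fraction(STAGES, microbatches)
    print(f"{STAGES} stages, {microbatches:>2} micro-batches: {idle:.1%} idle")
```

In this baseline, about a quarter of each GPU's time is idle at 8 stages and 20 micro-batches. Schedules such as DualPipe aim to shrink that share and to hide communication behind the remaining compute.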

A commenter on The Next Platform describes DualPipe as “essentially creating a virtual DPU on the GPU itself to handle all-to-all communication,” which highlights its role in optimizing data transfer efficiency.

The paper goes into further detail: “In order to ensure sufficient computational performance for DualPipe, we customize efficient cross-node all-to-all communication kernels (including dispatching and combining) to conserve the number of SMs dedicated to communication. The implementation of the kernels is co-designed with the MoE gating algorithm and the network topology of our cluster. To be specific, in our cluster, cross-node GPUs are fully interconnected with IB, and intra-node communications are handled via NVLink.”
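The “dispatching and combining” steps in that quote can be pictured with a small pure-Python simulation. This is only a data-flow sketch under simplifying assumptions: one expert per token, and ranks modelled as dictionary keys. The real kernels are custom GPU code, and the names `dispatch` and `combine` here are hypothetical helpers, not DeepSeek's API.

```python
from collections import defaultdict


def dispatch(tokens_by_rank, expert_of, rank_of_expert):
    """All-to-all send: file each token under the rank hosting its expert."""
    inbox = defaultdict(list)
    for src_rank, tokens in tokens_by_rank.items():
        for pos, tok in enumerate(tokens):
            expert = expert_of[tok]
            inbox[rank_of_expert[expert]].append((src_rank, pos, expert, tok))
    return inbox


def combine(inbox, experts, tokens_by_rank):
    """All-to-all return: run each expert, then restore the original order."""
    out = {rank: [None] * len(toks) for rank, toks in tokens_by_rank.items()}
    for items in inbox.values():
        for src_rank, pos, expert, tok in items:
            out[src_rank][pos] = experts[expert](tok)
    return out


# Hypothetical toy cluster: two ranks, one expert hosted on each.
tokens_by_rank = {0: ["a", "b"], 1: ["c", "d"]}
expert_of = {"a": 0, "b": 1, "c": 1, "d": 0}      # router decision per token
rank_of_expert = {0: 0, 1: 1}                     # expert placement
experts = {0: str.upper, 1: lambda s: s * 2}      # stand-in expert functions

inbox = dispatch(tokens_by_rank, expert_of, rank_of_expert)
result = combine(inbox, experts, tokens_by_rank)
cross_rank = sum(
    1 for dst, items in inbox.items() for src, *_ in items if src != dst
)

print(result)
print(f"tokens that crossed ranks: {cross_rank}")
```

Every token routed to an expert on another rank costs two transfers, one out and one back. That is why the paper describes designing these kernels around the gating algorithm and the cluster's network topology.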

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so we omit their batch ID for illustration simplicity. Two cells enclosed by a shared black border have mutually overlapped computation and communication. (Image credit: DeepSeek)

This article is written by: Nermeen Nabil Khear Abdelmalak
