January 30, 2025

‘A virtual DPU within a GPU’: Could clever hardware hack be behind DeepSeek’s groundbreaking AI efficiency?

By Wayne Williams (waynewilliams@onmail.com) | usagoldmines.com


  • A new approach called DualPipe seems to be the key to DeepSeek’s success
  • One expert describes it as an on-GPU virtual DPU that maximizes bandwidth efficiency
  • While DeepSeek has used Nvidia GPUs only, one wonders how AMD’s Instinct would fare

China’s DeepSeek AI chatbot has stunned the tech industry, representing a credible alternative to OpenAI’s ChatGPT at a fraction of the cost.

A recent paper revealed DeepSeek V3 was trained on a cluster of 2,048 Nvidia H800 GPUs – crippled versions of the H100 (we can only imagine how much more powerful it would be running on AMD Instinct accelerators!). Training reportedly required 2.79 million GPU-hours in total, covering pretraining on 14.8 trillion tokens and the subsequent fine-tuning, and cost – according to calculations made by The Next Platform – a mere $5.58 million.
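To see how the quoted figures fit together, here is a rough back-of-the-envelope sketch. The GPU count and GPU-hours come from the figures above; the $2 per GPU-hour rental rate is an assumption, used here only because it reproduces the quoted total, and the wall-clock estimate assumes every GPU was busy the whole time.

```python
# Figures quoted in the article.
NUM_GPUS = 2_048
GPU_HOURS = 2.79e6

# ASSUMPTION: a flat rental rate in USD per GPU-hour (not stated in the article).
ASSUMED_RATE_USD = 2.00


def training_cost_usd(gpu_hours, rate_usd_per_hour):
    """Total rental cost for a given number of GPU-hours."""
    return gpu_hours * rate_usd_per_hour


def wall_clock_days(gpu_hours, num_gpus):
    """Elapsed days if all GPUs run in parallel with no downtime."""
    return gpu_hours / num_gpus / 24


cost = training_cost_usd(GPU_HOURS, ASSUMED_RATE_USD)
days = wall_clock_days(GPU_HOURS, NUM_GPUS)

print(f"Estimated cost: ${cost / 1e6:.2f} million")
print(f"Wall-clock time: about {days:.0f} days")
```

Under those assumptions the run works out to a little under two months on the full cluster.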

How DeepSeek’s developers managed this feat is likely down to a clever hack.

A virtual DPU on the GPU itself

First, some background. DeepSeek is an advanced Mixture-of-Experts (MoE) language model designed to optimize performance by selectively activating only the most relevant parts of its architecture for each task. The third version of the model, DeepSeek-V3, features a total of 671 billion parameters, with only 37 billion activated for any given token prediction. This selective activation massively reduces computational costs while maintaining high performance and accuracy – which you’ll see if you try it.
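For readers who want to see what "selectively activating" experts looks like in practice, here is a minimal toy sketch of top-k MoE routing. It is a generic illustration, not DeepSeek's actual gating function: the softmax gate, the eight toy experts, the top-2 setting and all function names are assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_scores, top_k):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, gate_scores, experts, top_k):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_token(gate_scores, top_k))


# Hypothetical toy setup: 8 experts that each just scale their input.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
gate_scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routed = route_token(gate_scores, TOP_K)
y = moe_forward(1.0, gate_scores, experts, TOP_K)
active_share = 37e9 / 671e9  # parameter counts quoted in the article

print(f"Experts used for this token: {[i for i, _ in routed]} of {NUM_EXPERTS}")
print(f"Share of DeepSeek-V3 parameters active per token: {active_share:.1%}")
```

The last line simply divides the two parameter counts quoted above: roughly one parameter in eighteen does any work for a given token.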

It’s easy to be skeptical of DeepSeek and the claims made regarding its training, but the paper reveals some of the magic the developers came up with to make the most of the crippled hardware they had to work with. This includes the creation of the DualPipe algorithm for efficient pipeline parallelism.

According to the information published by DeepSeek, DualPipe overlaps forward and backward computation, reduces latency, and optimizes data movement across GPUs. By efficiently managing communication, it minimizes idle time (pipeline bubbles) and dynamically balances GPU compute cores (Streaming Multiprocessors) between computation and communication, preventing data transfer bottlenecks as the model scales.
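The payoff from overlapping computation with communication can be illustrated with a deliberately simple timing model. This is an idealized sketch, not a description of how DualPipe schedules work: the millisecond figures and chunk count are hypothetical, and it assumes communication can be hidden completely behind computation.

```python
def serial_time(compute_ms, comm_ms, num_chunks):
    """Each chunk computes, then waits for its communication to finish."""
    return num_chunks * (compute_ms + comm_ms)


def overlapped_time(compute_ms, comm_ms, num_chunks):
    """Ideal overlap: one chunk's communication runs while another computes,
    so each step costs only the slower of the two."""
    return num_chunks * max(compute_ms, comm_ms)


# Hypothetical numbers, chosen only for illustration.
COMPUTE_MS, COMM_MS, CHUNKS = 10, 8, 20

serial = serial_time(COMPUTE_MS, COMM_MS, CHUNKS)
overlapped = overlapped_time(COMPUTE_MS, COMM_MS, CHUNKS)

print(f"Serial: {serial} ms, overlapped: {overlapped} ms")
print(f"Speed-up from hiding communication: {serial / overlapped:.2f}x")
```

In this toy case, hiding communication that takes almost as long as the computation nearly halves the step time, which is why keeping data transfers off the critical path matters so much as models scale.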

A commenter on The Next Platform describes DualPipe as “essentially creating a virtual DPU on the GPU itself to handle all-to-all communication,” which highlights its role in optimizing data transfer efficiency.

The paper goes into further detail: “In order to ensure sufficient computational performance for DualPipe, we customize efficient cross-node all-to-all communication kernels (including dispatching and combining) to conserve the number of SMs dedicated to communication. The implementation of the kernels is co-designed with the MoE gating algorithm and the network topology of our cluster. To be specific, in our cluster, cross-node GPUs are fully interconnected with IB [InfiniBand], and intra-node communications are handled via NVLink.”
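The "dispatching and combining" steps mentioned in the quote can be pictured with a small single-process simulation. This is only a conceptual sketch: real kernels move tensors between GPUs over NVLink and InfiniBand, whereas the ranks, integer tokens, top-1 routing and helper names below are all hypothetical stand-ins.

```python
def dispatch(tokens_per_rank, expert_for, experts_per_rank):
    """All-to-all 'dispatch': send each token to the rank hosting its expert."""
    inbox = [[] for _ in tokens_per_rank]
    for src, tokens in enumerate(tokens_per_rank):
        for pos, token in enumerate(tokens):
            expert = expert_for(token)
            dst = expert // experts_per_rank  # rank that owns this expert
            inbox[dst].append((src, pos, expert, token))
    return inbox


def combine(inbox, run_expert, tokens_per_rank):
    """All-to-all 'combine': return each expert output to the token's home slot."""
    outputs = [[None] * len(tokens) for tokens in tokens_per_rank]
    for items in inbox:
        for src, pos, expert, token in items:
            outputs[src][pos] = run_expert(expert, token)
    return outputs


def expert_for(token):
    """Stand-in for the MoE gate: pick one of 8 experts (top-1 routing)."""
    return token % 8


def run_expert(expert, token):
    """Stand-in for an expert network."""
    return token * 10 + expert


# Hypothetical toy setup: 4 ranks, 2 experts per rank, integer "tokens".
EXPERTS_PER_RANK = 2
tokens_per_rank = [[0, 5], [3, 6], [1, 7], [2, 4]]

inbox = dispatch(tokens_per_rank, expert_for, EXPERTS_PER_RANK)
outputs = combine(inbox, run_expert, tokens_per_rank)

print("Tokens received per rank:", [len(items) for items in inbox])
print("Outputs back on home ranks:", outputs)
```

Every token may have to travel to a different rank and back again, which is why this traffic pattern is called all-to-all and why it can become a bottleneck.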

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so we omit their batch ID for illustration simplicity. Two cells enclosed by a shared black border have mutually overlapped computation and communication. (Image credit: DeepSeek)
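For a sense of scale, the textbook bubble estimate for a conventional one-direction pipeline schedule can be computed from the same 8 ranks and 20 micro-batches. This is the baseline that schemes like DualPipe try to improve on, not DualPipe's own figure, and it assumes every micro-batch takes the same time on every stage.

```python
def bubble_fraction(pp_ranks, micro_batches):
    """Idle share of a simple one-direction pipeline schedule:
    (p - 1) fill/drain slots out of (m + p - 1) total slots."""
    return (pp_ranks - 1) / (micro_batches + pp_ranks - 1)


# Values taken from the figure caption above.
PP_RANKS = 8
MICRO_BATCHES = 20

baseline = bubble_fraction(PP_RANKS, MICRO_BATCHES)
print(f"Baseline bubble fraction: {baseline:.1%}")
```

In other words, a naive schedule with these settings would leave each GPU idle for roughly a quarter of the run.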

This article is written by: Nermeen Nabil Khear Abdelmalak
