Posted on

DeepSeek-R1: Budgeting challenges for on-premise deployments

Until now, IT leaders have needed to consider the cyber security risks posed by allowing users to access large language models (LLMs) like ChatGPT directly via the cloud. The alternative has been to use open source LLMs that can be hosted on-premise or accessed via a private cloud. 

The artificial intelligence (AI) model needs to run in-memory and, when using graphics processing units (GPUs) for AI acceleration, this means IT leaders need to consider the costs associated with purchasing banks of GPUs to build up enough memory to hold the entire model.

Nvidia’s high-end AI acceleration GPU, the H100, is configured with 80Gbytes of random-access memory (RAM), and its specification shows it’s rated at 350W in terms of power consumption.

China’s DeepSeek has been able to demonstrate that its R1 LLM can rival US artificial intelligence without the need to resort to the latest GPU hardware. It does, however, benefit from GPU-based AI acceleration.

Nevertheless, deploying a private version of DeepSeek still requires significant hardware investment. Running the entire DeepSeek-R1 model, which has 671 billion parameters, in-memory requires 768Gbytes of memory. With Nvidia H100 GPUs, which are configured with 80Gbytes of video memory each, 10 would be required to ensure the entire DeepSeek-R1 model can run in-memory.
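The GPU count above is a simple rounding-up calculation. A minimal sketch, using only the memory figures quoted in the article (the function name `gpus_needed` is illustrative, and the calculation ignores any extra memory needed at runtime):

```python
import math


def gpus_needed(model_memory_gb: float, gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory can hold the model."""
    return math.ceil(model_memory_gb / gpu_memory_gb)


# Figures quoted in the article: 768 Gbytes for the model, 80 Gbytes per H100.
h100_count = gpus_needed(768, 80)
print(h100_count)  # 10
```

Nine H100s would provide only 720Gbytes, which is why the count rounds up to 10.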

IT leaders may well be able to negotiate volume discounts, but the cost of just the AI acceleration hardware to run DeepSeek is around $250,000.

Less powerful GPUs can be used, which may help to reduce this figure. But given current GPU prices, a server capable of running the complete 671 billion-parameter DeepSeek-R1 model in-memory is going to cost over $100,000.

The server could be run on public cloud infrastructure. Azure, for instance, offers access to the Nvidia H100 with 900 GBytes of memory for $27.167 per hour, which, on paper, should easily be able to run the 671 billion-parameter DeepSeek-R1 model entirely in-memory.

If this model is used every working day, and assuming a 35-hour week and four weeks a year of holidays and downtime, the annual Azure bill would be almost $46,000. Again, this figure could be reduced significantly, to $16.63 per hour ($23,000 per year), if there is a three-year commitment.
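The annual figure follows from multiplying the hourly rate by the number of working hours in a year. A minimal sketch under the article's stated assumptions (35 hours a week, four weeks off out of 52); the function name `annual_cloud_cost` is illustrative:

```python
def annual_cloud_cost(hourly_rate: float,
                      hours_per_week: float = 35,
                      weeks_off: int = 4) -> float:
    """Yearly bill for an instance that runs only during working hours."""
    working_weeks = 52 - weeks_off
    return hourly_rate * hours_per_week * working_weeks


pay_as_you_go = annual_cloud_cost(27.167)  # Azure rate quoted in the article
committed = annual_cloud_cost(16.63)       # three-year commitment rate

print(f"{pay_as_you_go:,.0f}")  # 45,641
print(f"{committed:,.0f}")      # 27,938
```

Under these same assumptions the committed rate comes out closer to $28,000 than the $23,000 quoted, so the quoted figure may rest on a different usage assumption.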

Less powerful GPUs will clearly cost less, but it’s the memory costs that make these prohibitive. For instance, looking at current Google Cloud pricing, the Nvidia T4 GPU is priced at $0.35 per GPU per hour, and is available with up to four GPUs, giving a total of 64Gbytes of memory for $1.40 per hour. Twelve of these four-GPU configurations would be needed to fit the DeepSeek-R1 671 billion-parameter model entirely in-memory, which works out at $16.80 per hour. With a three-year commitment, this figure comes down to $7.68 per hour, which works out at just under $13,000 per year.
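The T4 sizing can be reproduced the same way. This is a rough sketch using the prices quoted above; the 16Gbytes-per-GPU figure is inferred from the 64Gbytes quoted for four GPUs, and the constant and function names are illustrative:

```python
import math

T4_MEMORY_GB = 16          # per GPU, inferred from 64 Gbytes across four GPUs
GPUS_PER_INSTANCE = 4      # largest T4 configuration mentioned in the article
PRICE_PER_GPU_HOUR = 0.35  # on-demand price quoted in the article


def t4_fleet(model_memory_gb: float) -> tuple[int, float]:
    """Return (four-GPU instances needed, total on-demand cost per hour)."""
    instance_memory_gb = T4_MEMORY_GB * GPUS_PER_INSTANCE
    instances = math.ceil(model_memory_gb / instance_memory_gb)
    hourly = instances * GPUS_PER_INSTANCE * PRICE_PER_GPU_HOUR
    return instances, hourly


instances, hourly = t4_fleet(768)
print(instances, f"{hourly:.2f}")  # 12 16.80

# Committed rate quoted in the article, over 35 hours x 48 weeks.
print(f"{7.68 * 35 * 48:,.0f}")  # 12,902
```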

A cheaper approach

IT leaders can reduce costs further by avoiding expensive GPUs altogether and relying entirely on general-purpose central processing units (CPUs). This setup is really only suitable when DeepSeek-R1 is used purely for AI inference.

A recent tweet from Matthew Carrigan, machine learning engineer at Hugging Face, suggests such a system could be built using two AMD Epyc server processors and 768 Gbytes of fast memory. The system he presented in a series of tweets could be put together for about $6,000.

Responding to comments on the setup, Carrigan said he is able to achieve a processing rate of six to eight tokens per second, depending on the specific processor and memory speed that is installed. It also depends on the length of the natural language query, but his tweet includes a video showing near-real-time querying of DeepSeek-R1 on the hardware he built based on the dual AMD Epyc setup and 768Gbytes of memory.
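To put six to eight tokens per second in context, the time to generate an answer can be estimated by dividing the answer length by the generation rate. The sketch below assumes a hypothetical 500-token answer and ignores the time spent processing the prompt, so it is a rough lower bound rather than a benchmark:

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to emit an answer at a steady generation rate."""
    return output_tokens / tokens_per_second


# 500 tokens is an illustrative answer length, not a figure from the article.
for rate in (6, 8):
    print(rate, round(generation_seconds(500, rate), 1))
# 6 83.3
# 8 62.5
```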

Carrigan acknowledges that GPUs will win on speed, but they are expensive. In his series of tweets, he points out that the amount of memory installed has a direct impact on performance. This is due to the way DeepSeek “remembers” previous queries to get to answers quicker. The technique is called Key-Value (KV) caching.

“In testing with longer contexts, the KV cache is actually bigger than I realised,” he said, and suggested that the hardware configuration would require 1TByte of memory instead of 768Gbytes when huge volumes of text or context are pasted into the DeepSeek-R1 query prompt.
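Why the cache grows with context length can be illustrated with the textbook formula for a standard attention cache, which stores one key and one value vector per layer for every token. The dimensions below are hypothetical placeholders, not DeepSeek-R1's real configuration, and R1's own attention design may store less per token, so treat this only as a sketch of the scaling behaviour:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Size of a standard KV cache: keys plus values for every layer and token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


# Hypothetical model dimensions, chosen only for illustration.
size = kv_cache_bytes(layers=60, kv_heads=8, head_dim=128, context_tokens=32_768)
print(size / 2**30)  # 7.5 (GiB)
```

Doubling the context doubles the cache, which is consistent with Carrigan's observation that long prompts push memory needs beyond what the model weights alone require.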

Buying a prebuilt Dell, HPE or Lenovo server to do something similar is likely to be considerably more expensive, depending on the processor and memory configurations specified.

A different way to address memory costs

Among the approaches that can be taken to reduce memory costs is using multiple tiers of memory controlled by a custom chip. This is what California startup SambaNova has done using its SN40L Reconfigurable Dataflow Unit (RDU) and a proprietary dataflow architecture for three-tier memory.

“DeepSeek-R1 is one of the most advanced frontier AI models available, but its full potential has been limited by the inefficiency of GPUs,” said Rodrigo Liang, CEO of SambaNova.

The company, which was founded in 2017 by a group of ex-Sun/Oracle engineers and has an ongoing collaboration with Stanford University’s electrical engineering department, claims the RDU chip collapses the hardware requirements to run DeepSeek-R1 efficiently from 40 racks down to one rack configured with 16 RDUs.

Earlier this month at the Leap 2025 conference in Riyadh, SambaNova signed a deal to introduce Saudi Arabia’s first sovereign LLM-as-a-service cloud platform. Saud AlSheraihi, vice-president of digital solutions at Saudi Telecom Company, said: “This collaboration with SambaNova marks a significant milestone in our journey to empower Saudi enterprises with sovereign AI capabilities. By offering a secure and scalable inferencing-as-a-service platform, we are enabling organisations to unlock the full potential of their data while maintaining complete control.”

This deal with the Saudi Arabian telco provider illustrates how governments need to consider all options when building out sovereign AI capacity. DeepSeek demonstrated that there are alternative approaches that can be just as effective as the tried and tested method of deploying immense and costly arrays of GPUs.

And while it does indeed run better when GPU-accelerated AI hardware is present, SambaNova claims there is an alternative way to achieve the same performance when running models like DeepSeek-R1 on-premise and in-memory, without the cost of acquiring GPUs fitted with the memory the model needs.

Source

AMD CES 2025 Keynote live blog: as it happened

2025-01-06T18:40:08.666Z

Good morning folks. We’re queueing up outside the South Seas Ballroom at Mandalay Bay, awaiting the start of AMD’s CES 2025 keynote, and it’s sure to be a packed 45 minutes to an hour. I’ll be here bringing you all the latest news as it breaks, as well as my thoughts on what’s being announced.

I’ll keep you updated once I’m in my seat, so stay tuned!

2025-01-06T18:58:45.541Z

The stage at AMD's CES 2025 press conference

(Image credit: Future / John Loeffler)

We’re five minutes away from the start of AMD’s press conference, so it’s time to settle in.

2025-01-06T19:03:53.280Z

AMD Senior VP Jack Huynh is taking the stage now. No Lisa Su this time.

2025-01-06T19:06:21.698Z

The AMD Ryzen 9 9950X3D and 9900X3D are up first.

2025-01-06T19:09:16.673Z

Slides from the AMD CES 2025 keynote

(Image credit: Future / John Loeffler)

Not to brag or anything…

2025-01-06T19:11:58.810Z

An AMD executive presenting at CES 2025

(Image credit: Future / John Loeffler)

Ryzen 9 9950X3D and 9900X3D coming in March 2025.

2025-01-06T19:13:32.508Z

AMD Ryzen 9 9955HX3D coming to laptops, along with a pair of non-X3D HX chips (I missed the model names of the other two, I’ll grab those in a sec).

2025-01-06T19:15:08.368Z

An AMD executive presenting at CES 2025

(Image credit: Future / John Loeffler)

AMD’s SVP of Client Business Rahul Tikoo is on stage now to talk about AI PCs.

New Ryzen AI 300 chips, targeting the midrange user with Ryzen AI 7 350 and Ryzen AI 5 340.

2025-01-06T19:25:28.725Z

Slides showing the new AMD Ryzen AI Max SKUs and Ryzen AI Max benchmarks at CES 2025

(Image credit: Future / John Loeffler)

Now we’re moving on to the new Ryzen AI Max series, which are workstation CPUs with up to 40 RDNA 3.5 compute units, which is a hell of a lot for an integrated GPU. Up to 50 TOPS XDNA 2 NPU, and up to 256GB/s memory bandwidth.

2025-01-06T19:27:19.802Z

OK, so we’re on to enterprise products, namely AMD Epyc and AMD Instinct data center CPUs and GPUs.

2025-01-06T19:28:02.748Z

We’ve also got some discussion of AMD Ryzen AI 300 Pro.

2025-01-06T19:30:49.763Z

I have no idea what TCO means, but Shell says AMD Ryzen CPUs offer the best, so there’s that.

2025-01-06T19:32:27.820Z

Now PC manufacturer executives are singing AMD’s praises, including HP, Lenovo, and Asus.

2025-01-06T19:38:46.234Z

An AMD and Dell Executive talking about the new Dell Pro portfolio at CES 2025

(Image credit: Future / John Loeffler)

So Dell is now on stage with AMD talking about the first Dell professional PCs and laptops to feature AMD chips. Oh, and Dell is completely rebranding its entire product portfolio, but that’s for another news story.

2025-01-06T19:41:09.567Z

Everyone keeps talking about the ‘AI revolution’, but honestly, I’ve yet to see anything from AI PCs so far that is truly revolutionary. I’m sure it’s coming at some point in the future, but the future isn’t here just yet.

2025-01-06T19:47:13.070Z

OK, so the press conference has wrapped, and there was no discussion of AMD Radeon graphics cards, as we were expecting, but we know they’re coming so there might be more to come on that over the next few days.

For now, though, the big news is the new Ryzen 9 9950X3D and Ryzen 9 9900X3D chips due out in March, as well as new high-performance mobile chips for enthusiasts, gamers, and enterprise users.

There’ll be more from me today, but for now, we have to clear out of the ballroom, so stay tuned for more from us here at CES 2025.

Source

Intel and AMD may have another desktop competitor

  • A new suggestion from a reliable leaker hints at Qualcomm’s new CPU heading to desktop PCs
  • The 2nd-gen Snapdragon X Elite processor codenamed ‘Project Glymur’ was tested with liquid cooling AIO
  • The chip will likely be unveiled at CES 2025 in a few weeks

Both Intel and AMD have dominated the desktop PC scene when it comes to providing powerful processors for productivity and gaming – and now, Qualcomm could be joining the party, with 2nd-gen Snapdragon X Elite processors potentially making their way to desktop PCs.

As highlighted by Notebookcheck, reliable leaker Roland Quandt has hinted at Qualcomm’s new processor coming to desktop PCs, as the brand is reportedly testing the SC8480XP (codenamed Project Glymur) with a 120mm liquid cooling AIO. This assumption comes from the fact that AIOs such as this are used in gaming desktop configurations, unlike the cooling mechanisms that would be required in lightweight laptops.

With CES 2025 now only weeks away, we could soon see what Qualcomm has to offer and whether Quandt’s prediction is accurate. The 2nd-gen Snapdragon X Elite processors may take advantage of Oryon V3 cores according to Quandt (based on Qualcomm CEO Cristiano Amon’s ‘next-gen’ CPU statements), so there could be a lot to get excited about here.

Qualcomm Snapdragon X Elite

(Image credit: Qualcomm)

Could 2025’s CES event be one of the best in years?

Considering AMD and Nvidia’s presence at CES 2025 and their inevitable reveals of the Radeon RX 8000 series and RTX 5000 series GPUs, Qualcomm’s inclusion could easily make this one of the more interesting CES events in years.

While a potential new Snapdragon X Elite processor for desktop PCs could be beneficial for gamers with tight budgets (especially as a second-gen version of the existing X Elite), it’s still a little too early to suggest this. On laptops such as the Lenovo Yoga Slim 7x, gaming is possible but certainly not comparable to gaming laptops or handheld gaming PCs, and Qualcomm itself has stated that the X Elite chips are not targeted at serious gamers.

Nonetheless, the Yoga Slim 7x and fellow X Elite laptops come without discrete GPUs. For a desktop gaming PC that has a discrete GPU, a new Snapdragon chip could be promising, depending on the improvements made with the new processors, potentially adding to the list of surprises I hope to see at CES 2025. Mind you, I don’t want to have to buy a new motherboard…

Source