Posted on

DeepSeek-R1: Budgeting challenges for on-premise deployments

Until now, IT leaders have needed to consider the cyber security risks posed by allowing users to access large language models (LLMs) like ChatGPT directly via the cloud. The alternative has been to use open source LLMs that can be hosted on-premise or accessed via a private cloud. 

The artificial intelligence (AI) model needs to run in-memory and, when using graphics processing units (GPUs) for AI acceleration, this means IT leaders need to consider the costs associated with purchasing banks of GPUs to build up enough memory to hold the entire model.

Nvidia’s high-end AI acceleration GPU, the H100, is configured with 80Gbytes of random-access memory (RAM), and its specification shows it is rated at 350W in terms of power consumption.

China’s DeepSeek has been able to demonstrate that its R1 LLM can rival US artificial intelligence without the need to resort to the latest GPU hardware. It does, however, benefit from GPU-based AI acceleration.

Nevertheless, deploying a private version of DeepSeek still requires significant hardware investment. Running the entire DeepSeek-R1 model, which has 671 billion parameters, in-memory requires 768Gbytes of memory. With Nvidia H100 GPUs, which are configured with 80Gbytes of video memory each, 10 would be required to ensure the entire DeepSeek-R1 model can run in-memory.
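The GPU count follows from dividing the model's memory footprint by the memory on each card and rounding up. The short Python sketch below reproduces that arithmetic using only the figures quoted above; the function name `gpus_needed` is illustrative, and the calculation ignores any additional working memory a real deployment would need.

```python
import math


def gpus_needed(model_memory_gb: float, memory_per_gpu_gb: float) -> int:
    """Smallest whole number of GPUs whose combined memory holds the model."""
    return math.ceil(model_memory_gb / memory_per_gpu_gb)


# Figures quoted in the article.
MODEL_MEMORY_GB = 768  # memory needed to hold the full DeepSeek-R1 model
H100_MEMORY_GB = 80    # memory on a single Nvidia H100

h100_count = gpus_needed(MODEL_MEMORY_GB, H100_MEMORY_GB)
print(f"H100s needed: {h100_count}")                            # H100s needed: 10
print(f"Combined memory: {h100_count * H100_MEMORY_GB}Gbytes")  # Combined memory: 800Gbytes
```

Nine cards would supply only 720Gbytes, which is why the count rounds up to 10.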

IT leaders may well be able to negotiate volume discounts, but the cost of just the AI acceleration hardware to run DeepSeek is around $250,000.

Less powerful GPUs can be used, which may help to reduce this figure. But given current GPU prices, a server capable of running the complete 671 billion-parameter DeepSeek-R1 model in-memory is going to cost over $100,000.

The server could be run on public cloud infrastructure. Azure, for instance, offers access to the Nvidia H100 with 900 GBytes of memory for $27.167 per hour, which, on paper, should easily be able to run the 671 billion-parameter DeepSeek-R1 model entirely in-memory.

If this model is used every working day, and assuming a 35-hour week and four weeks a year of holidays and downtime, the annual Azure bill would be almost $46,000. This figure could be reduced significantly, to $16.63 per hour ($23,000 per year), with a three-year commitment.
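As a rough check on the annual figures, the sketch below multiplies each hourly rate by the working hours implied by a 35-hour week and 48 working weeks. It is a minimal illustration under exactly those assumptions; the function name `annual_cost` is hypothetical and nothing here comes from Azure's own pricing tools.

```python
def annual_cost(hourly_rate: float, hours_per_week: float = 35, weeks_off: int = 4) -> float:
    """Yearly bill for a machine that is only paid for during working hours."""
    working_weeks = 52 - weeks_off
    return hourly_rate * hours_per_week * working_weeks


PAY_AS_YOU_GO_RATE = 27.167  # dollars per hour, as quoted in the article
THREE_YEAR_RATE = 16.63      # dollars per hour with a three-year commitment

print(f"Pay as you go: ${annual_cost(PAY_AS_YOU_GO_RATE):,.0f}")  # Pay as you go: $45,641
print(f"Three-year:    ${annual_cost(THREE_YEAR_RATE):,.0f}")     # Three-year:    $27,938
```

The pay-as-you-go result matches the "almost $46,000" quoted. Under the same 1,680-hour assumption, the three-year rate comes to roughly $27,900 rather than $23,000, so the lower quoted figure presumably rests on usage assumptions that are not spelled out.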

Less powerful GPUs will clearly cost less, but it’s the memory costs that make these prohibitive. For instance, looking at current Google Cloud pricing, the Nvidia T4 GPU is priced at $0.35 per GPU per hour, and is available with up to four GPUs, giving a total of 64Gbytes of memory for $1.40 per hour. Twelve of these four-GPU configurations would be needed to fit the DeepSeek-R1 671 billion-parameter model entirely in-memory, which works out at $16.80 per hour. With a three-year commitment, this figure comes down to $7.68 per hour, which works out at just under $13,000 per year.
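The T4 numbers can be reproduced in the same way: work out how many 64Gbyte, four-GPU configurations are needed to reach 768Gbytes, then price them by the hour. The sketch below uses only the figures quoted above and again assumes 48 working weeks of 35 hours; the variable names are illustrative.

```python
import math

# Figures quoted in the article.
MODEL_MEMORY_GB = 768
T4_PRICE_PER_GPU_HOUR = 0.35    # dollars, on-demand
GPUS_PER_CONFIG = 4
MEMORY_PER_CONFIG_GB = 64       # four T4s combined
COMMITTED_HOURLY_TOTAL = 7.68   # dollars per hour with a three-year commitment

# Assumption carried over from the Azure example: 35 hours a week, 48 weeks a year.
ANNUAL_HOURS = 35 * (52 - 4)

configs = math.ceil(MODEL_MEMORY_GB / MEMORY_PER_CONFIG_GB)
hourly_on_demand = configs * GPUS_PER_CONFIG * T4_PRICE_PER_GPU_HOUR
annual_committed = COMMITTED_HOURLY_TOTAL * ANNUAL_HOURS

print(f"Four-GPU configurations needed: {configs}")           # 12
print(f"On-demand cost per hour: ${hourly_on_demand:.2f}")    # $16.80
print(f"Committed cost per year: ${annual_committed:,.0f}")   # $12,902
```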

A cheaper approach

IT leaders can reduce costs further by avoiding expensive GPUs altogether and relying entirely on general-purpose central processing units (CPUs). This setup is really only suitable when DeepSeek-R1 is used purely for AI inference.

A recent tweet from Matthew Carrigan, machine learning engineer at Hugging Face, suggests such a system could be built using two AMD Epyc server processors and 768 Gbytes of fast memory. The system he presented in a series of tweets could be put together for about $6,000.

Responding to comments on the setup, Carrigan said he is able to achieve a processing rate of six to eight tokens per second, depending on the specific processor and memory speed that is installed. It also depends on the length of the natural language query, but his tweet includes a video showing near-real-time querying of DeepSeek-R1 on the hardware he built based on the dual AMD Epyc setup and 768Gbytes of memory.

Carrigan acknowledges that GPUs will win on speed, but they are expensive. In his series of tweets, he points out that the amount of memory installed has a direct impact on performance. This is due to the way DeepSeek “remembers” the context it has already processed to get to answers quicker. The technique is called key-value (KV) caching.

“In testing with longer contexts, the KV cache is actually bigger than I realised,” he said, and suggested that the hardware configuration would require 1Tbyte of memory instead of 768Gbytes when huge volumes of text or context are pasted into the DeepSeek-R1 query prompt.
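To see why longer contexts need more memory, it helps to look at how a KV cache grows. In a conventional transformer, the cache holds one key vector and one value vector per attention head, per layer, for every token in the context, so its size rises in step with context length. The sketch below applies that textbook formula to purely hypothetical model dimensions; they are not DeepSeek-R1's actual configuration, which may store its cache differently, and the results are not Carrigan's measurements.

```python
def kv_cache_bytes(
    context_tokens: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # 16-bit values
    batch_size: int = 1,
) -> int:
    """Size of a conventional KV cache: keys plus values for every cached token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * batch_size


# Hypothetical dimensions chosen only to illustrate the scaling.
HYPOTHETICAL_MODEL = dict(n_layers=32, n_kv_heads=32, head_dim=128)

for tokens in (2048, 8192, 32768):
    gib = kv_cache_bytes(tokens, **HYPOTHETICAL_MODEL) / 2**30
    print(f"{tokens:>6} tokens -> {gib:.1f} GiB")
# Prints 1.0 GiB, 4.0 GiB and 16.0 GiB respectively.
```

Whatever the real dimensions, the relationship is linear: quadrupling the amount of pasted-in text quadruples the cache, on top of the memory already occupied by the model weights.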

Buying a prebuilt Dell, HPE or Lenovo server to do something similar is likely to be considerably more expensive, depending on the processor and memory configurations specified.

A different way to address memory costs

Among the approaches that can be taken to reduce memory costs is using multiple tiers of memory controlled by a custom chip. This is what California startup SambaNova has done using its SN40L Reconfigurable Dataflow Unit (RDU) and a proprietary dataflow architecture for three-tier memory.

“DeepSeek-R1 is one of the most advanced frontier AI models available, but its full potential has been limited by the inefficiency of GPUs,” said Rodrigo Liang, CEO of SambaNova.

The company, which was founded in 2017 by a group of ex-Sun/Oracle engineers and has an ongoing collaboration with Stanford University’s electrical engineering department, claims the RDU chip collapses the hardware requirements to run DeepSeek-R1 efficiently from 40 racks down to one rack configured with 16 RDUs.

Earlier this month at the Leap 2025 conference in Riyadh, SambaNova signed a deal to introduce Saudi Arabia’s first sovereign LLM-as-a-service cloud platform. Saud AlSheraihi, vice-president of digital solutions at Saudi Telecom Company, said: “This collaboration with SambaNova marks a significant milestone in our journey to empower Saudi enterprises with sovereign AI capabilities. By offering a secure and scalable inferencing-as-a-service platform, we are enabling organisations to unlock the full potential of their data while maintaining complete control.”

This deal with the Saudi Arabian telco provider illustrates how governments need to consider all options when building out sovereign AI capacity. DeepSeek demonstrated that there are alternative approaches that can be just as effective as the tried and tested method of deploying immense and costly arrays of GPUs.

And while DeepSeek-R1 does indeed run better when GPU-accelerated AI hardware is present, SambaNova claims there is an alternative way to achieve the same performance when running models like DeepSeek-R1 on-premise and in-memory, without the cost of acquiring GPUs fitted with the memory the model needs.

Source

Posted on

Has Pure got the first of its ‘HDD is doomed’ ducks in a row?

Pure Storage thinks things are slotting into place for its predicted imminent demise of enterprise spinning disk.

In December 2024, it announced an unnamed hyperscaler had inked an agreement to take Pure’s DirectFlash Modules (DFMs) as components for storage infrastructure.

Meanwhile, Pure Storage now counts Nand flash makers Micron and Kioxia as supply chain partners.

The Micron partnership was announced earlier this month, with Pure making plans to take quantities of Micron’s gen 9 QLC Nand memory.

Last month, Pure and Kioxia announced the latter would supply QLC flash for the DFMs that Pure sells to hyperscaler customers.

Here, Pure Storage is setting itself up as a provider of hyperscaler systems or components in a ground-breaking move for an enterprise storage array maker.

The wider significance is that because hyperscalers are such huge buyers of hard drives, a switch to all-flash would make a big dent in spinning disk manufacturing volumes, and that could spell the hard disk drive’s (HDD’s) death knell. 

Selling to hyperscalers: The nails in HDD’s coffin?

In June 2024, Pure announced it had been working to adapt its DFM technology to the needs of hyperscaler environments. DFMs are not ordinary SSDs, like those sold by the big drive makers. Because Pure controls DFM design and manufacture, and because it also designs and builds controller systems, data management functionality can be distributed across drive and array systems.

According to Pure, that brings efficiencies in use of cache and data placement that in part can make for better longevity in QLC-based flash.

It also means less energy used, more rapid input/output (I/O) and savings on space that allow for more Nand to be installed. That amounts to a claimed capacity multiplier of around 2.5x compared with what’s possible from commodity SSD-equipped arrays. For hyperscalers that buy massive quantities of drive capacity, these advantages are significant.

Pure Storage said one hyperscaler has sung the praises of its DFMs after deploying a proof-of-concept.

For Pure Storage, the challenge will be scale in the supply chain. Amazon Web Services (AWS), Azure, GCP and Meta buy about 43% of global server production. And they only buy white box hardware that they customise themselves. That market is one hitherto effectively barred to enterprise storage makers because their products are not specialised to it.

So, according to its strategy, Pure Storage will sell its DFMs as components that will work with the hyperscalers’ own storage. Officially, it’s not known which hyperscaler Pure has struck a deal with, but it is known that GCP and Meta, at least, have driven the adoption of the software data placement technique, flexible data placement.

SSDs with 10x more capacity than HDD

Until now, hyperscalers have preferred to use spinning disk HDDs to drive their storage services largely because they have been cheaper. But they are also slower. And, with the advent of artificial intelligence (AI), the need for more rapid access to colder data has arisen – such as in backups and data lakes – and so the big hosting companies have started to look at SSD.

However, so far, SSD has lacked the capacity to be profitably deployed. Now, the latest generations of QLC flash from Micron and Kioxia allow Pure to make DFMs that provide 150TB, which will soon reach 300TB, the equivalent of 10 HDDs.

Kioxia’s latest generation of Nand flash, unveiled late last year, uses charge trap (CT) cells to create smaller SSDs with higher density that use less energy. Meanwhile, Kioxia also released test results for writes with flexible data placement (using NoSQL database RocksDB) that showed read speeds 1.8x faster and Nand cell lifespan increased by 3x.

Micron is already a supplier to Pure Storage of Nand in its DFMs. It hasn’t shared much detail about its next generation of SSD, but what is known is that its Nand circuits will give 19% more capacity than the current one.

In December 2024, Pure Storage announced quarterly revenue of $831m, 9% up year-on-year. That puts it behind Dell, which generated revenue of $4bn in the past quarter (up 4% year-on-year); also behind NetApp, which took $1.66bn in the same period (up 6% year-on-year), and almost certainly behind HPE, which doesn’t disclose the share taken by storage in its quarterly revenue of $8.5bn.

Is it the beginning of the end for HDD?

Will Pure’s partnership to supply its high-capacity flash modules to a hyperscaler customer be the first set of nails in the coffin of spinning disk hard drives?

Pure Storage chief technology officer Rob Lee said last week at a press event in Prague that the company’s first hyperscaler design win will be “transformative”, and that a switch to flash by the hyperscalers could lead to collapse in the HDD market.

The deal he’s talking about was announced in December, and will see Pure supply its DFM SSD modules – which will offer up to 300TB capacity by 2026 – to an unnamed hyperscaler.

“We won’t be supplying arrays,” said Lee. “They want the benefits of direct flash but don’t need the other data services. We’re co-engineering with the hyperscaler to integrate with their custom system.

“They were all ready to build something like DFM, but then thought, ‘Why build it ourselves? Let’s just integrate [Pure’s flash modules]’.”

He said the move on the part of the hyperscalers is driven by data growth and the needs of AI, in particular the requirement to access large and relatively dormant stores of data.

Lee added that there is something like 100,000 exabytes of HDD produced quarterly, with hyperscalers taking “60% or 70%”. That, in turn, would take such a chunk out of the volume of HDD manufacturing as to make it much less viable.

Source

Posted on

AMD CES 2025 Keynote live blog: as it happened

2025-01-06T18:40:08.666Z

Good morning folks. We’re queueing up outside the South Seas Ballroom at Mandalay Bay, awaiting the start of AMD’s CES 2025 keynote, and it’s sure to be a packed 45 minutes to an hour. I’ll be here bringing you all the latest news as it breaks, as well as my thoughts on what’s being announced.

I’ll keep you updated once I’m in my seat, so stay tuned!

2025-01-06T18:58:45.541Z

The stage at AMD's CES 2025 press conference

(Image credit: Future / John Loeffler)

We’re five minutes away from the start of AMD’s press conference, so it’s time to settle in.

2025-01-06T19:03:53.280Z

AMD Senior VP Jack Huynh is taking the stage now. No Lisa Su this time.

2025-01-06T19:06:21.698Z

The AMD Ryzen 9 9950X3D and 9900X3D are up first.

2025-01-06T19:09:16.673Z

Slides from the AMD CES 2025 keynote

(Image credit: Future / John Loeffler)

Not to brag or anything…

2025-01-06T19:11:58.810Z

An AMD executive presenting at CES 2025

(Image credit: Future / John Loeffler)

Ryzen 9 9950X3D and 9900X3D coming in March 2025.

2025-01-06T19:13:32.508Z

AMD Ryzen 9 9955HX3D coming to laptops, along with a pair of non-X3D HX chips (I missed the model names of the other two, I’ll grab those in a sec).

2025-01-06T19:15:08.368Z

An AMD executive presenting at CES 2025

(Image credit: Future / John Loeffler)

AMD’s SVP of Client Business Rahul Tikoo is on stage now to talk about AI PCs.

New Ryzen AI 300 chips, targeting the midrange user with Ryzen AI 7 350 and Ryzen AI 5 340.

2025-01-06T19:25:28.725Z

A slide showing the new AMD Ryzen AI Max SKUs, and slides showing Ryzen AI Max benchmarks at CES 2025

(Image credit: Future / John Loeffler)

Now we’re moving on to the new Ryzen AI Max series, which are workstation CPUs with up to 40 RDNA 3.5 compute units, which is a hell of a lot for an integrated GPU. Up to 50 TOPS XDNA 2 NPU, and up to 256GB/s memory bandwidth.

2025-01-06T19:27:19.802Z

OK, so we’re on to enterprise products, namely AMD Epyc and AMD Instinct data center CPUs and GPUs.

2025-01-06T19:28:02.748Z

We’ve also got some discussion of AMD Ryzen AI 300 Pro.

2025-01-06T19:30:49.763Z

I have no idea what TCO means, but Shell says AMD Ryzen CPUs offer the best, so there’s that.

2025-01-06T19:32:27.820Z

Now PC manufacturer executives are singing AMD’s praises, including HP, Lenovo, and Asus.

2025-01-06T19:38:46.234Z

An AMD and Dell Executive talking about the new Dell Pro portfolio at CES 2025

(Image credit: Future / John Loeffler)

So Dell is now on stage with AMD talking about the first Dell professional PCs and laptops to feature AMD chips. Oh, and Dell is completely rebranding its entire product portfolio, but that’s for another news story.

2025-01-06T19:41:09.567Z

Everyone keeps talking about the ‘AI revolution’, but honestly, I’ve yet to see anything from AI PCs so far that is truly revolutionary. I’m sure it’s coming at some point in the future, but the future isn’t here just yet.

2025-01-06T19:47:13.070Z

OK, so the press conference has wrapped, and there was no discussion of AMD Radeon graphics cards, as we were expecting, but we know they’re coming so there might be more to come on that over the next few days.

For now, though, the big news is the new Ryzen 9 9950X3D and Ryzen 9 9900X3D chips due out in March, as well as new high-performance mobile chips for enthusiasts, gamers, and enterprise users.

There’ll be more from me today, but for now, we have to clear out of the ballroom, so stay tuned for more from us here at CES 2025.

Source

Posted on

GenAI demand fuels record sales of datacentre hardware and software in 2024

Demand for generative artificial intelligence (AI) services is being cited as the reason why spending on datacentre hardware and software hit a record high in 2024.

According to figures from IT analyst Synergy Research Group, total spending in the datacentre hardware and software market was up 34% year-on-year during 2024, as a result of hyperscale providers and private enterprises looking to kit out AI-ready server farms.

John Dinsdale, chief analyst at Synergy Research Group, said this trend had led to more investment in graphics processing units (GPUs), which had in turn “lit a fire under a market” that was already “chugging along nicely”.

As a result, the datacentre hardware and software market enjoyed record growth rates in 2024, with total sales in excess of $280bn, which he described as unprecedented.

“While the ongoing success of public cloud has been the main driving force behind datacentre investments for well over a decade now, no one imagined a 2024 market for datacentre gear reaching over $280bn,” said Dinsdale.

These figures are based on actual sales data from the first three quarters of 2024, combined with Synergy’s own fourth quarter forecast data for the datacentre hardware and software market.

The Synergy data shows that sales of datacentre kit to public cloud providers were up 50% in 2024, while the amount of spend attributed to enterprises was also up 21% year-on-year. “In recent years, growth in the enterprise sector has been rather anaemic, [and] for over 10 years now, cloud providers have increasingly driven the market for datacentre gear – and Synergy’s five-year forecast shows there will be no letup in this trend,” said Dinsdale.

Public cloud providers now account for more than half of the spend (55%) in the datacentre hardware and software market, Dinsdale continued, up from 20% 10 years ago. “Our forecast shows it reaching almost 65% five years from now,” he added.

Around 85% of the spend in this market is generated by the sale of servers, storage and networking kit, confirmed Synergy, while the remaining 15% comes from sales of cloud management, security and virtualisation software.

One notable trend, called out by Synergy, is how prominently Nvidia now features among the roll-call of datacentre hardware providers, thanks in no small part to the fact its GPU technology is being sold directly to both hyperscalers and enterprises.

“Excluding original design manufacturers, Dell is the overall leader in the server and storage segment, with Inspur being a clear leader in server sales to public cloud providers,” said Synergy, in its research note.

“Cisco is the leader in the networking segment, while Microsoft features prominently in the rankings due to its position in server operating systems and virtualisation applications. Nvidia now features heavily as a supplier both to other system vendors and directly to service providers.”

Source

Posted on

CMA gives Vodafone-Three merger green light

The UK’s Competition and Markets Authority (CMA) has cleared the Vodafone-Three merger, subject to legally binding commitments. It’s expected to formally complete in the first half of 2025.

The CMA had previously warned that the proposed merger of Vodafone and Three would likely lead to higher prices and reduced service. The deal is subject to Vodafone-Three delivering a joint network plan, which sets out the network upgrade, integration and improvements the two companies will make to their combined network across the UK over the next eight years.

Vodafone and Three will also need to cap selected mobile tariffs and data plans for three years, which the CMA said would directly protect large numbers of Vodafone-Three customers from short-term price rises in the early years of the network plan. The merged company will also be required to offer pre-set prices and contract terms for wholesale services for three years, to ensure that virtual network providers can obtain competitive terms and conditions as the network plan is rolled out.

The merger of Vodafone and Three is regarded as Vodafone’s response to BT’s 2016 purchase of EE, and the 2021 merger of Virgin Media and O2 to form VMO2.

Margherita Della Valle, Vodafone Group’s CEO, described the combination as being “great for customers, great for competition and great for the country”.

The two companies have committed to investing £11bn to create what they claim is one of Europe’s most advanced 5G networks. The aim is to reach 99% of the population and benefit over 50 million customers. The investment in mobile networking promises better quality, greater reliability and enhanced capacity for handling ever-increasing data demand, according to Vodafone and Three, who see demand for mobile data services increasing with more widespread adoption of new technology, such as artificial intelligence (AI).

“The CMA’s decision is not a surprise – it has signalled for some time that it was receptive to approving the merger subject to appropriate concessions from the parties,” said Alex Haffner, a competition partner at Fladgate. “Nevertheless, it is noteworthy in that it has permitted a ‘4-3’ merger in the mobile sector on the basis of purely behavioural remedies – over the past decade, a multitude of ‘4-3’ mobile network mergers across Europe have been permitted only on the basis of significant structural remedies being conceded by the merging parties. In doing so, the CMA has displayed a degree of pragmatism, sensing that consumers will ultimately benefit more from competition between three well-resourced mobile operators in the UK market.”

Kester Mann, director of consumer and connectivity at CCS Insight, described the deal as “one of the most significant moments in the history of UK mobile”, heralding the arrival of a new market leader with a combined 29 million customers.

“The CMA’s decision to approve the merger is the right one, and largely strikes a good balance between nurturing competition and encouraging investment,” he said. “It should pave the way for more efficient investments to bring about much-needed improvements to mobile services in the UK.”

However, as Matthew Howett, founder and CEO at Assembly Research, noted, there is still a chance Sky may seek to challenge the decision. He nonetheless said a successful appeal against the CMA’s decision would be hard-fought and expensive, and would face a high bar. “We expect positive implications overall, not only for investment in, and the quality of, networks (including standalone 5G), but also for the wholesale customers, consumers and businesses that rely on them,” he said.

For Howett, telco regulator Ofcom has a significant new role focused on the oversight of the Vodafone-Three merger. “The regulator seems emboldened to assume these responsibilities,” he said. “Its monitoring will need to be carried out in as agile a way as possible to ensure the merged entity is living up to expectations, and to minimise any risk of circumvention or market distortions that some have warned about.”

Source

Posted on

Data bill aims to boost police and NHS productivity

5 November 2024

In this week’s Computer Weekly, the government’s new data bill promises to improve productivity and efficiency for the NHS and police, but will it ensure privacy as well? We talk to Dell’s global CTO about how the IT giant sees the AI boom playing out. And we examine which industries stand to benefit most from the collaboration opportunities of virtual reality. Read the issue now.

Source