US analyst TD Cowen published a research note on 26 March 2025 that suggested the public cloud giant had cancelled and deferred datacentre lease agreements in the US and Europe that would have increased its compute capacity by at least 2GW.
According to TD Cowen, the rollback stemmed from Microsoft’s decision not to support OpenAI’s incremental training workloads.
TD Cowen had previously said the two companies were involved in a “fraying relationship”, after Microsoft confirmed in January 2025 that the exclusive cloud hosting deal between the two firms had been rejigged.
A Microsoft blog post, dated 21 January 2025, confirmed OpenAI had made a “large Azure commitment” that included “changes to the exclusivity on new capacity, moving to a model where Microsoft has a right of first refusal”.
This means Microsoft gets first refusal on whether or not it wants to host OpenAI workloads, but OpenAI also reserves the right to build its own capacity with other partners if Microsoft cannot meet its needs.
Microsoft has now issued a statement to Computer Weekly, pushing back on TD Cowen’s take on the situation, while also restating the strength of the working relationship between the company and OpenAI.
In reference to its decision to scale back its datacentre expansion plans, Microsoft said it’s “well-positioned” to meet the current and increasing customer demand it’s seeing for its services thanks to the “significant investments” it’s made in its infrastructure to this point.
“Last year alone, we added more capacity than any prior year in history,” said a Microsoft spokesperson. “While we may strategically pace or adjust our infrastructure in some areas, we will continue to grow strongly in all regions.
“This allows us to invest and allocate resources to growth areas for our future. Our plans to spend over $80bn on infrastructure this financial year remain on track as we continue to grow at a record pace to meet customer demand.”
Microsoft has been a partner in OpenAI since 2019, with the two firms previously stating that they were working towards a shared goal to “responsibly advance artificial intelligence research” while democratising the technology and making it accessible to all.
Around the same time that Microsoft released details of its reworked cloud hosting arrangement with OpenAI, the latter released details of its $500bn effort to expand the infrastructure underpinning its services through the launch of the Stargate Project.
SoftBank, Oracle, MGX and OpenAI are the equity funders for the initiative, while Microsoft is listed as a technology partner.
In reference to its ongoing partnership with OpenAI, the Microsoft spokesperson said: “OpenAI continues to be a great partner. We remain committed to pushing the frontier of AI forward, driving innovation, and making cutting-edge models accessible to our customers and partners.”
Until now, IT leaders have needed to consider the cyber security risks posed by allowing users to access large language models (LLMs) like ChatGPT directly via the cloud. The alternative has been to use open source LLMs that can be hosted on-premise or accessed via a private cloud.
The artificial intelligence (AI) model needs to run in-memory and, when using graphics processing units (GPUs) for AI acceleration, this means IT leaders need to consider the costs associated with purchasing banks of GPUs to build up enough memory to hold the entire model.
Nvidia’s high-end AI acceleration GPU, the H100, is configured with 80Gbytes of random-access memory (RAM), and its specification shows it’s rated at 350W in terms of power consumption.
China’s DeepSeek has been able to demonstrate that its R1 LLM can rival US artificial intelligence without the need to resort to the latest GPU hardware. It does, however, benefit from GPU-based AI acceleration.
Nevertheless, deploying a private version of DeepSeek still requires significant hardware investment. Running the entire DeepSeek-R1 model, which has 671 billion parameters, in-memory requires 768Gbytes of memory. With Nvidia H100 GPUs, which are configured with 80Gbytes of video memory each, 10 would be required to ensure the entire DeepSeek-R1 model can run in-memory.
IT leaders may well be able to negotiate volume discounts, but the cost of just the AI acceleration hardware to run DeepSeek is around $250,000.
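The sizing arithmetic behind those figures can be sketched in a few lines of Python. This is a rough illustration only: it uses the memory, power and cost figures quoted above, ignores any extra memory needed for caches or working data, and the per-GPU price it prints is simply the quoted total divided by the GPU count rather than a list price.

```python
import math

MODEL_MEMORY_GB = 768      # memory needed to hold the full model, as quoted above
GPU_MEMORY_GB = 80         # memory per H100, as quoted above
GPU_POWER_W = 350          # rated power per H100, as quoted above
TOTAL_COST_USD = 250_000   # approximate hardware cost, as quoted above


def gpus_needed(model_gb: float, gpu_gb: float) -> int:
    """Smallest whole number of GPUs whose combined memory holds the model."""
    return math.ceil(model_gb / gpu_gb)


gpu_count = gpus_needed(MODEL_MEMORY_GB, GPU_MEMORY_GB)
total_memory_gb = gpu_count * GPU_MEMORY_GB
total_power_w = gpu_count * GPU_POWER_W
# Implied figure only: the quoted total spread evenly across the GPUs.
implied_unit_cost = TOTAL_COST_USD / gpu_count

print(f"GPUs needed: {gpu_count}")
print(f"Combined GPU memory: {total_memory_gb}Gbytes")
print(f"Combined rated power: {total_power_w}W")
print(f"Implied cost per GPU: ${implied_unit_cost:,.0f}")
```

Nine GPUs would provide only 720Gbytes, which is why the count rounds up to 10.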
Less powerful GPUs can be used, which may help to reduce this figure. But given current GPU prices, a server capable of running the complete 671 billion-parameter DeepSeek-R1 model in-memory is going to cost over $100,000.
The server could be run on public cloud infrastructure. Azure, for instance, offers access to the Nvidia H100 with 900 GBytes of memory for $27.167 per hour, which, on paper, should easily be able to run the 671 billion-parameter DeepSeek-R1 model entirely in-memory.
If this model is used every working day, and assuming a 35-hour week and four weeks a year of holidays and downtime, the annual Azure bill would be almost $46,000. Again, this figure could be reduced significantly, to $16.63 per hour ($23,000 per year), if there is a three-year commitment.
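As a rough check on the annual figures, the sketch below multiplies the quoted hourly rates by the hours implied by a 35-hour week over 48 working weeks. The function names are illustrative. Under that assumption the on-demand rate comes to about $45,600 a year, while the committed rate comes to nearer $27,900, so the quoted $23,000 presumably rests on a different usage assumption.

```python
HOURS_PER_WEEK = 35        # working week assumed above
WEEKS_PER_YEAR = 52
WEEKS_OFF = 4              # holidays and downtime assumed above

ON_DEMAND_RATE = 27.167    # $/hour, Azure H100 figure quoted above
COMMITTED_RATE = 16.63     # $/hour with a three-year commitment, as quoted above


def annual_hours(hours_per_week: int, weeks_per_year: int, weeks_off: int) -> int:
    """Hours of use per year for a given working week and time off."""
    return hours_per_week * (weeks_per_year - weeks_off)


def annual_cost(rate_per_hour: float, hours: int) -> float:
    """Yearly bill for a flat hourly rate."""
    return rate_per_hour * hours


hours = annual_hours(HOURS_PER_WEEK, WEEKS_PER_YEAR, WEEKS_OFF)
on_demand_annual = annual_cost(ON_DEMAND_RATE, hours)
committed_annual = annual_cost(COMMITTED_RATE, hours)

print(f"Hours per year: {hours}")
print(f"On-demand: ${on_demand_annual:,.2f}")
print(f"Three-year commitment: ${committed_annual:,.2f}")
```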
Less powerful GPUs will clearly cost less, but it’s the memory costs that make these prohibitive. For instance, looking at current Google Cloud pricing, the Nvidia T4 GPU is priced at $0.35 per GPU per hour, and is available with up to four GPUs, giving a total of 64 Gbytes of memory for $1.40 per hour. Twelve of these four-GPU configurations would be needed to fit the 671 billion-parameter DeepSeek-R1 model entirely in-memory, which works out at $16.80 per hour. With a three-year commitment, this figure comes down to $7.68 per hour, which works out at just under $13,000 per year.
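The T4 sums follow the same pattern and can be sketched as below. The sketch assumes the "12" refers to four-GPU configurations of 64 Gbytes each, which is the reading that matches the $16.80 hourly figure, and it reuses the 35-hour, 48-week year from the Azure example. The names are illustrative.

```python
import math

MODEL_MEMORY_GB = 768          # memory needed for the full model, as quoted above
GPUS_PER_CONFIG = 4            # T4 GPUs per configuration, as quoted above
CONFIG_MEMORY_GB = 64          # combined memory of one four-GPU configuration
RATE_PER_GPU_HOUR = 0.35       # $/GPU/hour on demand, as quoted above
COMMITTED_RATE_PER_HOUR = 7.68 # $/hour for the whole setup with a three-year commitment
HOURS_PER_YEAR = 35 * 48       # assumed usage: 35-hour week, 48 working weeks

# Number of four-GPU configurations needed to hold the model in memory.
configs_needed = math.ceil(MODEL_MEMORY_GB / CONFIG_MEMORY_GB)
total_gpus = configs_needed * GPUS_PER_CONFIG
on_demand_per_hour = total_gpus * RATE_PER_GPU_HOUR
committed_annual = COMMITTED_RATE_PER_HOUR * HOURS_PER_YEAR

print(f"Configurations needed: {configs_needed} ({total_gpus} GPUs)")
print(f"On-demand cost per hour: ${on_demand_per_hour:.2f}")
print(f"Committed cost per year: ${committed_annual:,.2f}")
```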
A cheaper approach
IT leaders can reduce costs further by avoiding expensive GPUs altogether and relying entirely on general-purpose central processing units (CPUs). This setup is really only suitable when DeepSeek-R1 is used purely for AI inference.
A recent tweet from Matthew Carrigan, machine learning engineer at Hugging Face, suggests such a system could be built using two AMD Epyc server processors and 768 Gbytes of fast memory. The system he presented in a series of tweets could be put together for about $6,000.
Responding to comments on the setup, Carrigan said he is able to achieve a processing rate of six to eight tokens per second, depending on the specific processor and memory speed that is installed. It also depends on the length of the natural language query, but his tweet includes a video showing near-real-time querying of DeepSeek-R1 on the hardware he built based on the dual AMD Epyc setup and 768Gbytes of memory.
Carrigan acknowledges that GPUs will win on speed, but they are expensive. In his series of tweets, he points out that the amount of memory installed has a direct impact on performance. This is due to the way DeepSeek “remembers” previous queries to get to answers quicker. The technique is called Key-Value (KV) caching.
“In testing with longer contexts, the KV cache is actually bigger than I realised,” he said, and suggested that the hardware configuration would require 1TByte of memory instead of 768Gbytes when huge volumes of text or context are pasted into the DeepSeek-R1 query prompt.
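To see why longer contexts push up memory requirements, it helps to look at how a KV cache grows in a generic transformer: it holds one key vector and one value vector per attention head, per layer, for every token in the context. The sketch below uses that standard formula with hypothetical model dimensions chosen purely for illustration. They are not DeepSeek-R1's actual architecture, and the figures it prints say nothing about Carrigan's system.

```python
GIB = 1024 ** 3

# Hypothetical dimensions for illustration only, not DeepSeek-R1's.
LAYERS = 32
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2   # 16-bit values


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Cache size for a generic transformer with standard attention.

    The factor of 2 covers one key and one value vector per head, per layer,
    per token, so the total grows linearly with context length.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


for context_tokens in (1024, 8192, 32768):
    size = kv_cache_bytes(context_tokens, LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
    print(f"{context_tokens:>6} tokens: {size / GIB:.3f} GiB")
```

The point of the sketch is the linear growth: with these made-up dimensions, a context 32 times longer needs 32 times the cache memory on top of the memory used by the model itself.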
Buying a prebuilt Dell, HPE or Lenovo server to do something similar is likely to be considerably more expensive, depending on the processor and memory configurations specified.
A different way to address memory costs
Among the approaches that can be taken to reduce memory costs is using multiple tiers of memory controlled by a custom chip. This is what California startup SambaNova has done using its SN40L Reconfigurable Dataflow Unit (RDU) and a proprietary dataflow architecture for three-tier memory.
“DeepSeek-R1 is one of the most advanced frontier AI models available, but its full potential has been limited by the inefficiency of GPUs,” said Rodrigo Liang, CEO of SambaNova.
The company, which was founded in 2017 by a group of ex-Sun/Oracle engineers and has an ongoing collaboration with Stanford University’s electrical engineering department, claims the RDU chip collapses the hardware requirements to run DeepSeek-R1 efficiently from 40 racks down to one rack configured with 16 RDUs.
Earlier this month at the Leap 2025 conference in Riyadh, SambaNova signed a deal to introduce Saudi Arabia’s first sovereign LLM-as-a-service cloud platform. Saud AlSheraihi, vice-president of digital solutions at Saudi Telecom Company, said: “This collaboration with SambaNova marks a significant milestone in our journey to empower Saudi enterprises with sovereign AI capabilities. By offering a secure and scalable inferencing-as-a-service platform, we are enabling organisations to unlock the full potential of their data while maintaining complete control.”
This deal with the Saudi Arabian telco provider illustrates how governments need to consider all options when building out sovereign AI capacity. DeepSeek demonstrated that there are alternative approaches that can be just as effective as the tried and tested method of deploying immense and costly arrays of GPUs.
And while DeepSeek-R1 does indeed run better when GPU-accelerated AI hardware is present, SambaNova claims there is an alternative way to achieve the same performance for running models like DeepSeek-R1 on-premise and in-memory, without the cost of acquiring GPUs fitted with the memory the model needs.
Broadcom’s 2023 acquisition of VMware for US$69bn led to disruptive changes in the virtualisation provider’s pricing.
Key here is a move from perpetual licences to a subscription model. This has left some enterprises facing higher costs, with some considering a move to alternative virtualisation environments.
For those considering that, the challenge is to ensure any migration provides adequate backup and recovery measures for new hypervisors. This is as well as protecting remaining VMware workloads.
VMware: Twist or stick?
The main reason CIOs cite for moving away from VMware is cost, with worries over increasing overheads from the new subscription model prominent. VMware also discontinued its free edition of VMware vSphere ESXi, which was popular with smaller firms.
For enterprises looking to move, VMware alternatives include competing virtualisation technologies, such as Nutanix, Microsoft Hyper-V and Oracle Linux Virtualization. There are also open source options that include Red Hat OpenShift Virtualization, the Linux Kernel-based Virtual Machine (KVM) and Proxmox Virtual Environment.
As yet, there are few signs of a mass exodus, however. One survey, carried out by backup provider Nakivo, suggested a third of its customers planned to move away from VMware to Proxmox. The supplier points to a smaller number of customers moving to Nutanix and Hyper-V.
This suggests a larger percentage of VMware users have decided to stay with the technology and the new commercial terms, some of which – including simpler storage licensing – can favour some workloads.
“Naturally, the first reaction is to say, ‘Right, I’m going to go somewhere else, I’m going to use somebody else’s technology’,” says Patrick Smith, field chief technology officer for EMEA at Pure Storage.
“And some organisations have fairly rapidly moved off VMware onto other platforms, but they are either small or very agile to be able to do that.”
Other enterprises might be biding their time, not least because moving between hypervisor platforms is complex and carries risk. Nor do the alternatives offer all VMware’s features and functionality – or not in one place, at least.
Backup, recovery and VMware alternatives
If moving workloads from one hypervisor to another is difficult, then ensuring those workloads and data are backed up adds another layer of complexity.
Much will depend on how an enterprise currently protects its systems, including VMware, alternative hypervisors it is considering, and the backup and recovery tools it uses.
The good news is the larger backup and disaster recovery suppliers already have support for competing virtualisation platforms. Hyper-V, in particular, is well supported for businesses that also run on Microsoft infrastructure.
At the same time, providers such as Veeam, Rubrik and Nakivo have strengthened support for open source platforms, especially Proxmox.
This raises the prospect of firms being able to continue with their current backup and recovery provider, even if they move to a mixed approach to virtualisation. Alternatively, if their current disaster recovery supplier falls short, there is the chance to move to a toolset that does support a multi-supplier approach.
“For the majority of organisations, it is probable the data protection systems they use will work if they choose to stay with VMware as a major platform or migrate to alternatives,” suggests Tony Lock, principal analyst at Freeform Dynamics. “This is especially likely to be the case if they have a data protection solution that protects a mixed environment.”
Out of the box?
However, even if a data protection or backup and recovery tool supports alternatives to VMware, IT teams should anticipate carrying out configuration and testing before their alternatives go live.
If they do not, there is a risk that by attempting to save money on licensing, they expose the business to risk and additional costs down the line.
VMware’s maturity and market share mean products such as ESXi and vSAN are well-understood and well-supported by independent software suppliers, integrators and in-house teams. Not all hypervisors enjoy that industry support.
One area where this is apparent is where backup and recovery providers offer “agentless” integration directly with hypervisors. This is not – yet – on offer for all the alternatives, and CIOs might need to consider agent-based backup.
“Backup is turning out to be quite a polarising aspect of moving away from VMware,” says Bruce Kornfeld, chief product officer at StorMagic, a supplier of hyper-converged storage.
“The leaders in virtualisation have had the attention of the backup software industry over the last 20-plus years, and tight agentless integration directly with their hypervisors is something that many users have come to expect. However, the backup software industry hasn’t had the research and development capacity to work with every hypervisor on the market – there just hasn’t been the return on investment in the past.”
“VMware customers that have made the decision to move away from VMware need to re-address their backup strategy,” he says. “They need to look at using an agent-based approach. This is the way backup has been done for decades and will work with any hypervisor.” This should not, Kornfeld says, come with extra costs.
Firms also need to consider the time and resources they need to set aside for backup and disaster recovery testing, once they have decided to move workloads away from VMware. This includes testing file and virtual machine-based backup routines.
In fact, changing hypervisors can present a good opportunity to review the strength of disaster recovery and backup arrangements across the business. These might not be as robust as CIOs expect.
“It is fair to say that some organisations are not totally happy with their data protection solutions and processes,” says Tony Lock.
“In such circumstances, it is certainly something they will need to look at, but the issue is do they have the resources and budgets to potentially modify two important systems at once? And even if they do, would they be happy that they can manage the risk of change, since any major platform change carries some element of risk?”
It is here where careful supplier evaluation and selection, and potentially bringing in additional supplier or third-party engineering support, should pay for itself.
The past 12 months saw flash storage nudge into areas from which it had hitherto been absent. In particular, this was because denser – and therefore cheaper per gigabyte (GB) – quad-level cell (QLC) flash storage arrived in array markets and use cases that were once considered nearline.
Alongside this, we saw the price-per-GB of flash drop towards the level of spinning disk hard disk drives (HDDs), then rebound rapidly as memory manufacturers chased profitability. Meanwhile, the keenest of flash storage advocates predicted the demise of the hard drive and the imminent victory of the all-flash datacentre.
In this article, we define enterprise flash storage, look into its QLC and triple-level cell (TLC) variants, the benefits of non-volatile memory express (NVMe) flash, and examine the pros and cons of flash versus HDD in terms of cost, performance, flash in the cloud, and the likelihood (or otherwise) of the all-flash datacentre.
What is enterprise flash storage?
Enterprise flash storage refers to systems that comprise multiple flash drives housed in datacentre rack-mounted array form factor products.
In enterprise flash storage arrays, the capacity of many drives is aggregated, with access to storage media governed by controller hardware.
The controller is the compute that powers the intelligence needed to handle input/output (I/O) from hosts to the storage and to decide how data is allocated to media. In flash arrays, it also carries out maintenance tasks such as wear levelling and garbage collection.
Enterprise flash storage array capacities run from tens of terabytes (TB) to many petabytes (PB). As with HDD-based arrays, access to storage can be block (for performance-hungry database use cases, for example), file (for general use and unstructured data) or object (for unstructured data also).
What is QLC flash storage?
QLC is the latest generation of flash storage media. QLC stands for quad-level cell. That means that every cell in the flash chip can store four bits of data using 16 states.
That means it can store more data in the same space than TLC flash, which is also widely available. Single-level cell (SLC) flash and multi-level cell (MLC, meaning two bits per cell) flash were previously widely available, but these have been largely superseded now.
At the start of 2024, most enterprise storage arrays are built with TLC drives for general-purpose and mission-critical use cases. But QLC has edged into the mainstream and gained traction for unstructured data workloads, in particular with key enterprise storage array makers adding QLC-based products in the past year or so.
As manufacturers increase the number of possible states per cell, storage density increases and the cost of storage per GB decreases. But, as storage density increases in terms of cell capacity, issues can arise that can limit the endurance of flash media.
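The relationship between bits per cell and the number of states a cell must distinguish is a simple power of two, as the short sketch below illustrates. The dictionary and function names are just for illustration.

```python
# Bits stored per cell for each generation of flash media.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def states_per_cell(bits: int) -> int:
    """Each extra bit doubles the number of states a cell has to distinguish."""
    return 2 ** bits


def density_gain(bits_new: int, bits_old: int) -> float:
    """Relative increase in bits stored per cell, as a fraction."""
    return bits_new / bits_old - 1


for name, bits in BITS_PER_CELL.items():
    print(f"{name}: {bits} bit(s) per cell, {states_per_cell(bits)} states")

qlc_vs_tlc = density_gain(BITS_PER_CELL["QLC"], BITS_PER_CELL["TLC"])
print(f"QLC stores {qlc_vs_tlc:.0%} more bits per cell than TLC")
```

Moving from TLC to QLC adds a third more data per cell, but doubles the number of states from eight to 16, which is the trade-off between density and endurance described above.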
NVMe is now at the forefront of flash drive performance. NVMe’s key innovation was to optimise queues and buffers for use with flash, which improved performance many times over.
As a follow-on, suppliers then developed ways of allowing NVMe connectivity across physically more distant connections across the datacentre. Such NVMe-over-fabrics technologies include the ability to carry NVMe via Ethernet, InfiniBand, TCP, RDMA (ie, memory-to-memory connectivity) and more.
What is HDD?
Hard disk drives (HDDs) that rely on magnetic read/write heads and mechanically spinning disks have been around for decades, with flash a competitor that has emerged in the past 10 years or so.
As with flash, HDDs can be aggregated into datacentre rack-mounted array products and the capacity of multiple drives pooled for enterprise users. In fact, HDD-based arrays long preceded enterprise flash arrays and are still widely used.
What’s the difference in performance between flash and HDD?
When we look at flash versus disk, the key thing that stands out is that flash is fast – many times faster than spinning disk HDD.
Flash drives offer lower latency, with access times down to low milliseconds, or even microseconds, compared with the multiple milliseconds of spinning disk, particularly for reads. That means enterprise flash can also offer vastly more input/output operations per second (IOPS) when aggregated into a storage array.
In throughput terms, flash offers gigabit-per-second (Gbps) rates four or five times quicker than HDD.
Such rapidity has been the key draw for enterprise flash storage and is a result of the lack of moving parts. With spinning platters, HDD is limited by physics in ways that solid-state storage is not.
In terms of capacities, HDD is available in up to around 22TB units. Some flash drives have been marketed that run to 60-plus terabytes, but they generally come in smaller sizes, in part because of cost.
What’s the cost difference between flash and HDD?
In terms of per-GB cost at drive level, flash costs more than spinning disk.
Flash prices spiked significantly in late 2023 and the early months of 2024 as manufacturers throttled back production in an effort to raise prices and achieve profitability.
Solid-state drive (SSD) prices per gigabyte reached an average of $0.095/GB by April 2024, which was a rise of 26.67% since autumn 2023.
Flash drive prices then fell back, to an average of $0.085 per gigabyte (GB) in September 2024.
In October 2023, flash had averaged $0.075/GB while HDD averaged $0.05/GB for SAS and $0.035/GB for SATA drives.
Average spinning disk (SAS and SATA) hard drive prices held steady during the six months to September 2024 at $0.039 per gigabyte. That figure was $0.041/GB in early April.
For a customer that planned to deploy 20TB of flash, based on those prices, it would have cost $1,500 in October 2023, $1,900 in April 2024, and $1,700 in September 2024. That compares to the equivalent for spinning disk of $850 in October 2023 and $780 in September 2024.
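Those deployment costs can be reproduced with a short calculation. The sketch below assumes decimal terabytes (1TB = 1,000GB) and takes the October 2023 HDD price as the simple average of the SAS and SATA figures, which is the reading that matches the $850 quoted. The names are illustrative.

```python
GB_PER_TB = 1000   # decimal terabytes, which the quoted totals imply
CAPACITY_TB = 20

# Average price per GB in dollars, as quoted above.
FLASH_PRICE_PER_GB = {"2023-10": 0.075, "2024-04": 0.095, "2024-09": 0.085}
HDD_PRICE_PER_GB = {
    "2023-10": (0.05 + 0.035) / 2,   # assumed: simple mean of SAS and SATA
    "2024-09": 0.039,
}


def deployment_cost(capacity_tb: float, price_per_gb: float) -> float:
    """Drive-level cost of a given capacity at a given price per GB."""
    return capacity_tb * GB_PER_TB * price_per_gb


def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


for month, price in FLASH_PRICE_PER_GB.items():
    print(f"Flash {month}: ${deployment_cost(CAPACITY_TB, price):,.0f}")
for month, price in HDD_PRICE_PER_GB.items():
    print(f"HDD   {month}: ${deployment_cost(CAPACITY_TB, price):,.0f}")

flash_rise = percent_change(FLASH_PRICE_PER_GB["2023-10"], FLASH_PRICE_PER_GB["2024-04"])
print(f"Flash price rise, October 2023 to April 2024: {flash_rise:.2f}%")
```

These are drive-level media prices only, so they leave out controllers, enclosures, power and any data reduction an array might apply.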
Will flash kill HDD? How much longer for HDD?
Pure Storage has declared HDDs will be dead by 2028, with its flash products the chief agent in the cull, owing to its ability to aggregate much more flash capacity on its proprietary modules than is found on commodity flash drives.
With flash module sizes of up to 300TB by 2026 promised by Pure, it contends that spinning disk will be commercially unviable.
Meanwhile, companies such as Panasas, which specialises in storage for unstructured data, point to hyperscaler datacentres’ overwhelming use of spinning disk in ratios up to 90/10 against flash. Panasas argues that there’s still a five-times differential between the lowest-cost flash and HDD, and that for most, something like the hyperscaler solution is optimal.
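The economics of that argument can be illustrated with a simple blended-cost calculation. The sketch below works in relative units, taking HDD as 1 and flash as 5 per unit of capacity to reflect the five-times differential Panasas cites, and treats the 90/10 ratio as a split of capacity. Both are simplifying assumptions for illustration rather than real pricing.

```python
HDD_COST = 1.0     # relative cost per unit of capacity (assumed baseline)
FLASH_COST = 5.0   # five times HDD, per the differential cited above
HDD_SHARE = 0.9    # 90/10 capacity split between HDD and flash


def blended_cost(hdd_share: float, hdd_cost: float, flash_cost: float) -> float:
    """Average cost per unit of capacity for a mix of HDD and flash."""
    return hdd_share * hdd_cost + (1 - hdd_share) * flash_cost


mixed = blended_cost(HDD_SHARE, HDD_COST, FLASH_COST)
all_flash = blended_cost(0.0, HDD_COST, FLASH_COST)

print(f"90/10 mix: {mixed:.2f} relative cost per unit of capacity")
print(f"All-flash: {all_flash:.2f} relative cost per unit of capacity")
print(f"All-flash costs {all_flash / mixed:.2f} times the 90/10 mix")
```

On those assumptions, the hyperscaler-style mix costs 1.4 units per unit of capacity against 5 for all-flash, which is the gap the all-flash case has to close.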
When can you use flash and HDD in the cloud?
Enterprise users can also specify flash storage and spinning disk in the cloud. It is more likely in most cases that cloud storage will be specified by performance and cost criteria, in which case the customer may never know what media underlies it.
But it is possible also to specify flash storage in the cloud and the three largest hyperscalers – Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP) – have solid-state storage options that mix cost, capacity and performance.
The hyperscalers all offer flash storage to support compute with service levels based on capacity and IOPS per volume that range from general-purpose to premium levels aimed at specific workloads (eg, SQL, Oracle, SAP Hana) and environments (eg, Windows, Lustre, macOS).
There are also options aimed at flash for file storage and flash storage from named suppliers, such as Azure’s NetApp Files.
What is the all-flash datacentre?
For about a decade, the idea of the all-flash datacentre has been discussed. The all-flash datacentre replaces HDD and other media such as tape with flash storage.
Driving it is the continued decrease in the cost of flash storage – as with QLC flash – but also the advantages of flash in terms of rapid access. The latter becomes more relevant as customers want to run analytics on bigger subsets of their data.
So, for example, where backups may previously have been held on nearline media such as slower HDDs, advocates of flash for such use cases point to the ability to run artificial intelligence (AI) on large customer datasets and to gain value therefrom.
Also, with backups as an example, the idea of being able to recover quickly from flash media in case of a ransomware attack is another use case touted by all-flash datacentre boosters.
When will the all-flash datacentre arrive?
While enthusiastic suppliers of flash storage such as Pure talk down the obstacles to the all-flash datacentre, analysts point to the spread of (especially QLC) flash into secondary workloads but not necessarily all use cases, with spinning disk likely to retain its usefulness for some time for some datasets.
Meanwhile, HDD suppliers such as Toshiba say around 85% of all data is still on spinning disk. That, they say, is not likely to change rapidly, not least because the flash capacity to replace it doesn’t exist.