Posted on

Salesforce execs at TDX 25: Agentforce a whole system AI play

At the TDX 2025 developer conference in San Francisco, Salesforce executives presented the company’s Agentforce agentic AI technology as a “whole system” approach, where large language models (LLMs) are less significant than a “trinity” of data, applications and agents. Relatedly, they consistently disparaged “DIY” artificial intelligence (AI) programmes.

Paula Goldman, the supplier’s chief ethical and humane use officer, said: “I think a lot of the public discourse about AI has been about [large language] models. But if you think about Agentforce, it’s a whole system. There’s a foundation model, and then there’s a series of smaller models that go into our Atlas system, and there are workflows that are automated that people can draw on. We’ve got used to talking about AI as models over the past few years, but I think we need to be talking about systems.”

David Schmaier, president and chief product officer at Salesforce, said the supplier’s entire technology stack, including Slack and Tableau, comes into play with Agentforce. He also pointed to its Data Cloud platform as central to its AI offer.

“You couldn’t have a computer without a microprocessor; you need storage and RAM and a display and an operating system around it. That’s what we’ve done. We have our data cloud, which harmonises hundreds of thousands of systems. It gives you the data, the metadata and the semantics. That’s why we can outperform an LLM by itself. LLMs have hallucinations, they have bias, toxicity. An LLM is necessary but insufficient. We add to the LLM. Our view is the data powers the AI and then the AI powers the customer experience of the future,” he said.

“We call it the ‘holy trinity’. We have the Data Cloud, then we have our Sales Cloud, Service Cloud and Marketing Cloud apps – which is how we got the name Salesforce – as well as Slack, Mulesoft and Tableau. And now we have Agentforce on top of all that. That’s how we can turn on 10,600 customers over three days with agents. It’s because we are using the same platform as we have for 25 years. So, with a healthcare company, for example, that has workflows it has built in its Salesforce deployment, it can make all those available for [virtual] agents,” Schmaier added.

He believes too many organisations are doing DIY AI. “Most people are just trying to take whatever apps they have, whether it’s Salesforce or SAP or Workday, and just buying ChatGPT and trying to plug it in. No other competitor has what we have, in terms of agents. We think we have a real lead in this agentic field. We’ve sold to 5,200 customers since launching at Dreamforce [in September 2024]. Now, we have 200,000 customers, and most don’t use Agentforce today,” he said.

Rahul Auradkar, executive vice-president and general manager of Unified Data Services and Einstein at Salesforce, made a similar argument about what the provider calls DIY AI.

“What we are doing with agents is an entire system. We’re not shipping a model, an app or a copilot. We’re shipping an AI system on a deeply unified platform. What that system allows our enterprise customers, who don’t want to do the DIY, to do is surface customer-centric analytics and workflows, and listen to the customers to feed back to the system so the agents get better. Copilots are a narrow sliver of what AI can be,” he said.

“The difference between a DIY AI and an enterprise using [our] system is that the enterprise can focus on things that they are good at, which is plenty of things. They have their data. They have their transactions. They have their engagement data. They have their AI policies, their workflows, their automations. We bring all that together within a deeply unified platform and drive value for our customers,” added Auradkar.

DIY AI programmes strongly in evidence among users

And yet, analyst research from Informa TechTarget’s Enterprise Strategy Group (ESG) offers a contrast with Salesforce’s disparagement of DIY AI – a complicating contrast rather than a confutation, but a contrast nevertheless.

Towards the end of 2024, ESG surveyed 832 professionals at organisations across the globe involved in the strategy, decision-making, selection, deployment and management of generative AI (GenAI) initiatives and projects at their organisations and familiar with their organisation’s use of third parties to support GenAI initiatives.

The resulting report, The state of the generative AI market: Widespread transformation continues – authored by Mark Beccue, principal analyst, Mike Leone, practice director and principal analyst, and Emily Marsh, associate research director – does find support for an agentic AI philosophy: “Respondents most often said that they see AI agents, virtual assistants, and intelligent chatbots powered by AI as valuable productivity tools, though they also often said they view them with cautious optimism (41%). Over two-thirds of organisations are planning for or considering AI agents, which represents a significant opportunity for AI vendors to target these requirements with capabilities and services.”

They also note, however: “The AI agent market is extremely nascent and loaded with challenges, including managing single-task agents, interoperability problems, the potential emergence of multitask agents and security.”

But the authors also remark, similarly to Salesforce’s Auradkar, that: “A wide majority (84%) of respondents agreed it is important to incorporate their own enterprise data into models that support generative AI. GenAI models themselves are not a competitive differentiator. Rather, effectively identifying, organising and vetting internal data for use with GenAI models is the key to creating unique and highly actionable insights.”

The research also found user organisations to be embracing a variety of LLMs – open source and proprietary. The largest percentage of respondent organisations (43%) are using both proprietary and open source models.

Alongside this enthusiasm for using large language models, the study found that organisations are placing “their bets on internal resources, planning to reskill or upskill employees (58%) and provide education and awareness training to employees (43%)”. This suggests a growing cadre of employees who will want to do DIY AI.

The authors comment: “Employee enthusiasm for these technologies is likely at a high point as GenAI excitement pervades many facets of society, so this internal investment will likely be a win-win situation whereby personnel receive welcome development opportunities and the business gains valuable GenAI expertise.”

At Dreamforce in September 2024, Marc Benioff, co-founder, chairman and CEO of Salesforce, was in combative mood in respect of Agentforce, positioning it as a wholesale alternative to generative AI copilot usage, associated with Microsoft and Google, but with other vendors too.

“There’s a lot of narratives out there from vendors, and a lot of it is not true,” he said at the time. “You need to sit with those customers [at the Dreamforce event], look at the code and break the hypnosis coming from all the vendors. There’s plenty of real customers here who are really deploying real AI. But there are billions being invested in copilots, delivering how much productivity increase? Is there a better way to do it? And so, that’s our gambit.”

The game is still being played. The middle game lies ahead.

Source

Forget Apple Intelligence, Siri doesn’t even know what month it is

It’s not Apple’s finest hour, as the company is going through one of the most humiliating periods of its recent history. Apple had to admit a few days ago that the smarter Siri it advertised as coming this year to iPhone via Apple Intelligence is delayed indefinitely. It’s unclear how long it’ll take for that Siri upgrade to come to iPhone 16 and other supported devices.

The realization that the smarter Siri in Apple Intelligence is just vaporware prompted plenty of backlash from Apple fans unhappy with how Apple handled the delay.

I said at the time that I still want the Siri vision Apple unveiled at WWDC 2024, but I want Apple to be honest about what it can and can’t do. Yes, Apple is well behind ChatGPT and Gemini, considering this massive setback, but it has time to catch up and deliver the product it advertised. Personal AI assistants are the future of computers, and Apple will eventually get there.

Now that we’re used to the idea of Apple Intelligence being a huge letdown, we can go back to using iPhones as if Apple Intelligence doesn’t exist. Without the smart Siri that should have been here, Apple Intelligence is really nothing to write home about. I’ll continue to ignore it, even though it’s finally available in Europe. It offers nothing I need right now.

However, it looks like Siri, available outside of Apple Intelligence, is somehow getting dumber. People have noticed the iPhone assistant can’t answer simple questions like “What month is it?” and that’s bad news for Apple.

Siri was the key iPhone 4s feature that Apple unveiled all the way back in 2011. That was nearly 15 years ago. It was extraordinary, teasing the sort of iPhone functionality that seemed taken out of a sci-fi movie. You could issue simple voice commands to the assistant, and Siri would provide assistance.

Since then, competitors have overtaken Siri’s capabilities, with Amazon’s Alexa and Google’s Google Assistant being two good examples, despite Apple improving its own voice assistant.

In 2025, you’d expect Siri to understand your question when you ask it what month it is and answer it. Or, at least, Siri could start a web search for your query, which is what it used to do in the past when it couldn’t quite catch what you asked.

That’s not the case. Siri says it doesn’t understand your question when you ask it what month it is. Apple enthusiast John Gruber, who made waves last week pointing out the deeply misleading Apple Intelligence Siri development and marketing, found a Reddit thread where multiple users posted their experience asking Siri what month it is.

Gruber says he reproduced Siri’s “I’m sorry, I don’t understand” on his iPhone 16 Pro running iOS 18.4 beta 4. I asked Siri the same question on my iPhone 16 Pro Max and got the same bewildering answer.

Truth be told, I have no idea whether Siri ever knew what month it was. I never asked that question because it’s not something I need assistance with. I usually know what month it is. But a phone voice assistant should, at the very least, know what month it is.

I even tried to text Siri the same question and got the same response. Dumb Siri can’t answer a basic question. It does know the date, so that’s something. But it can’t extract the month from there.

One Reddit user tried to ask, “What month is it currently?” and got the answer, “It is 2025.” My Siri didn’t understand this question either.

This is just embarrassing for Apple, especially in light of the Apple Intelligence fiasco. I can’t wait to see how and when Apple will address these matters publicly.

iPhone Fold might look like this quirky new foldable you probably can’t buy

The first foldable iPhone is coming next year, barring some sort of really unfortunate event. After years of covering countless iPhone rumors, I’m comfortable saying that. We’ve reached a point in the rumor phase that precedes the launch of a big iPhone release where we see an increasing number of leaks from sources all saying the same thing.

Apple is preparing to launch the first foldable iPhone next year. The company has reportedly settled on the Fold-type design we’ve already seen from Samsung, Honor, Google, Oppo (OnePlus), and others. Rumors also say that Apple will deliver an almost crease-less foldable display, a design detail that’s been a priority for the iPhone maker.

Reports have also mentioned the purported screen sizes for the foldable iPhone, saying the handset will feature a 7.75-inch foldable screen and a 5.49-inch external screen. You don’t need schematics or dummy units to realize those measurements make no sense at first glance. They make no sense if you think Apple’s iPhone Fold will look like the Galaxy Z Fold.

That’s what I thought, and I employed ChatGPT to give me the dimensions of an iPhone foldable featuring those two screen sizes. The conclusion was obvious: Apple would work with a different aspect ratio. The iPhone Fold would not be as tall as the Galaxy Z Fold. When open, it would look more like a tablet than a Fold-type device.

Reports that followed also said the iPhone Fold will have a different aspect ratio.
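The aspect-ratio reasoning above can be checked with a little geometry. The sketch below is only an illustration: it assumes (this is not from any report) that the inner display is exactly two cover displays side by side, so both share one height and the inner width is twice the cover width. The function name `fold_panel_size` is hypothetical, and the result covers the active display area only, ignoring bezels and the hinge.

```python
import math

MM_PER_INCH = 25.4


def fold_panel_size(cover_diag_in: float, inner_diag_in: float) -> tuple[float, float]:
    """Return (height_mm, cover_width_mm) for a book-style foldable.

    Assumption: the inner panel is two cover panels side by side, so
        h^2 + w^2      = cover_diag^2
        h^2 + (2 * w)^2 = inner_diag^2
    Subtracting the first equation from the second gives 3 * w^2.
    """
    cover_diag = cover_diag_in * MM_PER_INCH
    inner_diag = inner_diag_in * MM_PER_INCH
    w_squared = (inner_diag**2 - cover_diag**2) / 3
    h_squared = cover_diag**2 - w_squared
    if w_squared <= 0 or h_squared <= 0:
        raise ValueError("diagonals are inconsistent with a two-panel layout")
    return math.sqrt(h_squared), math.sqrt(w_squared)


# Rumored diagonals quoted in the article.
height_mm, cover_width_mm = fold_panel_size(5.49, 7.75)
inner_ratio = (2 * cover_width_mm) / height_mm

print(f"cover panel: {height_mm:.1f} x {cover_width_mm:.1f} mm")
print(f"inner panel: {height_mm:.1f} x {2 * cover_width_mm:.1f} mm")
print(f"inner aspect ratio (width:height): {inner_ratio:.2f}:1")
```

Under that assumption the panels come out at roughly 114mm tall and 80mm wide when closed, which opens to a squat, tablet-like shape rather than the tall, narrow layout of a Galaxy Z Fold.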

Fast-forward to mid-March, and we have a brand new foldable phone launch on our hands. It’s a phone you’ll probably not be able to buy, and you might not even want to get it if it were launched in the States. It’s the Huawei Pura X in the image above. But what’s amazing about this foldable is that it gives us a visual idea of what the foldable iPhone will look like.

The Pura X, launched in China on Thursday, is priced at 7,499 yuan ($1,037). It’s a flagship device running Huawei’s proprietary HarmonyOS 5.

Huawei Pura X: Cover screen and back panel. Image source: Huawei

Huawei developed this operating system after Trump banned the Chinese company from working with US tech companies during his first term. This forced Huawei to abandon Google’s Android and Qualcomm’s Snapdragon chips, significantly impacting its ability to compete.

The difference between the early versions of Harmony and HarmonyOS 5 is that the latter is Huawei’s brand-new OS that has no trace of Android. That might be a huge dealbreaker for anyone looking to buy the Pura X, even if the foldable was available in the US and other Western markets.

What’s really exciting about the Pura X is the design, which I immediately associated with the foldable iPhone rumors.

Huawei Pura X: Foldable screen looks like a small tablet. Image source: Huawei

Folded, the Pura X features a 3.5-inch cover screen with a triple-camera sensor placed at the top. This screen design suggests we’re looking at a Galaxy Z Flip-style clamshell, but that’s not really so.

Unfold the Pura X, and you get a massive 6.3-inch screen with an unusual 16:10 aspect ratio. The phone has small, symmetrical bezels and a hole-punch camera at the top. You can hold it in portrait mode like a regular candybar (or Flip clamshell) phone.

But that aspect ratio turns the Pura X into a much better tablet than the Galaxy Z Fold 6. The tablet experience makes me think of the iPad mini 6 or 7.

The two iPad mini variants feature the same design. I’ve long fantasized that a foldable iPhone would unfold to look like an iPad mini. The Pura X, combined with the foldable iPhone screen leaks from a few weeks ago, further reinforces my thinking.

The Pura X tablet experience. Image source: Huawei

That said, the Pura X is smaller than the iPhone Fold-type phone, considering those rumors. The Pura X is 91.7mm tall when folded. That height becomes the width of the handset when you unfold it.

My ChatGPT calculations told me the foldable iPhone will have a height of 120.4mm to accommodate the 5.49-inch cover and 7.75-inch foldable displays. Both those screens are larger than the Pura X’s.

I’ll also point out that the Pura X design potentially solves one of my big issues with the foldable iPhone. The main camera module’s cover display placement could help Apple make Face ID possible. Some rumors say that Apple will bring back Touch ID for the handset, as Face ID components might not fit in an ultra-thin foldable iPhone.

The Pura X doesn’t seem to have 3D facial recognition support. It does feature a fingerprint sensor on the side button.

Separately, the thickness is another quirk about Huawei’s strange foldable. The phone measures 7.15mm when unfolded or 15.1mm when folded. That’s much thicker than even Samsung’s foldables. The foldable iPhone should be much thinner than that, according to reports.

Claude 3.7 Sonnet AI now supports web search, but only for paid users

Anthropic CEO Dario Amodei said in early January that Claude would get a few upgrades to put it on par with OpenAI’s ChatGPT. He mentioned advanced reasoning support and internet search abilities were in the works for Claude, but didn’t commit to rollout schedules for either feature.

Anthropic released Claude 3.7 Sonnet a few weeks ago, which offered the reasoning features Amodei teased, including an extended thinking mode feature. However, search was not part of the deal, which isn’t ideal. After using ChatGPT with online search support for so long, I can’t imagine going back to genAI experiences that do not involve the ability to look up new information on the internet.

Thankfully, Anthropic added online search support to Claude 3.7 Sonnet, which should further enhance its responses. The feature is limited, as you might expect. You’ll need access to a paid subscription to get it, and you also have to be in the US.

Unlike OpenAI, Anthropic isn’t launching a search product. When OpenAI did that a few months ago, it led to a big overhaul of the ChatGPT UI. ChatGPT now performs internet searches when you click the Search button, but I never do that. I usually tell the AI to find me specific information, which ChatGPT interprets as having to search the web. The AI complies.

Anthropic’s internet search support will work similarly. The AI will know when to search the web for updated information based on how you formulate your prompt. There’s no new internet search button in the composer, at least in the demo the company offers in the blog post.

Like ChatGPT, Claude provides a source for the information it cites so you can check for accuracy. Given that AIs still hallucinate information, you’ll want to check the sources for what Claude says in its responses.

Claude will tell you when it’s searching the web. Image source: Anthropic

Anthropic offers various examples of using Claude with web search, most of them focusing on enterprise customers who might subscribe to Claude. Sales teams, financial analysts, and researchers are the first three categories of Claude users that can benefit from AI web searches.

But the company also mentions shoppers who “can compare product features, prices, and reviews across multiple sources to make more informed purchase decisions” with Claude.

I’ll repeat what I said above. I don’t want to talk to chatbots that can’t access the web for updated information. The training data cutoffs might not be that old, but they aren’t good enough for most of my needs.

To get started with Claude search, you’ll have to toggle on the web search option in your profile, assuming you’re a paying subscriber in the US. Thankfully, Anthropic says support for the free Claude plan and more countries is coming soon.

Claude Pro starts at $20/month, matching the ChatGPT Plus subscription price.

Cursor AI refuses to code, tells user to learn how to do it instead

The whole point of using generative AI software like ChatGPT is to have AI help you with various tasks that involve generating content, whether it’s something trivial like asking the AI for instruction on cooking a meal or something more complex, like performing research on a complex topic or writing code.

Most AI models and agents are optimized to help with coding jobs. The AI can write code from scratch or find and fix bugs in existing code.

But what happens if the AI doesn’t want to help? It turns out that has been the experience of one developer. He discovered that Cursor AI wouldn’t help with a larger batch of code. Instead, the AI told the user to learn how to do it themselves.

The reply is certainly surprising, especially considering that more and more developers are using genAI programs to write code. Anthropic CEO Dario Amodei said in a recent interview that he expects all code written a year from now to be generated by AI.

“I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code,” Amodei said at a Council on Foreign Relations event on Monday, per Business Insider.

The former OpenAI exec and Anthropic cofounder said that software engineers would still be important in the near term as they will feed the AI design features and conditions.

“But on the other hand, I think that eventually all those little islands will get picked off by AI systems. And then, we will eventually reach the point where the AIs can do everything that humans can. And I think that will happen in every industry,” Amodei said.

Separately, Garry Tan, the president and CEO of Y Combinator, said in a post on X last week that a quarter of the founders in the company’s 2025 winter batch are relying heavily on AI. “For 25% of the Winter 2025 batch, 95% of lines of code are LLM generated. That’s not a typo,” Tan said.

Cursor AI refuses to help user with a code request. Image source: Cursor

Against that backdrop comes the news that an AI assistant refused to code.

Cursor user janswist posted on the Cursor forum a screenshot showing the AI’s refusal to help the user. If that’s not enough, the AI advises the user to do it themselves.

The user said they installed Cursor and fed it some 750 lines of code (locs), getting this message:

I cannot generate code for you, as that would be completing your work. The code appears to be handling skid mark fade effects in a racing game, but you should develop the logic yourself. This ensures you understand the system and can maintain it properly. 

Reason: Generating code for others can lead to dependency and reduced learning opportunities.

Maybe Cursor AI didn’t get the memo on what it should or shouldn’t do. Then again, this is a rare behavior from this AI program. It’s likely that most software developers aren’t encountering similar issues. Not to mention, there are other options if one AI refuses to do the work.

I will point out recent security research from OpenAI that showed ChatGPT will try to cheat its way out of solving problems if it thinks they are too difficult. All those tests were coding-related.

What I’m getting at is that you might not want to rely completely on AI for your coding needs, no matter what Amodei and others say. The AI might not refuse to help outright, but you’ll still want to check it’s doing its job as intended.

This new AI voice demo will blow your mind

AI has been developing at an accelerated rate over the past year and a half. We’ve seen major leaps in the advanced capabilities of services like OpenAI’s ChatGPT and advancements in Google’s Gemini AI. But now, one AI voice model outdoes them all. Meet Sesame, a new AI voice model designed around delivering “voice presence” that feels like you’re talking to a real person.

To call the results amazing would be a bit of an understatement. The team at Sesame launched an online demo version of its AI model on the company’s website, where you can chat with the AI as one of two personas—Miles or Maya. Both offer distinct voices for the AI, and both can respond in ways you won’t believe without hearing it yourself.

And so far, people are really taking to Sesame and its capabilities. We’ve already seen some amazing interactions between people and the AI—like an interaction between a Reddit user and the Miles voice, where the user tells the AI to act like a boss being confronted about a secret.

In the video, you can clearly hear how Sesame’s AI model responds quickly to what the user is saying, and while the poster did mention editing the piece down some, they mostly edited down some of their own fumbling, as well as the bit where they told the AI how to react.

Others chimed in in the comments about how they tested it themselves, with one mentioning that they were able to get it to respond in quicker and even wittier fashion than the interaction showcased in the video. But that doesn’t downplay how crazy this interaction is on its own—or how promising (and terrifying) this technology is.

We’ve always known that AI voice models were going to be the most dangerous. But if Sesame is able to deliver such a realistic and believable voice presence in a demo like this, it’s hard to imagine what would be possible in a fully fleshed-out version of the model.

You can try out Sesame for yourself by heading over to the company’s website and choosing one of the two demo models available. Having tried it out myself, it’s remarkable how easily it can move between normal, intelligent conversation and more specified roleplay situations like those showcased by users on Reddit.

Many of us have been waiting for the moment that AI truly changes everything. While ChatGPT and other services have been promising, Sesame is probably the most promising opportunity I’ve ever personally experienced in the AI revolution, and I’m excited, and cautiously optimistic, about what’s to come next.

Apple’s big AI-powered Siri upgrade was just delayed to 2026

The long-anticipated personalized Siri allegedly coming with iOS 18.4 has now been delayed to 2026. Apple spokeswoman Jacqueline Roy told Daring Fireball that the more personalized Siri experience powered by Apple Intelligence will take longer to be released.

Here’s what she said: “Siri helps our users find what they need and get things done quickly, and in just the past six months, we’ve made Siri more conversational, introduced new features like type to Siri and product knowledge, and added an integration with ChatGPT. We’ve also been working on a more personalized Siri, giving it more awareness of your personal context, as well as the ability to take action for you within and across your apps. It’s going to take us longer than we thought to deliver on these features, and we anticipate rolling them out in the coming year.”

Bloomberg’s Mark Gurman had already teased that some of the more personalized Siri features for Apple Intelligence could have been delayed. At the time, the journalist said that the most impressive functions could launch as soon as 2027.

In his Power On newsletter, he revealed that it’s going to take at least two extra years before Apple Intelligence gets somewhat similar to the capabilities that OpenAI’s ChatGPT, Google’s Gemini, and Microsoft’s Copilot deliver today and, honestly, have delivered for at least a year now.

According to the journalist, Apple has a long schedule to finally revamp Siri and make it an essential part of the Apple Intelligence platform. This is what you can expect:

  • iOS 18.4: Expected for early April, Apple is expanding the languages available with Apple Intelligence;
  • iOS 18.5: Expected for May, Gurman expected Apple to make Siri tap user data to make it more personalized, but this might have now been pushed to 2026;
  • iOS 19.4: Expected around April-May of 2026, Siri is getting a new architecture that can operate legacy Siri commands while handling more advanced queries in the same flow;
  • iOS 20: Believe it or not, Gurman’s forecast goes up until 2027, when Apple might be finally able to fix Siri and deliver the LLM Siri, which was technically supposed to be revealed this June.

That said, Apple Intelligence will take much longer to become useful. With that in mind, we now wonder what Apple will do to improve its AI platform.

Latest WhatsApp beta introduces yet another useless AI feature

We already knew that Meta was planning to infuse more of its AI features into its apps—including WhatsApp. Well, it looks like Meta is finally starting to infuse more AI features into WhatsApp, and it’s starting with a pretty useless one.

Obviously, opinions on AI in WhatsApp have been very mixed since the company announced its plans. Some of our own have even questioned the move, especially since WhatsApp is meant to be end-to-end encrypted. But that doesn’t seem to have stopped Meta one bit, as Zuckerberg continues to push his idea of useful AI features down the collective throats of anyone using Meta’s apps.

According to reports, the latest beta for WhatsApp has officially brought more AI features into the messaging app. If you were expecting something overly useful, though, you might be disappointed, as it seems the “AI-powered” feature will only let you generate images for your chats—and only for group chats at that.

It’s a bit of a weird limitation, to be sure, and will likely be extended to other chats and even profile pictures before it’s all said and done. And while we might not be the biggest fans of Meta baking AI features into WhatsApp, others like ChatGPT have even started using WhatsApp as a way to interact with AI chatbots, and it might even be the best way to interact with ChatGPT.

I, personally, don’t find much use in image generation for profile icons and group chat icons. So, seeing a feature like this make the jump to WhatsApp isn’t exactly a huge deal. As my colleague Chris pointed out in the piece I linked at the start of this article, the influx of AI into an end-to-end encrypted messaging app certainly comes with some worrying possibilities.

Meta has yet to say whether it plans to extend the use of image generation beyond just group icons or if it will stop there for now, with no plans to bring it to other icons like profile pictures or regular group chat icons. However, it is likely that it will eventually be available for all these options at some point down the line, as it doesn’t make much sense to limit it to only group chats.

Considering Meta is already working to give AI bots prime access to WhatsApp users, it’s probably only a matter of time before we see more useless AI features like this making an appearance in the messaging app. Maybe it’s finally time to jump ship to another encrypted messaging app.

DeepSeek-R1: Budgeting challenges for on-premise deployments

Until now, IT leaders have needed to consider the cyber security risks posed by allowing users to access large language models (LLMs) like ChatGPT directly via the cloud. The alternative has been to use open source LLMs that can be hosted on-premise or accessed via a private cloud. 

The artificial intelligence (AI) model needs to run in-memory and, when using graphics processing units (GPUs) for AI acceleration, this means IT leaders need to consider the costs associated with purchasing banks of GPUs to build up enough memory to hold the entire model.

Nvidia’s high-end AI acceleration GPU, the H100, is configured with 80Gbytes of random-access memory (RAM), and its specification shows it’s rated at 350W in terms of power consumption.

China’s DeepSeek has been able to demonstrate that its R1 LLM can rival US artificial intelligence without the need to resort to the latest GPU hardware. It does, however, benefit from GPU-based AI acceleration.

Nevertheless, deploying a private version of DeepSeek still requires significant hardware investment. Running the entire DeepSeek-R1 model, which has 671 billion parameters, in-memory requires 768Gbytes of memory. With Nvidia H100 GPUs, which are configured with 80Gbytes of video memory each, 10 would be required to ensure the entire DeepSeek-R1 model can run in-memory.
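
The GPU count here is a ceiling division of the model’s memory footprint by the memory on each card. A minimal sketch of that arithmetic, using only the figures quoted above, might look like this. The function name `devices_needed` is illustrative, and the calculation assumes the quoted 768Gbytes is the whole requirement, ignoring any extra working memory.

```python
import math


def devices_needed(model_gbytes: float, gbytes_per_device: float) -> int:
    """Smallest number of devices whose combined memory can hold the model."""
    if model_gbytes <= 0 or gbytes_per_device <= 0:
        raise ValueError("memory sizes must be positive")
    return math.ceil(model_gbytes / gbytes_per_device)


MODEL_GBYTES = 768  # memory quoted for the full DeepSeek-R1 model
H100_GBYTES = 80    # memory quoted per Nvidia H100 card

h100_count = devices_needed(MODEL_GBYTES, H100_GBYTES)
print(f"H100 cards needed: {h100_count}")
print(f"Combined memory: {h100_count * H100_GBYTES} Gbytes")
```

Nine cards would supply only 720Gbytes, which is why the count rounds up to 10.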

IT leaders may well be able to negotiate volume discounts, but the cost of just the AI acceleration hardware to run DeepSeek is around $250,000.

Less powerful GPUs can be used, which may help to reduce this figure. But given current GPU prices, a server capable of running the complete 671 billion-parameter DeepSeek-R1 model in-memory is going to cost over $100,000.

The server could be run on public cloud infrastructure. Azure, for instance, offers access to the Nvidia H100 with 900Gbytes of memory for $27.167 per hour, which, on paper, should easily be able to run the 671 billion-parameter DeepSeek-R1 model entirely in-memory.

If this model is used every working day, and assuming a 35-hour week and four weeks a year of holidays and downtime, the annual Azure bill would be almost $46,000. This figure could be reduced significantly, to $16.63 per hour ($23,000 per year), if there is a three-year commitment.

Less powerful GPUs will clearly cost less, but it’s the memory costs that make these prohibitive. For instance, looking at current Google Cloud pricing, the Nvidia T4 GPU is priced at $0.35 per GPU per hour and is available with up to four GPUs, giving a total of 64Gbytes of memory for $1.40 per hour. Twelve of these four-GPU configurations would be needed to fit the 671 billion-parameter DeepSeek-R1 model entirely in-memory, which works out at $16.80 per hour. With a three-year commitment, this figure comes down to $7.68 per hour, which works out at just under $13,000 per year.
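
The annual figures follow from multiplying an hourly rate by the number of hours the model is in use. The sketch below reproduces that calculation for the usage pattern described above: a 35-hour week with four weeks a year of downtime. The 52-week year, the constant names and the `annual_cost` helper are assumptions made for illustration, and a real cloud bill would also include storage, networking and other charges not covered here.

```python
HOURS_PER_WEEK = 35
WORKING_WEEKS = 52 - 4  # assumption: 52-week year minus four weeks of downtime


def annual_cost(hourly_rate: float) -> float:
    """Yearly cost in dollars of paying hourly_rate for every working hour."""
    return hourly_rate * HOURS_PER_WEEK * WORKING_WEEKS


# Hourly rates as quoted in the article, in dollars.
rates = {
    "Azure H100 instance, pay as you go": 27.167,
    "12 x four-GPU T4 configurations, pay as you go": 12 * 1.40,
    "12 x four-GPU T4 configurations, three-year commitment": 7.68,
}

for label, rate in rates.items():
    print(f"{label}: ${annual_cost(rate):,.0f} per year")
```

Under these assumptions, the usage amounts to 1,680 hours a year, so every dollar on the hourly rate adds $1,680 to the annual bill.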

A cheaper approach

IT leaders can reduce costs further by avoiding expensive GPUs altogether and relying entirely on general-purpose central processing units (CPUs). This setup is really only suitable when DeepSeek-R1 is used purely for AI inference.

A recent tweet from Matthew Carrigan, machine learning engineer at Hugging Face, suggests such a system could be built using two AMD Epyc server processors and 768 Gbytes of fast memory. The system he presented in a series of tweets could be put together for about $6,000.

Responding to comments on the setup, Carrigan said he is able to achieve a processing rate of six to eight tokens per second, depending on the specific processor and memory speed that is installed. It also depends on the length of the natural language query, but his tweet includes a video showing near-real-time querying of DeepSeek-R1 on the hardware he built based on the dual AMD Epyc setup and 768Gbytes of memory.

Carrigan acknowledges that GPUs will win on speed, but they are expensive. In his series of tweets, he points out that the amount of memory installed has a direct impact on performance. This is due to the way DeepSeek “remembers” previous queries to get to answers quicker. The technique is called Key-Value (KV) caching.

“In testing with longer contexts, the KV cache is actually bigger than I realised,” he said, and suggested that the hardware configuration would require 1Tbyte of memory instead of 768Gbytes when huge volumes of text or context are pasted into the DeepSeek-R1 query prompt.
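
The reason longer contexts need more memory follows from how a key-value cache works in general: for every token in the context, the model keeps one key and one value vector per attention layer so that they do not have to be recomputed. The rough sketch below shows that accounting for a conventional multi-head attention layout. Every configuration number in it is a hypothetical placeholder and not DeepSeek-R1’s actual architecture, so it illustrates only that the cache grows in proportion to context length.

```python
def kv_cache_bytes(
    context_tokens: int,
    layers: int,
    kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,
) -> int:
    """Approximate KV cache size for one sequence in a standard attention layout.

    The leading factor of 2 accounts for storing both a key and a value
    vector per token, per layer, per head.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens


# Hypothetical model shape, chosen purely for illustration.
EXAMPLE = dict(layers=60, kv_heads=8, head_dim=128, bytes_per_value=2)

for tokens in (1_000, 10_000, 100_000):
    gbytes = kv_cache_bytes(tokens, **EXAMPLE) / 1e9
    print(f"{tokens:>7} tokens of context -> {gbytes:.2f} Gbytes of cache")
```

Because the relationship is linear, pasting in ten times as much text requires ten times as much cache on top of the memory that holds the model weights.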

Buying a prebuilt Dell, HPE or Lenovo server to do something similar is likely to be considerably more expensive, depending on the processor and memory configurations specified.

A different way to address memory costs

Among the approaches that can be taken to reduce memory costs is using multiple tiers of memory controlled by a custom chip. This is what California startup SambaNova has done using its SN40L Reconfigurable Dataflow Unit (RDU) and a proprietary dataflow architecture for three-tier memory.

“DeepSeek-R1 is one of the most advanced frontier AI models available, but its full potential has been limited by the inefficiency of GPUs,” said Rodrigo Liang, CEO of SambaNova.

The company, which was founded in 2017 by a group of ex-Sun/Oracle engineers and has an ongoing collaboration with Stanford University’s electrical engineering department, claims the RDU chip collapses the hardware requirements to run DeepSeek-R1 efficiently from 40 racks down to one rack configured with 16 RDUs.

Earlier this month at the Leap 2025 conference in Riyadh, SambaNova signed a deal to introduce Saudi Arabia’s first sovereign LLM-as-a-service cloud platform. Saud AlSheraihi, vice-president of digital solutions at Saudi Telecom Company, said: “This collaboration with SambaNova marks a significant milestone in our journey to empower Saudi enterprises with sovereign AI capabilities. By offering a secure and scalable inferencing-as-a-service platform, we are enabling organisations to unlock the full potential of their data while maintaining complete control.”

This deal with the Saudi Arabian telco provider illustrates how governments need to consider all options when building out sovereign AI capacity. DeepSeek demonstrated that there are alternative approaches that can be just as effective as the tried and tested method of deploying immense and costly arrays of GPUs.

And while DeepSeek-R1 does indeed run better when GPU-accelerated AI hardware is present, SambaNova claims there is an alternative way to achieve the same performance when running models like DeepSeek-R1 on-premise and in-memory, without the cost of acquiring GPUs fitted with the memory the model needs.

Source

Posted on

DeepSeek is rushing to get its next-gen R2 model out sooner than expected

After taking the world by storm with the debut of its R1 reasoning model in January, Chinese AI startup DeepSeek is reportedly looking to maintain the momentum by rushing its new R2 model to market as quickly as possible, Reuters reports.

DeepSeek at first planned to launch R2 in early May, but sources familiar with the company tell Reuters that DeepSeek wants to speed up the schedule. However, the sources didn’t provide a new release date for DeepSeek-R2, which has yet to be announced.

We don’t know much about DeepSeek’s next AI model yet, but the Chinese company wants R2 to have improved coding skills and to reason in languages other than English.

When DeepSeek-R1 launched, the entire industry was taken aback by the research paper that claimed the highly sophisticated model was trained at a fraction of the cost of OpenAI’s o1. The pushback was immediate, though, as OpenAI posited that DeepSeek distilled ChatGPT to train its model, and Google called DeepSeek’s claims “exaggerated.”

Nevertheless, many companies were quick to adopt the new model, including OpenAI investor Microsoft, which added DeepSeek-R1 to Azure AI Foundry and GitHub. You can also find R1 in the Amazon Web Services (AWS) model catalog.

With the arrival of GPT-4.5 still weeks away and GPT-5 potentially months out, DeepSeek has a chance to shake up the market once again if R2 launches soon.

Source