Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies

AI demand is so high, AWS customers are trying to buy out its entire capacity

The Amazon Web Services (AWS) chip business is “on fire,” Trainium offers better price-performance than Nvidia, and customers are so eager for AI compute capacity that they’re looking to buy up all that’s currently available.

These are the takeaways shared by Amazon CEO Andy Jassy in his eight-page letter to shareholders in the tech giant’s 2025 annual report.

Jassy’s comments underscore how all-in enterprises are for AI, and Amazon’s ambitions to dominate a technology that, as he described it, will be as transformative as electricity.

Noted Scott Bickley, advisory fellow at Info-Tech Research Group, “pulling it all together, AWS is diving deeper to control the AI stack comprehensively through every layer: power, data center, custom silicon in the middle, and training and inference at the top.”

Big inference asks from customers

AWS added 3.9GW of new power capacity in 2025 and expects to double its total power capacity by the end of 2027, Jassy wrote to shareholders. “Yet we still have capacity constraints that yield unserved demand,” he said.
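
A quick sanity check on that claim: doubling total capacity between the end of 2025 and the end of 2027 implies compound annual growth of roughly 41%. The calculation below uses only the doubling figure quoted above, not any AWS baseline numbers:

```python
# Doubling over n years implies a compound annual growth rate of 2**(1/n) - 1.
n_years = 2  # end of 2025 through end of 2027
cagr = 2 ** (1 / n_years) - 1
print(f"Implied annual capacity growth: {cagr:.1%}")
# prints "Implied annual capacity growth: 41.4%"
```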

Notably, he revealed that two large customers are in such need of AI compute that they asked to buy all available 2026 instance capacity for AWS’ custom Arm-based CPU, Graviton. He emphasized that AWS can’t agree to those kinds of requests, given other customers’ needs.

Matt Kimball, VP and principal analyst at Moor Insights & Strategy, noted, “two large customers asking to buy all of AWS’s Graviton capacity for 2026 says everything we need to know about where the market is.”

It’s not necessarily just a supply chain story, though, he said; it’s more of a “strategic dependency” story. Enterprises aren’t just shopping for compute, they’re trying to lock up capacity before a competitor does. “The risk for AWS isn’t failing to build fast enough. It’s more along the lines of constrained customers maybe hedging toward Azure or Google Cloud Platform (GCP),” he pointed out.

This also indicates how popular Graviton has become, and suggests that AWS might be struggling to meet demand. Rather than “lightweight chips supporting lightweight workloads,” Graviton is being used across workloads “with a variety of computational profiles,” said Kimball.

As they mature, Azure Cobalt and Google Cloud Axion processors will likely see the same kind of demand, which will make for an “interesting market dynamic” between Arm and x86 technologies, he said.

Info-Tech’s Bickley agreed that the impact of supply chain constraints is “broad and deep” in its effect on AI buildout. Even in the midst of reports that 50% of planned AI data center capacity will not materialize in 2026, “everything is sold out across the board.”

Trainium’s competitive edge

Going into 2026, Jassy described Amazon’s chip business as “on fire.” While AWS has a strong partnership with Nvidia and uses its semiconductors, there is what he called a “new shift” in the processor landscape as customers seek out better price-performance.

Notably, Amazon released the second generation of its custom AI silicon, Trainium2, in late 2024, and Bedrock now runs most of its inference on these next-generation accelerators. Jassy claimed Trainium2 offers roughly 30% better price-performance than comparable GPUs, and is “largely sold out.”

Meanwhile, Trainium3, which just began shipping, offers 30% to 40% better price-performance than Trainium2, and is already “nearly fully-subscribed,” he said. Further, a significant chunk of Trainium4 capacity, which is still about 18 months from broad availability, has been reserved.
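
Taken at face value, those generational claims compound. A back-of-the-envelope calculation, reusing only the percentages quoted above (these are Amazon’s claims, not independent benchmarks):

```python
# Compound the price-performance claims quoted above.
# Baseline: comparable GPUs = 1.0 (an arbitrary unit of work per dollar).
gpu = 1.0
trainium2 = gpu * 1.30             # "roughly 30% better" than comparable GPUs
trainium3_low = trainium2 * 1.30   # low end of the 30%-40% claim
trainium3_high = trainium2 * 1.40  # high end of the claim

print(f"Trainium2 vs. GPU: {trainium2:.2f}x")
print(f"Trainium3 vs. GPU: {trainium3_low:.2f}x to {trainium3_high:.2f}x")
# prints "Trainium2 vs. GPU: 1.30x"
# prints "Trainium3 vs. GPU: 1.69x to 1.82x"
```

If both claims hold, Trainium3 would land at roughly 1.7x to 1.8x the price-performance of the GPUs in Amazon’s comparison.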

“There’s so much demand for our chips that it’s quite possible we’ll sell racks of them to third parties in the future,” Jassy said.

Info-Tech’s Bickley pointed out that Amazon is not necessarily trying to eliminate Nvidia so much as reduce its dependence on the chip leader’s technology in areas “where AWS can win on economics.”

While AWS remains a strong Nvidia partner, it can provide a differentiated value proposition based on price-performance, he said. AWS brings a “holistic package” via tight integration with Bedrock, AWS-designed interconnects, more efficient token economics, and a software stack built on standard PyTorch/JAX/vLLM workflows.

Trainium’s prime use cases are training and inference for large language models (LLMs), multimodal models, and diffusion transformers in the hundreds of billions to trillion-plus parameter range, Bickley explained.

Marquee names like Anthropic and Uber are “putting AWS’s efficiency claims to the test,” he noted; on the other hand, customers like Cohere and Stability AI prefer Nvidia’s mature tooling framework and “superior chip designs,” citing AWS service and availability issues.

Moor’s Kimball pointed out that another factor to consider is AWS’ partnership with Cerebras. Trainium is optimized for prefill and Cerebras CS-3 is optimized for decode, allowing the two to deliver what they claim is the best inference performance with no user intervention required. “This is the kind of ‘point-and-click’ simplicity enterprise users are looking for,” he said.
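
The pattern Kimball describes is often called disaggregated serving: prompt processing (prefill) and token generation (decode) have different hardware sweet spots, so each phase is routed to the pool best suited for it. Below is a minimal sketch of such a router; the pool names and interface are hypothetical illustrations, not an actual AWS or Cerebras API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pool:
    name: str
    optimized_for: str  # "prefill" (throughput-bound) or "decode" (latency-bound)

# Hypothetical stand-ins for the two accelerator pools described above.
PREFILL_POOL = Pool("trainium-prefill", "prefill")
DECODE_POOL = Pool("cerebras-decode", "decode")

def route(phase: str) -> Pool:
    """Send each phase of an inference request to the pool tuned for it."""
    if phase not in ("prefill", "decode"):
        raise ValueError(f"unknown phase: {phase}")
    return PREFILL_POOL if phase == "prefill" else DECODE_POOL

# A request's prompt is processed on one pool, then tokens are generated on the other.
print(route("prefill").name)  # prints "trainium-prefill"
print(route("decode").name)   # prints "cerebras-decode"
```

The “no user intervention” claim maps to this routing happening inside the service, invisible to the caller.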

Ultimately, Jassy is drawing a direct line from what Graviton did to x86 to what Trainium is doing to Nvidia, he said. Inference is the “fastest-growing and most cost-sensitive workload in enterprise AI, and that’s exactly where Trainium is gaining the most ground.”

Learning from the Mantle scale-up

Jassy also emphasized the importance of being able to go back to the starting line to “redirect the trajectory.” For instance, Amazon Bedrock was built rapidly and scaled “faster than expected,” and the team realized it required a whole different type of inference engine, not just a tweak.

The Bedrock team quickly spun up a group of six “very skilled engineers” using AWS’ agentic coding service, Kiro, to deliver a new engine, Mantle, in 76 days. Mantle has since become the backbone of Bedrock, which processed more tokens in Q1 2026, Jassy claimed, than had been processed in all prior years combined.

The ability of a small team to accomplish such a large rebuild in such a short time frame, while also adding features such as stateful conversation management, asynchronous inference, and higher default quotas, is “impressive at first blush,” noted Info-Tech’s Bickley.

“The takeaway is that Mantle should be considered a key product for inference in its own right,” he said. And a separate AWS engineering post aims to build confidence in Mantle’s security and governance, Bickley explained.

Moor’s Kimball called the genesis of Mantle “really two stories.” One is operational (Bedrock needed a new architecture); the other is productivity compression.

“If six engineers with agentic tools can do what 40 couldn’t, and do it faster, the calculus on team size, project timelines, and build-vs-buy decisions shifts fundamentally,” he said. “The token volume numbers make the outcome clear and compelling.”

But Mantle isn’t just a rebuild; it’s yet another proof point that AI-assisted development is changing what’s possible. “Not just in theory or some marketing slogan,” Kimball said, “but in production.”

Jassy noted, “progress will not be linear. There will be moments of acceleration and moments where we adjust course. We will experiment, invest disproportionately behind what matters, and pull back when something isn’t working.”

This article originally appeared on NetworkWorld.


April 11, 2026
