April 30, 2026

AI | How Traversal Prevents Million-Dollar Outages Christine Hall | usagoldmines.com

“It’s like finding a needle in a haystack with fake needles everywhere.” – Anish Agarwal, co-founder and CEO of Traversal

Website outages are painful, but in the age of AI-generated code they’re becoming existential. Last year, companies including Amazon Web Services, Azure, Cloudflare and Google Cloud all announced major outages, some lasting over 15 hours.

As Traversal co-founder and CEO Anish Agarwal puts it, the oft-quoted “$2 million an hour” cost of downtime is now, unfortunately, just a starting point for large enterprises.

“The problem only gets bigger, the larger the company gets,” he said. “The $2 million might even be small if we’re talking about some of the largest. I’m certain AWS’s recent outage was an order of orders of magnitude bigger than $2 million an hour.”

The stakes aren’t just abstract numbers on a slide. Agarwal has watched outages end careers. For example, Optus CEO Kelly Bayer Rosmarin resigned in 2023 following a 14-hour network outage, and more recently, IndiGo airline CEO Pieter Elbers resigned after an outage led to thousands of flight cancellations.

“CEOs are fired when they’re no longer hitting the agreements that they have contractually obligated to hit with customers,” Agarwal said. “Once you don’t do that, it’s a security problem. You’re in breach of your contract, and that leads to massive fines and reputational damage.”

Why the old model broke

This isn’t a new problem, but the increase in AI use is pouring gasoline on an already burning fire, according to Agarwal.

Even before generative AI, the amount of data produced by software was going up, yet the number of people who can troubleshoot well has been flat, Agarwal said. Why? Site reliability engineers (SREs) are scarce and budgets are capped, even as observability has become “the second largest spend, typically, for a company after cloud spend,” he said.

That means that when something breaks in a large system, the status quo can look like a hospital emergency room on a bad night.

“It spreads like an epidemic throughout your entire system,” Agarwal said. This is because each team only understands its own part of the system, so connecting the dots between all these teams with limited context is painful, he said.

In a pre-AI world, a major incident could mean 50 to 60 engineers in a “war room” for hours troubleshooting while millions of dollars are wasted.

Now add AI-generated code. More organizations are under pressure “to apply AI to everything,” and one of the clearest areas for return on investment is software development via tools like Claude or Cursor, Agarwal said.

That pressure also causes some CIOs to regret their decisions. AI company Dataiku polled 800 CIOs and found 74% of them were under pressure to “deliver measurable business gains from AI within the next two years” or risk their jobs.

That’s leading to some harried decision-making. The same percentage also “regret at least one major AI vendor or platform decision made in the last 18 months.”

The result of all that pressure is a ton of code being written by AI. Large enterprises also give AI systems permissions that they might not typically give, so that they can see what the AI can do. One example is “dangerously skip permissions,” a mode in Claude Code that bypasses the need for user approval before the AI performs an action.

The combination of more opaque code, more permissions and less human context means things are breaking in ways not seen before.

“No one has context of the code, and the amount of code is blowing up as well,” Agarwal said. “So the outages are getting way, way worse than they used to be, which was already really bad.”

From causal ML research to AI SRE

All of this became the thesis for Agarwal’s company, Traversal, which launches AI SREs to find the root cause of a network outage before engineers need the war room.

Agarwal didn’t arrive at this problem as a traditional SaaS founder. His research while getting a Ph.D. at MIT and as a current professor at Columbia centered on a niche but powerful area: causal machine learning.

“These AI systems are very good at picking up minute correlations in data and not very good at picking up cause-and-effect relationships,” he said. “My research was how do you get these AI systems to learn cause-and-effect relationships from data automatically?”

That turns out to be exactly what’s missing in today’s incident responses, and what Traversal is solving. In a complex distributed system, an outage looks like “finding a needle in a haystack with fake needles everywhere,” Agarwal said.

The hard question, according to him, is: “When you see an issue, is it a symptom of the problem? Is it just a spurious correlation because something else is wrong in the system, or is it the root cause?”

Agarwal joined with Ahmed Lone, Raaz Dwivedi and Raj Agrawal to research this, and says the light-bulb moment came when he and his co-founders connected that research to the reality of operations. They also played with early AI coding tools and saw the trajectory clearly.

“If AI is going to write all of your code, and no one’s going to understand it, we need AI to fix your code as well,” Anish Agarwal said. “That was really the key moment for us.”

He also felt that some of the most interesting work in AI was happening in companies now, and that a company “with research in its DNA,” tackling a deeply technical problem, was the right expression.

Ending the 2 a.m. emergency calls

Traversal describes itself as an AI SRE agent that “autonomously troubleshoots, remediates and even prevents production incidents.” To understand what that means, Agarwal paints a before-and-after picture.

Before Traversal, Agarwal saw a lot of those “war room” scenarios play out where an engineer gets paged at “ungodly times of the day,” and joins an incident war room in Slack or Zoom to figure out what went wrong. Hours go by until there’s an “aha moment” and the team finally converges on a fix.

“It’s like this heart attack that an organization goes through every time a [critical] incident happens,” Agarwal said.

With Traversal, the workflow looks very different. For example, when there’s an incident, a ticket gets created, and Traversal automatically kicks off. By the time an engineer shows up, Traversal has come back with an answer, Agarwal said.

Traversal does not only return an answer; it also tells the engineer who is needed to verify what it has said. So instead of 50 people, five or six are needed to verify the answer and then execute the mitigating steps Traversal proposes, Agarwal said.

“Rather than an average three hours, it becomes something like 15 minutes to get to the root cause of an incident and mitigate it,” he said.
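To put those two figures side by side, here is a rough back-of-the-envelope sketch. It is not a calculation Agarwal provided: it assumes, purely for illustration, that the oft-quoted $2 million an hour applies at a flat rate for the whole incident, and the function name `downtime_cost` is hypothetical.

```python
# Illustrative arithmetic only. The flat hourly rate is an assumption made
# for this sketch; real outage costs vary widely by company and incident.
COST_PER_HOUR_USD = 2_000_000  # the oft-quoted "$2 million an hour"


def downtime_cost(minutes: float, cost_per_hour: float = COST_PER_HOUR_USD) -> float:
    """Return the cost of an outage lasting `minutes` at a flat hourly rate."""
    return cost_per_hour * minutes / 60


before = downtime_cost(180)  # about three hours to reach root cause
after = downtime_cost(15)    # about 15 minutes, per Agarwal's estimate
savings = before - after

print(f"before: ${before:,.0f}  after: ${after:,.0f}  saved: ${savings:,.0f}")
```

Under that assumption, a single incident would drop from about $6 million to about $500,000 in downtime cost.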

For some customers, Traversal has moved beyond recommendation into action. They have trusted the company to heal their systems autonomously, without a human in the loop. Agarwal called this “self driving production,” where “Traversal finds the issue, tells you the mitigating steps, and then heals the system fully autonomously” without needing to get anyone up at 2 a.m.

Tangible ROI from AI

Over the last nine months, Agarwal has seen observability and reliability having a “ChatGPT moment,” with enterprises actively seeking AI SRE solutions to keep increasingly AI-generated code stable in production.

Agarwal emphasizes that the product is now at a point where it can deliver fast, repeatable time-to-value, often within 30 days, by significantly reducing mean time to resolution.

As a result, Traversal is in go-to-market mode, growing the company fourfold to more than 70 people and turning on the sales engine after gaining clients including American Express and Pepsi.

The company has moved so aggressively and hired so strategically that one of Agarwal’s friends commented that Traversal has created “the Avengers of enterprise sales.”

In just a few months, Traversal has hired, among others, a vice president of worldwide sales, a vice president of field engineering and a vice president of marketing, all from blue-chip infrastructure and observability companies like AppDynamics, Cribl, SignalFx and Splunk, along with more than 10 sales executives and supporting solutions engineers.

In addition to securing more customers, Traversal’s vision extends well beyond incident response. The team is building what Agarwal calls a “production world model,” which is a rich representation of a company’s production environment analogous to the simulators used in self-driving cars.

This world model doesn’t just power faster root-cause analysis; it can also be surfaced to AI coding tools to help them write more resilient code before it ever reaches production.

“The market for this is massive, and if you start collecting all this data and correlating across all these disparate systems, you can really rethink all of the maintenance of software, and that’s the vision of where we’re going,” Agarwal said.


This article was written by: Nermeen Nabil Khear Abdelmalak
