Facebook: Outage caused by faulty routing configuration changes

Facebook says that yesterday’s worldwide outage was caused by faulty configuration changes made to its backbone routers that brought all its services to a halt.

“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication,” said Santosh Janardhan, VP for Engineering and Infrastructure at Facebook.

“This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”

The configuration issues also impacted the company’s internal systems and tools, making it harder to bring systems back online and further hindering the recovery process.

“The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem,” Janardhan added.

He also said that there is no evidence that Facebook users’ data was compromised as a result of the downtime, with the company attributing the root cause of the incident to a faulty configuration change.

To the huge community of people and businesses around the world who depend on us: we’re sorry. We’ve been working hard to restore access to our apps and services and are happy to report they are coming back online now. Thank you for bearing with us.

— Facebook (@Facebook) October 4, 2021

What happened?

Yesterday, Facebook, Instagram, and WhatsApp started coming back online after a BGP routing issue that caused over six hours of downtime was fixed.

At approximately 11:50 AM EDT, all three websites suddenly became unreachable, with browsers and apps displaying DNS errors on connection attempts.

Facebook initially provided no details, and although the massive outage first appeared to be DNS-related, it was later learned that the problem was far worse and a lot harder to fix.

Multiple Facebook routing prefixes suddenly disappeared from the Internet’s BGP routing tables, which immediately made it impossible to connect to any services hosted on those IP addresses, as Giorgio Bonfiglio, a Principal Technical Account Manager at Amazon AWS, explained.

BGP (Border Gateway Protocol) is the routing protocol that makes the Internet work: networks use it to announce routes (or prefixes) to one another, which makes it possible for devices on one side of the world to connect to devices on the other.
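To make the effect of a withdrawn prefix concrete, here is a minimal sketch of a routing table in Python. It is a toy model, not how real BGP routers or Facebook's backbone are implemented: the `RoutingTable` class, the next-hop labels, and the prefixes (taken from the address ranges reserved for documentation) are all assumptions made for illustration.

```python
import ipaddress


class RoutingTable:
    """Toy model of a table of BGP-learned routes (illustration only)."""

    def __init__(self):
        # Maps an announced prefix to a hypothetical next-hop label.
        self.routes = {}

    def announce(self, prefix, next_hop):
        """Record that `prefix` is reachable via `next_hop`."""
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        """Remove a previously announced prefix, if present."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the address is unreachable
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("192.0.2.0/24", "AS64500")     # hypothetical service prefix
table.announce("198.51.100.0/24", "AS64501")  # unrelated prefix

print(table.lookup("192.0.2.53"))  # AS64500
table.withdraw("192.0.2.0/24")
print(table.lookup("192.0.2.53"))  # None
```

Once the prefix is withdrawn, a lookup for any address inside it finds no route at all, while the unrelated prefix is unaffected.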

Since Facebook’s domain registrar and DNS servers are hosted on the company’s own routing prefix, when the BGP prefixes were removed from routing tables, no one could connect to their IP addresses or the services running on top of them.

“The BGP routes pointing traffic to Facebook’s IP address space have been withdrawn. The Internet no longer knows where to find Facebook’s IPs. One symptom is that DNS requests are failing,” said Johannes B. Ullrich, Ph.D., Dean of Research at the SANS Technology Institute.

“But this is just the result of Facebook hosting its DNS servers inside its own network. Even with working DNS (for example if you still have cached results), the IPs are currently not reachable.”
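The distinction Ullrich draws, between a failed DNS lookup and an unreachable IP address, can be sketched as a small simulation. Everything here is hypothetical: the `connect` and `reachable` helpers, the `example.com` name, and the documentation-range addresses are assumptions for illustration, and no real network traffic is involved.

```python
import ipaddress


def reachable(address, routed_prefixes):
    """True if some currently routed prefix covers `address`."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(p) for p in routed_prefixes)


def connect(hostname, routed_prefixes, nameserver_ip, zone, cache=None):
    """Simulate resolving `hostname` and then connecting to its IP.

    `zone` stands in for the records served by the authoritative
    nameserver; `cache` stands in for previously cached DNS answers.
    """
    cache = cache or {}
    ip = cache.get(hostname)
    if ip is None:
        # No cached answer: the nameserver itself must be reachable.
        if not reachable(nameserver_ip, routed_prefixes):
            return "DNS error: nameserver unreachable"
        ip = zone.get(hostname)
        if ip is None:
            return "DNS error: no such name"
    # Even with an answer in hand, the service IP still needs a route.
    if not reachable(ip, routed_prefixes):
        return f"connection failed: no route to {ip}"
    return f"connected to {ip}"


zone = {"example.com": "192.0.2.80"}
nameserver = "192.0.2.53"  # hosted inside the same prefix as the service

# Normal operation: the prefix is routed.
print(connect("example.com", ["192.0.2.0/24"], nameserver, zone))

# Prefix withdrawn, nothing cached: the visible symptom is a DNS error.
print(connect("example.com", [], nameserver, zone))

# Prefix withdrawn, answer cached: DNS "works" but the IP is unreachable.
print(connect("example.com", [], nameserver, zone,
              cache={"example.com": "192.0.2.80"}))
```

In this model the DNS error is only the first symptom a client sees. With the nameserver and the service behind the same withdrawn prefix, a cached answer merely moves the failure from the lookup step to the connection step.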

“To everyone who was affected by the outages on our platforms today: we’re sorry,” a Facebook spokesperson told BleepingComputer.

“We know billions of people and businesses around the world depend on our products and services to stay connected. We appreciate your patience as we come back online.”

Source: https://www.bleepingcomputer.com/news/technology/facebook-outage-caused-by-faulty-routing-configuration-changes/
